Martin G. Schultz, Sabine Schröder, Bing Gong, Felix Kleinert, Lukas Leufen, Clara Betancourt, Amirpasha Mozaffari, Najmeh Kaffashzadeh, Jenia Jitsev
Deep learning for weather and environmental applications is evolving rapidly. These applications pose several challenges that go beyond those of typical machine learning problems such as image or speech recognition. Data are often scarce in spite of massive data streams from current observational and modelling systems, time series are auto-correlated, and the “signal” is often dominated by well-understood cyclical patterns, such as seasonal and diurnal cycles. Therefore, data selection, i.e. the separation of training, validation, and test data, and validation methods must be chosen carefully and in line with accepted standards in the meteorological and air quality communities. To analyze air quality data successfully, the spatio-temporal variability needs to be captured, which requires processing large datasets from numerical models and fusing these data with observational time series. In the European IntelliAQ project, we explore modern data service and HPC concepts together with deep learning methods to improve the state assessment and forecasts of regional air quality. During the first year of the project, we set up high-performance parallel workflows for data processing and deep learning, and we tested two major deep learning methods: one for time series forecasting (GoogLeNet) and one for forecasting meteorological fields using a video frame prediction method (GAN+LSTM). The time series forecasting makes it possible, for the first time, to train one network for many stations and to apply it to previously unseen stations with good results. The GAN method shows some promising results, but its forecasting capability is still far from the quality of numerical weather prediction systems.
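To illustrate why the separation of training, validation, and test data needs care for auto-correlated time series, the following minimal sketch splits a series into contiguous chronological blocks and drops a few samples between them, so that neighbouring, highly correlated samples do not end up on both sides of a boundary. This is an illustrative assumption, not the procedure used in the IntelliAQ project; the function name `blocked_split`, its parameters, and the chosen block sizes are hypothetical.

```python
from datetime import datetime, timedelta


def blocked_split(times, n_val, n_test, gap=0):
    """Return index lists for chronological train/val/test blocks.

    Hypothetical helper: `gap` samples are discarded between consecutive
    blocks to reduce leakage caused by temporal auto-correlation.
    """
    # Sort sample indices by timestamp so each block is contiguous in time.
    order = sorted(range(len(times)), key=times.__getitem__)
    n_train = len(order) - n_val - n_test - 2 * gap
    if n_train <= 0:
        raise ValueError("not enough samples for the requested split")
    train = order[:n_train]
    val = order[n_train + gap:n_train + gap + n_val]
    test = order[n_train + 2 * gap + n_val:]
    return train, val, test


# Example: 100 hourly samples, 15 for validation, 15 for testing,
# and 5 samples dropped at each block boundary.
times = [datetime(2020, 1, 1) + timedelta(hours=h) for h in range(100)]
train, val, test = blocked_split(times, n_val=15, n_test=15, gap=5)
```

A random shuffle of the same samples would place adjacent hours in different subsets, which tends to make validation scores look better than the skill on genuinely unseen periods. For strongly seasonal data, the blocks would also need to span complete cycles, which this sketch does not enforce.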