Royal Belgian Institute for Space Aeronomy

Physics and chemistry of the atmosphere of the Earth and other planets, and of outer space.

Machine learning to derive surface concentrations of nitrogen dioxide

2021-2022

Near-surface nitrogen dioxide (NO₂) is of great concern because of its impact on air quality and human health. Machine learning (ML) offers a way to establish a nonlinear mapping between surface NO₂ distributions and geophysical predictors at high resolution and accuracy. However, it remains challenging to apply ML to produce operational surface NO₂ products with realistic spatial patterns and uncertainty quantification. We are exploring a systematic scheme for providing a stable ML-based surface NO₂ product.

Inferring near-surface NO₂ concentrations from atmospheric columns observed by satellites is essential for assessing air quality and health risks. This requires the construction of a model using satellite and ground observations and other ancillary data sets. Although this work is primarily carried out using physical models (assimilating observations) or empirical statistical models, these methods have to make trade-offs between computational efficiency, resolution, and accuracy.

ML can ease these trade-offs because it is well suited to constructing complex non-linear mappings from drivers to targets. With the rapid growth of computing power and big data, ML has also been widely studied across disciplines.

Many studies have demonstrated that ML can estimate the spatiotemporal distribution of surface NO₂ at high resolution. However, it remains challenging to use ML to produce surface NO₂ products because of unstable predictions, the lack of uncertainty assessment, and weak physical constraints.

Towards stable production of ML-based surface NO₂ products

Our study aims to address these challenges and explore a systematic and practical scheme for the stable production of ML-based surface NO₂ products, within the framework of the Terrascope project. This work is ongoing; the research scheme is outlined below:

Identify influential predictors and explore the appropriate data processing method.
Investigate the behavior and performance of different tree-based and neural network-based ML models. Develop the ML algorithm for surface NO₂ estimation by designing the structure and loss function.
Develop uncertainty quantification methods for ML models and provide the prediction interval for the models.
Examine the reliability of model results and proceed with model interpretation.
Conduct health impact assessment based on model prediction.
Publish the ML-generated surface NO₂ product for public access.
Test algorithms and schemes on the Belgian domain and extend the study to other European countries.
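
The uncertainty quantification step in the scheme above can be illustrated with a minimal sketch. The page does not say which method the project uses, so the example below assumes split conformal prediction, one common model-agnostic way to attach prediction intervals to a point predictor. The data are synthetic, the single predictor is a hypothetical stand-in for a satellite NO₂ column, and the least-squares line stands in for a tree-based or neural network model. All function names are illustrative.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for any ML model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def conformal_halfwidth(residuals, alpha=0.1):
    """Split-conformal quantile of the absolute calibration residuals."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(residuals)[k - 1]


def make_synthetic(n, rng):
    """Assumed toy data: x mimics a column amount, y a surface concentration."""
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [2.0 + 3.0 * x + rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_synthetic(200, rng)  # used to fit the model
x_cal, y_cal = make_synthetic(200, rng)      # held out to calibrate intervals
x_test, y_test = make_synthetic(500, rng)    # used to check coverage

a, b = fit_line(x_train, y_train)


def predict(x):
    return a + b * x


# Half-width of the interval targeting 90% coverage (alpha = 0.1).
cal_residuals = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
q = conformal_halfwidth(cal_residuals, alpha=0.1)

intervals = [(predict(x) - q, predict(x) + q) for x in x_test]
covered = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_test))
coverage = covered / len(y_test)
print(f"half-width = {q:.2f}, empirical coverage = {coverage:.2f}")
```

The interval width here is constant across inputs. A production scheme would likely need widths that vary with the conditions, for example through quantile regression or ensemble spread, which this sketch does not attempt.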

Overall, this study explores how ML models can improve the prediction of surface NO₂ and deliver corresponding products, offering a perspective on practical applications of ML methods in atmospheric science. We also expect that the methodology could be extended to the prediction of other atmospheric components.

Mapping of surface NO₂ distributions for Belgium using machine learning with TROPOMI observations and other geophysical predictors.

Workflow for mapping surface NO₂ distributions using machine learning methods. The process includes data preparation, model training and testing, uncertainty quantification, model interpretation, mapping of surface NO₂, and provision of the corresponding prediction intervals.
