Deep learning to represent sub-grid processes in climate models

06/12/2018 ∙ by Stephan Rasp, et al. ∙ Universität München

The representation of nonlinear sub-grid processes, especially clouds, has been a major source of uncertainty in climate models for decades. Cloud-resolving models better represent many of these processes and can now be run globally but only for short-term simulations of at most a few years because of computational limitations. Here we demonstrate that deep learning can be used to capture many advantages of cloud-resolving modeling at a fraction of the computational cost. We train a deep neural network to represent all atmospheric sub-grid processes in a climate model by learning from a multi-scale model in which convection is treated explicitly. The trained neural network then replaces the traditional sub-grid parameterizations in a global general circulation model in which it freely interacts with the resolved dynamics and the surface-flux scheme. The prognostic multi-year simulations are stable and closely reproduce not only the mean climate of the cloud-resolving simulation but also key aspects of variability, including precipitation extremes and the equatorial wave spectrum. Furthermore, the neural network approximately conserves energy despite not being explicitly instructed to. Finally, we show that the neural network parameterization generalizes to new surface forcing patterns but struggles to cope with temperatures far outside its training manifold. Our results show the feasibility of using deep learning for climate model parameterization. In a broader context, we anticipate that data-driven Earth System Model development could play a key role in reducing climate prediction uncertainty in the coming decade.





  • (1) Schneider T, et al. (2017) Climate goals and computing the future of clouds. Nature Climate Change 7(1):3–5.
  • (2) Hourdin F, et al. (2017) The Art and Science of Climate Model Tuning. Bulletin of the American Meteorological Society 98(3):589–602.
  • (3) Stevens B, Bony S, Ginoux P, Ming Y, Horowitz LW (2013) What Are Climate Models Missing? Science 340(6136):1053–1054.
  • (4) Bony S, et al. (2015) Clouds, circulation and climate sensitivity. Nature Geoscience 8(4):261–268.
  • (5) Oueslati B, Bellon G (2015) The double ITCZ bias in CMIP5 models: interaction between SST, large-scale circulation and precipitation. Climate Dynamics 44(3-4):585–607.
  • (6) Gentine P, et al. (2013) A Probabilistic Bulk Model of Coupled Mixed Layer and Convection. Part I: Clear-Sky Case. Journal of the Atmospheric Sciences 70(6):1543–1556.
  • (7) Bony S, Dufresne J (2005) Marine boundary layer clouds at the heart of tropical cloud feedback uncertainties in climate models. Geophysical Research Letters 32(20):L20806.
  • (8) Sherwood SC, Bony S, Dufresne JL (2014) Spread in model climate sensitivity traced to atmospheric convective mixing. Nature 505(7481):37–42.
  • (9) Weisman ML, Skamarock WC, Klemp JB (1997) The Resolution Dependence of Explicitly Modeled Convective Systems. Monthly Weather Review 125(4):527–548.
  • (10) Sun J, Pritchard MS (2016) Effects of explicit convection on global land-atmosphere coupling in the superparameterized CAM. Journal of Advances in Modeling Earth Systems 8(3):1248–1269.
  • (11) Leutwyler D, Lüthi D, Ban N, Fuhrer O, Schär C (2017) Evaluation of the convection-resolving climate modeling approach on continental scales. Journal of Geophysical Research: Atmospheres 122(10):5237–5258.
  • (12) Muller C, Bony S (2015) What favors convective aggregation and why? Geophysical Research Letters 42(13):5626–5634.
  • (13) Soden BJ, Vecchi GA (2011) The vertical distribution of cloud feedback in coupled ocean-atmosphere models. Geophysical Research Letters 38(12):n/a–n/a.
  • (14) Miyamoto Y, et al. (2013) Deep moist atmospheric convection in a subkilometer global simulation. Geophysical Research Letters 40(18):4922–4926.
  • (15) Bretherton CS, Khairoutdinov MF (2015) Convective self-aggregation feedbacks in near-global cloud-resolving simulations of an aquaplanet. Journal of Advances in Modeling Earth Systems 7(4):1765–1787.
  • (16) Yashiro H, et al. (2016) Resolution Dependence of the Diurnal Cycle of Precipitation Simulated by a Global Cloud-System Resolving Model. SOLA 12(0):272–276.
  • (17) Klocke D, Brueck M, Hohenegger C, Stevens B (2017) Rediscovery of the doldrums in storm-resolving simulations over the tropical Atlantic. Nature Geoscience 10(12):891–896.
  • (18) LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
  • (19) Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. (MIT Press).
  • (20) Nielsen MA (2015) Neural Networks and Deep Learning. (Determination Press).
  • (21) Krasnopolsky VM, Fox-Rabinovitz MS, Belochitski AA (2013) Using Ensemble of Neural Networks to Learn Stochastic Convection Parameterizations for Climate and Numerical Weather Prediction Models from Data Simulated by a Cloud Resolving Model. Advances in Artificial Neural Systems 2013:1–13.
  • (22) Brenowitz ND, Bretherton CS (2018) Prognostic validation of a neural network unified physics parameterization.
  • (23) Gentine P, Pritchard M, Rasp S, Reinaudi G, Yacalis G (2018) Could machine learning break the convection parameterization deadlock? Geophysical Research Letters.
  • (24) Collins WD, et al. (2006) The Formulation and Atmospheric Simulation of the Community Atmosphere Model Version 3 (CAM3). Journal of Climate 19(11):2144–2161.
  • (25) Andersen JA, Kuang Z (2012) Moist Static Energy Budget of MJO-like Disturbances in the Atmosphere of a Zonally Symmetric Aquaplanet. Journal of Climate 25(8):2782–2804.
  • (26) Khairoutdinov MF, Randall DA (2001) A cloud resolving model as a cloud parameterization in the NCAR Community Climate System Model: Preliminary results. Geophysical Research Letters 28(18):3617–3620.
  • (27) Pritchard MS, Bretherton CS, DeMott CA (2014) Restricting 32-128 km horizontal scales hardly affects the MJO in the Superparameterized Community Atmosphere Model v.3.0 but the number of cloud-resolving grid columns constrains vertical mixing. Journal of Advances in Modeling Earth Systems 6(3):723–739.
  • (28) Benedict JJ, Randall DA (2009) Structure of the Madden–Julian Oscillation in the Superparameterized CAM. Journal of the Atmospheric Sciences 66(11):3277–3296.
  • (29) Arnold NP, Randall DA (2015) Global-scale convective aggregation: Implications for the Madden-Julian Oscillation. Journal of Advances in Modeling Earth Systems 7(4):1499–1518.
  • (30) Kooperman GJ, Pritchard MS, O’Brien TA, Timmermans BW (2018) Rainfall From Resolved Rather Than Parameterized Processes Better Represents the Present-Day and Climate Change Response of Moderate Rates in the Community Atmosphere Model. Journal of Advances in Modeling Earth Systems 10(4):971–988.
  • (31) Wheeler M, Kiladis GN (1999) Convectively Coupled Equatorial Waves: Analysis of Clouds and Temperature in the Wavenumber–Frequency Domain. Journal of the Atmospheric Sciences 56(3):374–399.
  • (32) O’Gorman PA, Dwyer JG (2018) Using machine learning to parameterize moist convection: potential for modeling of climate, climate change and extreme events.
  • (33) Schneider T, Lan S, Stuart A, Teixeira J (2017) Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted High-Resolution Simulations. Geophysical Research Letters 44(24):12,396–12,417.
  • (34) Moncrieff MW, Liu C, Bogenschutz P (2017) Simulation, Modeling and Dynamically Based Parameterization of Organized Tropical Convection for Global Climate Models. Journal of the Atmospheric Sciences pp. JAS–D–16–0166.1.
  • (35) Tulich SN (2015) A strategy for representing the effects of convective momentum transport in multiscale models: Evaluation using a new superparameterized version of the Weather Research and Forecast model (SP-WRF). Journal of Advances in Modeling Earth Systems 7(2):938–962.
  • (36) Woelfle MD, Yu S, Bretherton CS, Pritchard MS (2018) Sensitivity of Coupled Tropical Pacific Model Biases to Convective Parameterization in CESM1. Journal of Advances in Modeling Earth Systems 10(1):126–144.
  • (37) Zhang G, McFarlane NA (1995) Sensitivity of climate simulations to the parameterization of cumulus convection in the Canadian climate centre general circulation model. Atmosphere-Ocean 33(3):407–446.
  • (38) Chollet F, et al. (2015) Keras.
  • (39) Abadi M, et al. (2016) TensorFlow: A system for large-scale machine learning in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). pp. 265–283.
  • (40) Kingma DP, Ba J (2014) Adam: A Method for Stochastic Optimization.


Supplemental Methods


The SPCAM model source code along with our modifications, including the neural network implementation, is available at (branch: nn_fbp_engy_ess).

We use the Community Atmosphere Model 3.0 (24) with super-parameterization (26) as our training and reference model. The model has an approximately two-degree horizontal resolution with 30 vertical levels and a 30-minute time step. The embedded two-dimensional cloud-resolving models (CRMs) consist of eight 4 km-wide columns oriented meridionally, as in Ref. (27). The CRM time step is 20 seconds. Sub-grid turbulence in the CRM is parameterized with a local 1.5-order closure. Each GCM time step, the CRM tendencies are applied to the resolved grid. Note that our SPCAM setup does not feed momentum tendencies back from the CRM to the global grid. While these might be important (34), our neural network also cannot capture momentum fluxes. Using global CRM data or an augmented SP that includes 3D CRM domains with interactive momentum (or 2D SP equipped with a downgradient momentum parameterization after Ref. (35)) would prove beneficial for this purpose, especially for ocean-coupled simulations in which cumulus friction is known to be important to the equatorial cold tongue/ITCZ nexus (36). After the SP update, the radiation scheme is called, which uses sub-grid cloud information from the CRM. This is followed by the computation of the surface fluxes with a simple bulk scheme and by the dynamical core. CTRLCAM uses the default parameterizations, which include the Zhang-McFarlane convection scheme (37) and a simple vertical turbulent diffusion scheme.
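The coupling cadence described above implies a fixed number of CRM substeps per GCM step. The sketch below encodes just that arithmetic; the time steps come straight from the text, while the function and variable names are illustrative only.

```python
# Illustrative sketch of the SPCAM coupling cadence described above.
# The time steps come from the text; names are hypothetical.
GCM_DT_S = 30 * 60  # GCM (global model) time step: 30 minutes
CRM_DT_S = 20       # embedded CRM time step: 20 seconds

def crm_substeps_per_gcm_step(gcm_dt_s: int = GCM_DT_S, crm_dt_s: int = CRM_DT_S) -> int:
    """Number of CRM substeps integrated within one GCM time step."""
    if gcm_dt_s % crm_dt_s != 0:
        raise ValueError("CRM step must evenly divide the GCM step")
    return gcm_dt_s // crm_dt_s
```

With these values the embedded CRM is integrated for 90 substeps before its tendencies are applied back to the resolved grid.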

The physical parameterization of NNCAM is 20 times faster than that of SPCAM and 8 times faster than that of CTRLCAM. This results in a total model speed-up of a factor of 10 compared to SPCAM and a factor of 4 compared to CTRLCAM. To generate the best possible training data for the neural network, we run the radiation scheme every GCM time step for SPCAM and CTRLCAM. In CTRLCAM, the radiation scheme is therefore much more computationally expensive than in the standard setup, where it is only called every few GCM time steps.

The sea surface temperatures (SSTs) are prescribed in our aquaplanet setup, which follows Ref. (25). The reference state is zonally symmetric with a maximum shifted five degrees north of the equator to avoid the unstable behavior observed for equatorially symmetric aquaplanet setups:


where the SST is given in Celsius and the latitude in degrees.


Additionally, we run simulations with globally increased SSTs of up to 4 K in increments of 1 K, as well as a zonally asymmetric run with a wavenumber-one perturbation added to the reference SSTs:


where the longitude is given in degrees. The sun is in perpetual equinox with a full diurnal cycle. All experiments were started with the same initial conditions and allowed to spin up for a year. The subsequent five years were used for analysis. Training data for the neural network was taken from the second year of the SPCAM simulations.

Neural network

All neural network code is available at

We use the Python library Keras (38) with the TensorFlow (39) backend for all neural network experiments. Our neural network architecture consists of nine fully connected layers with 256 nodes each, which adds up to a total of 567,361 learnable parameters. The LeakyReLU activation function resulted in the lowest training losses. The neural network was trained for 18 epochs with a batch size of 1024, using the Adam optimizer (40) with a mean squared error loss function. The initial learning rate was divided by five every three epochs. The total training time was on the order of 8 hours on a single Nvidia GTX 1080 graphics processing unit (GPU).
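The step-decay schedule described above can be written as a small function. This is a sketch, not the authors' code: the function name is ours, and `initial_lr` is a placeholder since the starting value did not survive extraction. In Keras, such a function would typically be attached via a `LearningRateScheduler` callback.

```python
def step_decay_lr(epoch: int, initial_lr: float) -> float:
    """Learning rate for a given epoch (counted from zero):
    the initial rate divided by five every three epochs,
    as described in the text."""
    return initial_lr / 5 ** (epoch // 3)
```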

The input variables for the neural network were chosen to mirror the information received by the CRM and radiation scheme but lack the condensed water species and the dynamical tendencies. The latter are applied as a constant forcing during the CRM integration. We found, however, that they did not improve the neural network performance and trimmed the input variables for the sake of simplicity. Another option would be to include the surface flux computation in the network as well. In this option the fluxes are removed from the input and the surface temperature is added. This option yielded similar results but did not allow us to investigate column energy conservation.

The input values are normalized by subtracting from each element of the stacked input vector (Table S1) its mean across samples and then dividing by the maximum of its range and the standard deviation computed across all levels of the respective physical variable. This avoids dividing by very small values, e.g., for humidity in the upper levels, which can cause the input values to become very large if the neural network predicts noisy tendencies. For the outputs, the heating and moistening rates are brought to the same order of magnitude by converting them to W kg⁻¹. The radiative fluxes and precipitation were normalized to be on the same order of magnitude as the heating and moistening rates (see Table S1 for the multiplication factors). The magnitude of the output values determines their importance in the loss function, and our quadratic loss function amplifies these differences further. Making sure that no single value dominates the loss is important for consistent prediction quality. Within a reasonable range (a factor of five) around our normalization values, however, the results are largely unaffected.
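As a concrete illustration of this normalization, the sketch below applies the rule to one element of a toy stacked vector. The layout and sample values are made up, but the rule itself — subtract the per-element mean, then divide by the larger of the element's range and the per-variable standard deviation across all levels — follows the text, as does the unit conversion to W kg⁻¹. The constants c_p ≈ 1004 J kg⁻¹ K⁻¹ and L_v ≈ 2.5 × 10⁶ J kg⁻¹ are approximate textbook values, not taken from the paper.

```python
import statistics

# Approximate textbook constants (not taken from the paper):
C_P = 1004.0   # specific heat of air [J kg^-1 K^-1]
L_V = 2.5e6    # latent heat of vaporization [J kg^-1]

def normalize_element(samples, variable_std):
    """Normalize one element of the stacked input vector: subtract its mean
    across samples, then divide by the larger of its range and the standard
    deviation of the whole physical variable across all levels
    (avoids dividing by very small values, e.g. upper-level humidity)."""
    mean = statistics.fmean(samples)
    divisor = max(max(samples) - min(samples), variable_std)
    return [(x - mean) / divisor for x in samples]

def heating_rate_to_w_per_kg(dT_dt):
    """Convert a heating rate [K s^-1] to W kg^-1."""
    return C_P * dT_dt

def moistening_rate_to_w_per_kg(dq_dt):
    """Convert a moistening rate [kg kg^-1 s^-1] to W kg^-1."""
    return L_V * dq_dt
```

For an upper-level humidity element with a tiny range, the divisor falls back to the variable-wide standard deviation, keeping the normalized values well-behaved.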

Deep neural networks appear to be essential to achieve a stable and realistic prognostic implementation. Similar to other studies that used shallow neural networks (21, 22), we encountered unstable modes and unrealistic artifacts for networks with one or two hidden layers (Fig. S1). A four-layer network was the minimum complexity needed to provide good results for our configuration. Beyond that, adding further layers shows little correlation between training skill and prognostic performance. We chose our network design to lie well within the range of stable network configurations.

Input variables          | Unit     | N_lev || Output variables          | Unit         | N_lev | Normalization
Temperature              | K        | 30    || Heating rate              | K s⁻¹        | 30    | c_p
Humidity                 | kg kg⁻¹  | 30    || Moistening rate           | kg kg⁻¹ s⁻¹  | 30    | L_v
Meridional wind          | m s⁻¹    | 30    || Shortwave flux at TOA     | W m⁻²        | 1     |
Surface pressure         | Pa       | 1     || Shortwave flux at surface | W m⁻²        | 1     |
Incoming solar radiation | W m⁻²    | 1     || Longwave flux at TOA      | W m⁻²        | 1     |
Sensible heat flux       | W m⁻²    | 1     || Longwave flux at surface  | W m⁻²        | 1     |
Latent heat flux         | W m⁻²    | 1     || Precipitation             | kg m⁻² d⁻¹   | 1     |
Size of stacked vectors: 94 (input), 65 (output)
Table S1: Input and output variables with their number of vertical levels (N_lev). For the output variables the normalization factors are also listed. c_p is the specific heat of air; L_v is the latent heat of vaporization.
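The stacked vector sizes in Table S1 can be checked by summing the level counts. The sketch below simply transcribes the table; the dictionary keys are our own labels.

```python
# Level counts per variable, transcribed from Table S1 (labels are ours).
INPUT_LEVELS = {
    "temperature": 30,
    "humidity": 30,
    "meridional_wind": 30,
    "surface_pressure": 1,
    "incoming_solar_radiation": 1,
    "sensible_heat_flux": 1,
    "latent_heat_flux": 1,
}
OUTPUT_LEVELS = {
    "heating_rate": 30,
    "moistening_rate": 30,
    "shortwave_flux_toa": 1,
    "shortwave_flux_surface": 1,
    "longwave_flux_toa": 1,
    "longwave_flux_surface": 1,
    "precipitation": 1,
}

input_size = sum(INPUT_LEVELS.values())    # size of the stacked input vector
output_size = sum(OUTPUT_LEVELS.values())  # size of the stacked output vector
```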
Figure S1: Zonally and temporally averaged temperature relative to SPCAM for different network configurations (number of hidden layers x nodes per hidden layer), shown as longitudinal and five-year temporal averages as in Fig. 1. 8x512 corresponds to the network in Ref. (23).
Figure S2: (A) Mean convective sub-grid moistening rates. (B) Mean specific humidity and (C) zonal wind of SPCAM and biases of NNCAM and CTRLCAM relative to SPCAM. (D) Mean shortwave (solar) and longwave (thermal) net fluxes at the surface. The latitude axis is area-weighted.
Figure S3: (A) Zonally averaged temporal standard deviation of the convective sub-grid moistening rate. (B, C) Snapshots of the heating and moistening rates. Note that these are taken from the free model simulations and should, therefore, not correspond one-to-one between the experiments.
Figure S4: Mass-weighted temperature integrated over the troposphere from hPa to hPa for SPCAM reference and differences of NNCAM and CTRLCAM with respect to reference for zonally perturbed simulations.
Figure S5: Zonally and temporally averaged (A, B) heating rate and (C, D) temperature relative to SPCAM. Panels A and C show reference SSTs while panels B and D show global 4 K perturbation. Temperature panels show SPCAM reference and differences to reference for several experiments described in the text.
Figure S6: (A) Space-time spectrum of the equatorially symmetric component of 15S-15N daily precipitation anomalies. As in Fig. 1b of Ref. (31). (B) Space-time spectrum of the equatorially symmetric component of 15S-15N daily precipitation anomalies divided by background spectrum. As in Fig. 3b of Ref. (31). Figure shows +4K SST minus reference SST. Negative (positive) values denote westward (eastward) traveling waves.