References
 (1) Schneider T, et al. (2017) Climate goals and computing the future of clouds. Nature Climate Change 7(1):3–5.
 (2) Hourdin F, et al. (2017) The Art and Science of Climate Model Tuning. Bulletin of the American Meteorological Society 98(3):589–602.
 (3) Stevens B, Bony S, Ginoux P, Ming Y, Horowitz LW (2013) What Are Climate Models Missing? Science 340(6136):1053–1054.
 (4) Bony S, et al. (2015) Clouds, circulation and climate sensitivity. Nature Geoscience 8(4):261–268.
 (5) Oueslati B, Bellon G (2015) The double ITCZ bias in CMIP5 models: interaction between SST, largescale circulation and precipitation. Climate Dynamics 44(34):585–607.
 (6) Gentine P, et al. (2013) A Probabilistic Bulk Model of Coupled Mixed Layer and Convection. Part I: ClearSky Case. Journal of the Atmospheric Sciences 70(6):1543–1556.
 (7) Bony S, Dufresne J (2005) Marine boundary layer clouds at the heart of tropical cloud feedback uncertainties in climate models. Geophysical Research Letters 32(20):L20806.
 (8) Sherwood SC, Bony S, Dufresne JL (2014) Spread in model climate sensitivity traced to atmospheric convective mixing. Nature 505(7481):37–42.
 (9) Weisman ML, Skamarock WC, Klemp JB (1997) The Resolution Dependence of Explicitly Modeled Convective Systems. Monthly Weather Review 125(4):527–548.
 (10) Sun J, Pritchard MS (2016) Effects of explicit convection on global landatmosphere coupling in the superparameterized CAM. Journal of Advances in Modeling Earth Systems 8(3):1248–1269.
 (11) Leutwyler D, Lüthi D, Ban N, Fuhrer O, Schär C (2017) Evaluation of the convectionresolving climate modeling approach on continental scales. Journal of Geophysical Research: Atmospheres 122(10):5237–5258.
 (12) Muller C, Bony S (2015) What favors convective aggregation and why? Geophysical Research Letters 42(13):5626–5634.
 (13) Soden BJ, Vecchi GA (2011) The vertical distribution of cloud feedback in coupled oceanatmosphere models. Geophysical Research Letters 38(12):n/a–n/a.
 (14) Miyamoto Y, et al. (2013) Deep moist atmospheric convection in a subkilometer global simulation. Geophysical Research Letters 40(18):4922–4926.
 (15) Bretherton CS, Khairoutdinov MF (2015) Convective selfaggregation feedbacks in nearglobal cloudresolving simulations of an aquaplanet. Journal of Advances in Modeling Earth Systems 7(4):1765–1787.
 (16) Yashiro H, et al. (2016) Resolution Dependence of the Diurnal Cycle of Precipitation Simulated by a Global CloudSystem Resolving Model. SOLA 12(0):272–276.
 (17) Klocke D, Brueck M, Hohenegger C, Stevens B (2017) Rediscovery of the doldrums in stormresolving simulations over the tropical Atlantic. Nature Geoscience 10(12):891–896.
 (18) LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
 (19) Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. (MIT Press).
 (20) Nielsen MA (2015) Neural Networks and Deep Learning. (Determination Press).
 (21) Krasnopolsky VM, FoxRabinovitz MS, Belochitski AA (2013) Using Ensemble of Neural Networks to Learn Stochastic Convection Parameterizations for Climate and Numerical Weather Prediction Models from Data Simulated by a Cloud Resolving Model. Advances in Artificial Neural Systems 2013:1–13.
 (22) Brenowitz ND, Bretherton CS (2018) Prognostic validation of a neural network unified physics parameterization.
 (23) Gentine P, Pritchard M, Rasp S, Reinaudi G, Yacalis G (2018) Could machine learning break the convection parameterization deadlock? Geophysical Research Letters.
 (24) Collins WD, et al. (2006) The Formulation and Atmospheric Simulation of the Community Atmosphere Model Version 3 (CAM3). Journal of Climate 19(11):2144–2161.
 (25) Andersen JA, Kuang Z (2012) Moist Static Energy Budget of MJOlike Disturbances in the Atmosphere of a Zonally Symmetric Aquaplanet. Journal of Climate 25(8):2782–2804.
 (26) Khairoutdinov MF, Randall DA (2001) A cloud resolving model as a cloud parameterization in the NCAR Community Climate System Model: Preliminary results. Geophysical Research Letters 28(18):3617–3620.
 (27) Pritchard MS, Bretherton CS, DeMott CA (2014) Restricting 32128 km horizontal scales hardly affects the MJO in the Superparameterized Community Atmosphere Model v.3.0 but the number of cloudresolving grid columns constrains vertical mixing. Journal of Advances in Modeling Earth Systems 6(3):723–739.
 (28) Benedict JJ, Randall DA (2009) Structure of the Madden–Julian Oscillation in the Superparameterized CAM. Journal of the Atmospheric Sciences 66(11):3277–3296.
 (29) Arnold NP, Randall DA (2015) Globalscale convective aggregation: Implications for the MaddenJulian Oscillation. Journal of Advances in Modeling Earth Systems 7(4):1499–1518.
 (30) Kooperman GJ, Pritchard MS, O’Brien TA, Timmermans BW (2018) Rainfall From Resolved Rather Than Parameterized Processes Better Represents the PresentDay and Climate Change Response of Moderate Rates in the Community Atmosphere Model. Journal of Advances in Modeling Earth Systems 10(4):971–988.
 (31) Wheeler M, Kiladis GN (1999) Convectively Coupled Equatorial Waves: Analysis of Clouds and Temperature in the Wavenumber–Frequency Domain. Journal of the Atmospheric Sciences 56(3):374–399.
 (32) O’Gorman PA, Dwyer JG (2018) Using machine learning to parameterize moist convection: potential for modeling of climate, climate change and extreme events.
 (33) Schneider T, Lan S, Stuart A, Teixeira J (2017) Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted HighResolution Simulations. Geophysical Research Letters 44(24):12,396–12,417.
 (34) Moncrieff MW, Liu C, Bogenschutz P (2017) Simulation, Modeling and Dynamically Based Parameterization of Organized Tropical Convection for Global Climate Models. Journal of the Atmospheric Sciences pp. JAS–D–16–0166.1.
 (35) Tulich SN (2015) A strategy for representing the effects of convective momentum transport in multiscale models: Evaluation using a new superparameterized version of the Weather Research and Forecast model (SPWRF). Journal of Advances in Modeling Earth Systems 7(2):938–962.
 (36) Woelfle MD, Yu S, Bretherton CS, Pritchard MS (2018) Sensitivity of Coupled Tropical Pacific Model Biases to Convective Parameterization in CESM1. Journal of Advances in Modeling Earth Systems 10(1):126–144.
 (37) Zhang G, McFarlane NA (1995) Sensitivity of climate simulations to the parameterization of cumulus convection in the Canadian climate centre general circulation model. AtmosphereOcean 33(3):407–446.

 (38) Chollet F, et al. (2015) Keras.
 (39) Abadi M, et al. (2016) TensorFlow: A system for large-scale machine learning in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). pp. 265–283.
 (40) Kingma DP, Ba J (2014) Adam: A Method for Stochastic Optimization.
Supplement
Supplemental Methods
SPCAM Setup
The SPCAM model source code, along with our modifications including the neural network implementation, is available at https://gitlab.com/mspritch/spcam3.0-neural-net (branch: nn_fbp_engy_ess).
We use the Community Atmosphere Model 3.0 (24) with superparameterization (26) as our training and reference model. The model has an approximately two-degree horizontal resolution with 30 vertical levels and a 30-minute time step. The embedded two-dimensional cloud-resolving models (CRMs) consist of eight 4 km-wide columns oriented meridionally, as in Ref. (27). The CRM time step is 20 seconds. Subgrid turbulence in the CRM is parameterized with a local 1.5-order closure. Each GCM time step, the CRM tendencies are applied to the resolved grid. Note that our SPCAM setup does not feed back momentum tendencies from the CRM to the global grid. While these might be important (34), our neural network likewise cannot capture momentum fluxes. Using global CRM data, or an augmented SP that includes 3D CRM domains with interactive momentum (or 2D SP equipped with a downgradient momentum parameterization after Ref. (35)), would prove beneficial for this purpose, especially for ocean-coupled simulations in which cumulus friction is known to be important to the equatorial cold tongue/ITCZ nexus (36). After the SP update, the radiation scheme is called, using subgrid cloud information from the CRM. This is followed by the computation of the surface fluxes with a simple bulk scheme, and by the dynamical core. CTRLCAM uses the default parameterizations, which include the Zhang-McFarlane convection scheme (37) and a simple vertical turbulent diffusion scheme.
The physical parameterization of NNCAM is 20 times faster than that of SPCAM and 8 times faster than that of CTRLCAM. This results in a total model speedup of a factor of 10 compared to SPCAM and a factor of 4 compared to CTRLCAM. To generate the best possible training data for the neural network, we run the radiation scheme every GCM time step for SPCAM and CTRLCAM. In CTRLCAM, the radiation scheme is therefore much more computationally expensive than in the standard setup, where it is only called every few GCM time steps.
Sea surface temperatures (SSTs) are prescribed in our aquaplanet setup, which follows Ref. (25). The reference state is zonally symmetric, with its maximum shifted five degrees north of the equator to avoid the unstable behavior observed for equatorially symmetric aquaplanet setups:
(1) 
where the SST is given in Celsius, latitude is in degrees, and
(2) 
Additionally, we run simulations with SSTs globally increased by up to 4 K in increments of 1 K, as well as a zonally asymmetric run with a wavenumber-one perturbation added to the reference SSTs:
(3) 
where longitude is given in degrees. The sun is in perpetual equinox with a full diurnal cycle. All experiments were started from the same initial conditions and allowed to spin up for one year. The subsequent five years were used for analysis. Training data for the neural network were taken from the second year of the SPCAM simulations.
Neural network
All neural network code is available at https://github.com/raspstephan/CBRAIN-CAM.
We use the Python library Keras (38) with the TensorFlow (39) backend for all neural network experiments. Our neural network architecture consists of nine fully connected layers with 256 nodes each, which adds up to a total of 567,361 learnable parameters. The LeakyReLU activation function resulted in the lowest training losses. The neural network was trained for 18 epochs with a batch size of 1024, using the Adam optimizer (40) with a mean squared error loss function. The initial learning rate was divided by five every three epochs. The total training time was on the order of 8 hours on a single Nvidia GTX 1080 graphics processing unit (GPU).

The input variables for the neural network were chosen to mirror the information received by the CRM and radiation scheme, but lack the condensed water species and the dynamical tendencies. The latter are applied as a constant forcing during the CRM integration. We found, however, that they did not improve the neural network performance, and trimmed the input variables for the sake of simplicity. Another option would be to include the surface flux computation in the network as well; in this case, the fluxes are removed from the inputs and the surface temperature is added. This option yielded similar results, but did not allow us to investigate column energy conservation.
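As a sanity check on the architecture, the quoted parameter count follows directly from nine fully connected hidden layers of width 256 mapping the 94 stacked inputs to the 65 stacked outputs. A short Python sketch (`dense_params` is our own helper, not part of the released code):

```python
def dense_params(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# 94 stacked inputs -> nine hidden layers of 256 nodes -> 65 stacked outputs
sizes = [94] + [256] * 9 + [65]
print(dense_params(sizes))  # -> 567361
```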
The input values are normalized by subtracting from each element of the stacked input vector (Table S1) its mean across samples, and then dividing by the maximum of its range and the standard deviation computed across all levels of the respective physical variable. This avoids dividing by very small values, e.g., for humidity in the upper levels, which can otherwise cause the input values to become very large if the neural network predicts noisy tendencies. For the outputs, the heating and moistening rates are brought to the same order of magnitude by converting them to W kg⁻¹. The radiative fluxes and precipitation were normalized to be on the same order of magnitude as the heating and moistening rates (see Table S1 for multiplication factors). The magnitude of the output values determines their weight in the loss function, and the quadratic loss amplifies such differences even further. Making sure that no single output dominates the loss is important for consistent prediction quality. Within a reasonable range (a factor of five) around our normalization values, however, the results are largely unaffected.
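The input normalization can be sketched as follows. This is a minimal NumPy illustration of the description above, not the released library code; we assume one physical variable stored as a (samples, levels) array and read "range" as the per-element range over samples:

```python
import numpy as np

def normalize_variable(x):
    """Sketch of the input normalization for one physical variable.

    x: array of shape (n_samples, n_levels). Each element is centered by
    its mean over samples. The divisor is the maximum of the element's
    range and the standard deviation pooled over all levels of the
    variable, so that near-constant elements (e.g. upper-level humidity)
    do not produce huge normalized values.
    """
    mean = x.mean(axis=0)                       # per-element mean
    elem_range = x.max(axis=0) - x.min(axis=0)  # per-element range
    pooled_std = x.std()                        # std over samples and levels
    scale = np.maximum(elem_range, pooled_std)
    return (x - mean) / scale
```

Here the pooled standard deviation acts as a lower bound on the divisor; the exact combination of range versus per-level statistics is our reading of the description above.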
Deep neural networks appear to be essential to achieve a stable and realistic prognostic implementation. Similar to other studies that used shallow neural networks (21, 22), we encountered unstable modes and unrealistic artifacts for networks with one or two hidden layers (Fig. S1). A four-layer network was the minimum complexity that provided good results for our configuration. For deeper networks, training skill showed little correlation with prognostic performance. We chose our network design to lie well within the range of stable network configurations.
Table S1. Neural network input and output variables.

Input variables           | Unit   | Size | Output variables          | Unit          | Size
Temperature               | K      | 30   | Heating rate              | K s⁻¹         | 30
Humidity                  | kg kg⁻¹ | 30  | Moistening rate           | kg kg⁻¹ s⁻¹   | 30
Meridional wind           | m s⁻¹  | 30   | Shortwave flux at TOA     | W m⁻²         | 1
Surface pressure          | Pa     | 1    | Shortwave flux at surface | W m⁻²         | 1
Incoming solar radiation  | W m⁻²  | 1    | Longwave flux at TOA      | W m⁻²         | 1
Sensible heat flux        | W m⁻²  | 1    | Longwave flux at surface  | W m⁻²         | 1
Latent heat flux          | W m⁻²  | 1    | Precipitation             | kg m⁻² d⁻¹    | 1
Size of stacked vectors   |        | 94   |                           |               | 65
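The per-variable sizes in Table S1 add up to the stacked vector lengths used by the network. A quick check (the dictionary keys are our own shorthand for the table rows):

```python
# Input variables: three profiles on 30 levels plus four scalars
input_sizes = {"temperature": 30, "humidity": 30, "meridional_wind": 30,
               "surface_pressure": 1, "incoming_solar": 1,
               "sensible_heat_flux": 1, "latent_heat_flux": 1}
# Output variables: two tendency profiles plus five scalar fluxes/rates
output_sizes = {"heating_rate": 30, "moistening_rate": 30, "sw_toa": 1,
                "sw_surface": 1, "lw_toa": 1, "lw_surface": 1,
                "precipitation": 1}
print(sum(input_sizes.values()), sum(output_sizes.values()))  # -> 94 65
```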