Towards Physically-consistent, Data-driven Models of Convection

02/20/2020 ∙ by Tom Beucler, et al.

Data-driven algorithms, in particular neural networks, can emulate the effect of sub-grid scale processes in coarse-resolution climate models if trained on high-resolution climate simulations. However, they may violate key physical constraints and lack the ability to generalize outside of their training set. Here, we show that physical constraints can be enforced in neural networks, either approximately by adapting the loss function or to machine precision by adapting the architecture. As these physical constraints are insufficient to guarantee generalizability, we additionally propose a framework to find physical normalizations that can be applied to the training and validation data to improve the ability of neural networks to generalize to unseen climates.




1 Introduction

Computational resources typically limit climate models to coarse spatial resolutions that distort convection and clouds [Schneider2017]. This distortion causes well-known biases, including a lack of precipitation extremes and an erroneous tropical wave spectrum [Daleu2016], which plague climate predictions [IPCC2014]. In contrast, global cloud-resolving models can resolve much finer scales, greatly reducing these problematic biases. However, their increased computational cost limits simulations to a few years. Machine-learning algorithms can help in this context when trained on high-resolution simulations to replace semi-empirical convective parametrizations in low-resolution models, hence bridging these two ranges of scales at reduced computational cost [Gentine2018a, OGorman2018, Brenowitz2019]. In particular, neural networks (NNs) are scalable and powerful non-linear regression tools that can successfully mimic convective processes (e.g., [Brenowitz2018, Rasp2018]). However, NNs are typically physically inconsistent by construction: (1) they may violate important physical constraints, such as energy conservation or the positivity of precipitation, and (2) they make large errors when evaluated outside of their training set, e.g. they produce unrealistically large convective heating in warmer climates. In this abstract, we ask:

How can we design physically-consistent NN models of convection?

We adapt the NN’s architecture or loss function to enforce conservation laws in Section 2, and in Section 3 we improve the NN’s ability to generalize by physically normalizing the data to transform extrapolation into interpolation without losing information.

In both sections, our “truth” is two years of simulations with the Super-Parameterized Community Atmosphere Model version 3.0 [Khairoutdinov2005] in aquaplanet configuration [Pritchard2014], with a realistic decrease in surface temperature from the Equator to the poles. Snapshots of the interaction between the climate and convection-permitting scales are saved every 30 minutes, which allows us to work in a data-rich limit [Gentine2018a] by drawing 42M samples from the first year for training and 42M samples from the second year for validation.

2 Enforcing Conservation Laws in Neural Networks

In this section, our goal is to conserve mass, energy and radiation in a NN parametrization of convection. The parametrization’s goal is to map the local climate’s state to the rate at which convection redistributes heat and all three phases of water, along with radiation and precipitation. In practice, we map an input vector $\mathbf{x}$ of length 304 to an output vector $\mathbf{y}$ of length 218, where $\mathbf{x}$ groups local thermodynamic variables, $\mathbf{y}$ groups subgrid-scale thermodynamic tendencies, and all variables are defined in Table 1 of the Supplemental Information (SI). In this case, conservation laws can be written as linear constraints acting on both input and output vectors:

$$\mathbf{C}\begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix}=\mathbf{0},$$

where the constraints matrix $\mathbf{C}$ is defined in Equation (12) of [Beucler2019b].
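To make the idea of linear constraints concrete, the following minimal sketch evaluates the residual of a constraints matrix applied to the concatenated input and output vectors. The function name, the matrix name `C`, and the toy two-input, two-output conservation law are illustrative assumptions, not the matrix of [Beucler2019b].

```python
def constraint_residual(C, x, y):
    """Return the product of C with the concatenation [x; y].

    Each entry is the residual of one linear conservation law;
    a physically consistent (x, y) pair gives zeros.
    """
    z = list(x) + list(y)
    return [sum(c_ij * z_j for c_ij, z_j in zip(row, z)) for row in C]


# Hypothetical toy law: x0 + x1 - y0 - y1 = 0
C = [[1.0, 1.0, -1.0, -1.0]]
x = [2.0, 3.0]
y_consistent = [4.0, 1.0]    # satisfies the toy law
y_inconsistent = [4.0, 2.0]  # violates it by -1

print(constraint_residual(C, x, y_consistent))    # [0.0]
print(constraint_residual(C, x, y_inconsistent))  # [-1.0]
```

A nonzero residual is the quantity that a constrained network must drive to zero, either approximately or exactly.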

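We train a hierarchy of three NNs, all with a baseline architecture of 5 layers of 512 neurons each, with hyperparameters informed by a formal search using the SHERPA library. All NNs are trained for 15 epochs using the RMSProp optimizer [tieleman2012lecture]; this training length prevents over-fitting while guaranteeing reasonable performance. First, we train a baseline unconstrained NN (UCnet) to map the input vector $\mathbf{x}$ to the output vector $\mathbf{y}$. Second, we enforce conservation laws by introducing a penalty in the loss function for violating conservation laws (similar to [Karpatne2017, Jia2019]):

$$\mathcal{L}(\alpha) = \alpha\,\mathcal{P} + (1-\alpha)\,\mathrm{MSE},$$

where $\alpha\in[0,1]$ is the penalty weight and the penalty $\mathcal{P}$ is the mean squared residual from the conservation laws, evaluated with the constraints matrix $\mathbf{C}$ over its $n_c$ rows:

$$\mathcal{P} = \frac{1}{n_c}\sum_{i=1}^{n_c}\left(\mathbf{C}\begin{bmatrix}\mathbf{x}\\ \mathbf{y}_{\mathrm{NN}}\end{bmatrix}\right)_i^{2},$$

and the mean-squared error is the mean squared difference between the NN’s prediction $\mathbf{y}_{\mathrm{NN}}$ and the truth $\mathbf{y}_{\mathrm{Truth}}$:

$$\mathrm{MSE} = \frac{1}{218}\sum_{k=1}^{218}\left(y_{\mathrm{NN},k}-y_{\mathrm{Truth},k}\right)^{2}.$$

The loss-constrained networks are referred to as LCnets. Finally, we enforce conservation laws by changing the NN’s architecture (see Figure 1) so as to conserve mass, energy and radiation to machine precision [Beucler2019a].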
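The loss-constrained approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the training code: it assumes the penalty weight `alpha` lies in [0, 1] and that the loss is the convex combination of the penalty and the mean-squared error, which is consistent with the weight ranging from 0 to 1 in Figure 2. All function names and the toy constraint are hypothetical.

```python
def mse(y_pred, y_true):
    """Mean squared difference between prediction and truth."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)


def penalty(C, x, y_pred):
    """Mean squared residual of the linear constraints C [x; y_pred] = 0."""
    z = list(x) + list(y_pred)
    residuals = [sum(c * v for c, v in zip(row, z)) for row in C]
    return sum(r ** 2 for r in residuals) / len(residuals)


def lc_loss(alpha, C, x, y_pred, y_true):
    """Assumed loss: alpha * penalty + (1 - alpha) * MSE."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * penalty(C, x, y_pred) + (1.0 - alpha) * mse(y_pred, y_true)


# Hypothetical toy law: x0 + x1 - y0 - y1 = 0
C = [[1.0, 1.0, -1.0, -1.0]]
x = [2.0, 3.0]
y_true = [4.0, 1.0]
y_pred = [4.0, 2.0]

print(lc_loss(0.0, C, x, y_pred, y_true))  # 0.5  (pure MSE)
print(lc_loss(1.0, C, x, y_pred, y_true))  # 1.0  (pure penalty)
print(lc_loss(0.5, C, x, y_pred, y_true))  # 0.75
```

With `alpha = 0` the loss reduces to the plain mean-squared error of an unconstrained network, while `alpha = 1` ignores the truth entirely and only rewards conservation.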

Figure 1: Architecture of ACnet

This NN, referred to as ACnet, calculates direct outputs using a standard NN while the remaining outputs are calculated as residuals from the fixed constraints layers, upstream of the optimizer. We summarize the mean-squared error and the squared residual from conservation laws of UCnet, ACnet, and LCnets of various penalty weights in Figure 2.
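The residual calculation performed by a constraints layer can be illustrated as follows. This is a simplified sketch, not the ACnet implementation: it assumes that each conservation law determines exactly one of the remaining outputs (row `i` of the hypothetical matrix `C` has a nonzero coefficient on residual output `i` and zeros on the other residual outputs), and the column ordering [inputs, direct outputs, residual outputs] is an assumption made for the example.

```python
def constraint_layer(C, x, y_direct):
    """Complete the output vector so that C [x; y] = 0 holds exactly.

    Columns of C are ordered as [inputs, direct outputs, residual outputs].
    Row i is solved for residual output i (illustrative assumption).
    """
    known = list(x) + list(y_direct)
    n_known = len(known)
    y_residual = []
    for i, row in enumerate(C):
        # Contribution of everything the standard NN already provides
        partial = sum(c * v for c, v in zip(row[:n_known], known))
        coeff = row[n_known + i]
        if coeff == 0.0:
            raise ValueError("row %d cannot be solved for its residual output" % i)
        y_residual.append(-partial / coeff)
    return list(y_direct) + y_residual


# Hypothetical toy system: 2 inputs, 1 direct output, 2 residual outputs
C = [
    [1.0, 1.0, -1.0, -1.0, 0.0],  # x0 + x1 - yd0 - yr0 = 0
    [0.0, 2.0, 1.0, 0.0, -4.0],   # 2*x1 + yd0 - 4*yr1 = 0
]
x = [2.0, 3.0]
y_direct = [1.5]  # stands in for the output of a standard NN

y = constraint_layer(C, x, y_direct)
print(y)  # [1.5, 3.5, 1.875]
```

Because the layer is a fixed linear operation, it can sit inside the network so that the loss is evaluated on the completed output vector, which is how the constraints can hold regardless of the trained weights.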

Figure 2: Mean-squared error (black) and squared-residual (blue) from conservation laws for UCnet, ACnet and LCnets of various penalty weights.

We note a clear trade-off between performance (measured by the mean-squared error) and physical constraints (measured by the squared residual from conservation laws) as the weight given to conservation laws increases from 0 to 1. An intermediate weight is desirable: UCnet (weight 0) violates conservation laws more than the multiple-linear regression baseline (horizontal blue line), while the LCnet with weight 1 performs worse than the multiple-linear regression baseline (horizontal black line). In contrast, ACnet eliminates the need to compromise between performance and physical constraints by enforcing conservation laws to machine precision, which is required in climate models, while achieving skill that is competitive with UCnet. Having solved the conservation issue, in the next section we turn to a deeper problem with NNs: despite being physically constrained, ACnets and LCnets both fail to generalize to unseen climates (Figure 3).

Figure 3: Daily-averaged prediction from UCnet, LCnet, and ACnet in the Tropics for the reference climate (top) and the (+4K) experiment (bottom)

3 Improving Generalization to Unseen Climates

Testing the NNs’ ability to generalize requires an out-of-sample test. For that purpose, we run the same model configuration after applying a uniform 4K warming to the surface temperature, which we will refer to as the (+4K) experiment. We then test the NNs trained on the reference climate in out-of-sample conditions, i.e. the deep Tropics of (+4K), as illustrated in Figure 5 of the SI. As can be seen in Figure 3, NNs make extremely large errors when evaluated outside of their training set, such as overestimating convective moistening by a factor of 5.

Motivated by the success of non-dimensionalization in improving the generalizability of models in fluid mechanics, we seek to rephrase the part of our convective parametrization that predicts convective moistening and heating using non-dimensional numbers that improve its generalizability. Unlike idealized problems in fluid mechanics, moist thermodynamics involves multiple non-linear processes, including phase changes and non-local interactions, that prevent reducing our mapping to a few dimensionless numbers, e.g. via the Buckingham-Pi theorem. Instead, we develop a three-step method: (1) non-dimensionalize the input and output training datasets, (2) train NNs on these new datasets, and (3) compare their generalization abilities to our baseline UCnet. We use a different architecture of 7 layers with 128 neurons each for that lower-dimensional mapping, again informed by formal hyperparameter tuning, and train the NNs for 15 epochs using the Adam optimizer [Kingma2014] while saving the state of best validation loss to avoid over-fitting.

Figure 4: Daily-averaged predicted convective moistening (left) and heating (right) from UCnet (blue), a NN with scaled inputs (orange line) and a NN with scaled inputs and outputs (green) in the Tropics for the (+4K) experiment

We make progress via trial-and-error of multiple NNs and present two successful normalizations below and in Figure 4: one for the inputs and one for the outputs.

Both successful normalizations leverage the Clausius-Clapeyron equation, which implies that the saturation specific humidity scales exponentially with absolute temperature, making the extrapolation problem especially challenging. The first normalization (inputs) is to use relative humidity instead of specific humidity (orange vertical lines). This exploits the fact that, unlike specific humidity, relative humidity is expected to change relatively little as the climate warms [Romps2014]. The second normalization (outputs) is to normalize the vertical redistribution of energy by convection using surface enthalpy fluxes conditionally averaged on temperature (green line, see Equations 7 and 8 in the SI). Although both physically motivated normalizations significantly improve the ability of the NN to generalize to unseen conditions, errors linked to the upward shift of convection with warming are still visible in Figure 4. This motivates physically rescaling the vertical coordinate, which we leave for future work.
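The input normalization can be illustrated with a short conversion from specific to relative humidity. This is a minimal sketch rather than the conversion used in the simulations: it assumes the Bolton (1980) Magnus-type fit for saturation vapor pressure over liquid water (ice is ignored) and approximates relative humidity as the ratio of specific humidity to its saturation value. Function names are hypothetical.

```python
import math

EPS = 0.622  # ratio of the gas constants of dry air and water vapor


def saturation_vapor_pressure(temperature):
    """Saturation vapor pressure over liquid water (Pa).

    Bolton (1980) Magnus-type fit; temperature in K (assumed formula).
    """
    t_c = temperature - 273.15
    return 611.2 * math.exp(17.67 * t_c / (t_c + 243.5))


def saturation_specific_humidity(temperature, pressure):
    """Saturation specific humidity (kg/kg) at temperature (K), pressure (Pa)."""
    e_s = saturation_vapor_pressure(temperature)
    return EPS * e_s / (pressure - (1.0 - EPS) * e_s)


def relative_humidity(q, temperature, pressure):
    """Approximate relative humidity as q / q_sat (dimensionless)."""
    return q / saturation_specific_humidity(temperature, pressure)


# Same specific humidity, reference vs. 4 K warmer air at 1000 hPa
rh_ref = relative_humidity(0.01, 300.0, 1.0e5)
rh_warm = relative_humidity(0.01, 304.0, 1.0e5)
print(rh_ref > rh_warm)  # True: q_sat grows quickly with temperature
```

The near-exponential growth of the saturation value is why a fixed relative humidity in a warmer climate corresponds to specific humidities that the NN never encountered during training, whereas the relative humidity itself stays within the training range.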

Note that training a NN on both the reference climate and the (+4K) experiment would be the simplest way of achieving generalizability. However, the exercise of physically-normalizing the data to make our NN generalize to unseen climates allows us to leverage deep learning for scientific discovery: By identifying the relevant dimensionless numbers, we make progress towards a climate-invariant mapping from the local climate to convective tendencies, which would be a daunting task using traditional statistical methods.

4 Conclusion

We made progress towards physically-consistent, neural-network parametrizations of convection in two ways. In Section 2, we enforced physical constraints in NNs (1) approximately by using the loss function and (2) to machine precision by modifying the network’s architecture. In Section 3, we helped neural networks generalize to unseen conditions by leveraging the Clausius-Clapeyron equation to physically normalize both inputs and outputs of the parametrization. While our work initially stemmed from operational requirements to improve convective processes in climate models, the generalization exercise of Section 3 also offers a pathway towards data-driven scientific discovery of the interaction between convection and the large-scale climate, e.g. to adapt the entrainment-detrainment paradigm to diverse convective regimes.


Supplemental Information

Variable name
Latent heat flux
Large-scale forcings in water, temperature, velocity
Longwave heating rate profile
Net surface longwave flux
Net top-of-atmosphere longwave flux
Total precipitation rate
Solid precipitation rate
Surface pressure
Solar insolation
Sensible heat flux
Shortwave heating rate profile
Net surface shortwave flux
Net top-of-atmosphere shortwave flux
Absolute temperature profile
Convective heating profile
Heating from turbulent kinetic energy dissipation
Ice concentration profile
Convective ice tendency profile
Liquid water concentration profile
Convective liquid water tendency profile
Specific humidity profile
Convective water vapor tendency profile
North-South velocity profile
Table 1: Definition of Variables: Variables that depend on height are (boldfaced) vectors, referred to as “profiles”.
Figure 5: Surface temperature versus latitude for the reference and (+4K) experiments

Physically-normalizing the NN outputs

Motivated by the success of normalizing input data to take into account the sharp increase of atmospheric water vapor concentration with temperature, we develop an analogous normalization for the output data. As convection typically redistributes energy from the bottom to the top of the atmosphere, we can interpret convective heating and moistening profiles as processes partitioning surface enthalpy fluxes between different layers of the atmosphere. As such, we mass-weight both profiles before normalizing them using surface enthalpy fluxes conditionally averaged on near-surface temperature (referred to as $\overline{\mathrm{SEF}}$). Mathematically, this physical normalization can be written layer by layer as:

$$\dot{\mathbf{q}}_v \mapsto \frac{L_v\,\Delta\mathbf{p}}{g}\,\frac{\dot{\mathbf{q}}_v}{\overline{\mathrm{SEF}}}, \qquad \dot{\mathbf{T}} \mapsto \frac{c_p\,\Delta\mathbf{p}}{g}\,\frac{\dot{\mathbf{T}}}{\overline{\mathrm{SEF}}},$$

where $\dot{\mathbf{q}}_v$ and $\dot{\mathbf{T}}$ are the convective water vapor tendency and convective heating profiles, $\Delta\mathbf{p}$ are the layers’ pressure thicknesses, $g$ is the gravity constant, $c_p$ is the specific heat capacity of dry air at constant pressure, and $L_v$ is the latent heat of vaporization of water in standard atmospheric conditions.
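The output normalization described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the constant values, the 1 K temperature bins used for the conditional average, and all function names are assumptions. Each profile is mass-weighted by the pressure thickness divided by gravity, converted to energy units with the latent heat or the heat capacity, and divided by the conditionally averaged surface enthalpy flux.

```python
import math

G = 9.81     # gravity (m s^-2), assumed value
CP = 1004.0  # heat capacity of dry air at constant pressure (J kg^-1 K^-1), assumed
LV = 2.5e6   # latent heat of vaporization (J kg^-1), assumed value


def conditional_mean_flux(t_near_surface, flux, bin_width=1.0):
    """Average the surface enthalpy flux (W m^-2) within temperature bins."""
    sums, counts = {}, {}
    for t, f in zip(t_near_surface, flux):
        b = math.floor(t / bin_width)
        sums[b] = sums.get(b, 0.0) + f
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def normalize_profiles(q_dot, t_dot, dp, mean_flux):
    """Return dimensionless moistening and heating profiles.

    q_dot in kg kg^-1 s^-1, t_dot in K s^-1, dp in Pa, mean_flux in W m^-2.
    """
    moistening = [LV * dpk / G * qk / mean_flux for qk, dpk in zip(q_dot, dp)]
    heating = [CP * dpk / G * tk / mean_flux for tk, dpk in zip(t_dot, dp)]
    return moistening, heating


# Hypothetical samples: near-surface temperature (K) and enthalpy flux (W m^-2)
table = conditional_mean_flux([300.2, 300.7, 301.1], [100.0, 200.0, 150.0])
mean_flux = table[math.floor(300.4)]  # bin containing 300.4 K

dp = [5000.0, 5000.0]
q_dot = [1.0e-8, -2.0e-9]
t_dot = [1.0e-5, 2.0e-5]
moistening, heating = normalize_profiles(q_dot, t_dot, dp, mean_flux)
print(moistening, heating)
```

After this rescaling, each layer's value can be read as the fraction of the typical surface enthalpy flux that convection deposits in, or removes from, that layer.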