Achieving Conservation of Energy in Neural Network Emulators for Climate Modeling

06/15/2019 · by Tom Beucler, et al.

Artificial neural networks have the potential to emulate cloud processes with higher accuracy than the semi-empirical emulators currently used in climate models. However, neural-network models do not intrinsically conserve energy and mass, which is an obstacle to using them for long-term climate predictions. Here, we propose two methods to enforce linear conservation laws in neural-network emulators of physical models: constraining (1) the loss function or (2) the architecture of the network itself. Applied to the emulation of explicitly-resolved cloud processes in a prototype multi-scale climate model, we show that architecture constraints can enforce conservation laws to satisfactory numerical precision, while all constraints help the neural network generalize better to conditions outside its training set, such as global warming.

1 Motivation

The largest source of uncertainty in climate projections is the response of clouds to warming [12]. The turbulent eddies generating clouds are orders of magnitude smaller than a climate model's grid cells, meaning that climate models would need to be run at far finer spatial resolutions to prevent large biases. Unfortunately, computational resources limit the spatial resolution of climate models when they are run for time periods relevant to societal decisions, e.g. 100 years [7]. Therefore, climate models rely on semi-empirical models of cloud processes, referred to as convective parametrizations [14, 13]. When designed by hand, convective parametrizations are unable to capture the complexity of cloud processes, causing well-known biases, including a lack of extreme precipitation events and unrealistic cloud structures [4, 5].

Recent advances in statistical learning offer the possibility of designing data-driven convective parametrizations by training algorithms on short-period but high-resolution climate simulations [6]. The first attempts have successfully modeled the interaction between small-scale clouds and the large-scale climate, offering a pathway to improve the accuracy of climate predictions [2, 11, 9]. However, machine-learning climate models do not intrinsically conserve energy and mass, which is a major obstacle to their adoption by the physical-science community for several reasons:

1) Realistic simulations of climate change respond to a relatively small radiative forcing from carbon dioxide. Artificial sources and sinks of energy of comparable magnitude can prevent this small forcing from being communicated down to the surface and the ocean, where most of the biomass lives.

2) Artificial sources and sinks of mass and energy distort weather and cloud formation on short timescales, and accumulate into large temperature and humidity drifts or biases in the long-term climate.

Current machine-learning convective parametrizations that conserve energy are based on decision trees (e.g. random forests), but these are too slow for practical use in climate models [10]. Since neural-network convective parametrizations can significantly reduce cloud biases in climate models while decreasing their overall computational cost [11], we ask: how can we enforce conservation laws in neural-network emulators of physical models?

After proposing two methods to enforce physical constraints in neural-network models of physical systems in Section 2, we apply them to emulate cloud processes in a climate model in Section 3, before comparing their performance and how they improve climate predictions in Section 4.

2 Theory

Consider a physical system represented by a function $\mathcal{F}$ that maps an input $x \in \mathbb{R}^{m}$ to an output $y \in \mathbb{R}^{p}$:

$$y = \mathcal{F}(x) \quad (1)$$

Many physical systems satisfy exact physical constraints, such as the conservation of energy or momentum. In this paper, we assume that these physical constraints can be written as an under-determined linear system of rank $n$:

$$C \begin{bmatrix} x \\ y \end{bmatrix} = 0 \quad (2)$$

where $C$ is a constraints matrix of size $n \times (m + p)$ acting on the input and output of the system. The physical system has $n$ constraints, and by construction, $n < p$. Our goal is to build a computationally efficient emulator $\mathrm{NN}$ of the physical system that also satisfies its physical constraints. For the sake of simplicity, we build this emulator using a feed-forward neural network trained on preexisting measurements of $x$ and $y$, as shown in Figure 1.

Figure 1: Standard feed-forward configuration
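To make Equation 2 concrete, the sketch below checks whether an input-output pair satisfies the linear conservation laws; it is a minimal NumPy illustration in which the coefficients of $C$ are random placeholders, not the paper's actual conservation laws:

```python
import numpy as np

m, p, n = 304, 218, 4  # input size, output size, number of constraints (Section 3)

# Illustrative constraints matrix: each of the n rows encodes one linear
# conservation law acting on the concatenated vector [x; y].
rng = np.random.default_rng(0)
C = rng.standard_normal((n, m + p))  # placeholder coefficients

def constraint_residual(x, y):
    """Residual of Equation 2; zero when (x, y) satisfies all constraints."""
    return C @ np.concatenate([x, y])

x = rng.standard_normal(m)  # climate state (input)
y = rng.standard_normal(p)  # cloud-process tendencies (output)
print(np.abs(constraint_residual(x, y)).max())  # ~0 only for consistent data
```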

We measure the quality of $\mathrm{NN}$ using the mean-squared error, defined as:

$$\mathrm{MSE} \overset{\mathrm{def}}{=} \frac{1}{p} \left\| \mathrm{NN}(x) - y \right\|^{2} \quad (3)$$

where $\mathrm{NN}(x)$ is the neural network's output and $y$ the "truth". Our reference case, referred to as the "unconstrained neural network" (NNU), optimizes $\mathrm{NN}$ using $\mathrm{MSE}$ as its loss function. To enforce the physical constraints in our neural network, we consider two options:

  1. Constraining the loss function: In this setting, we penalize our neural network for violating physical constraints using a penalty $\mathcal{P}$, defined as the mean-squared residual of the physical constraints:

    $$\mathcal{P} \overset{\mathrm{def}}{=} \frac{1}{n} \left\| C \begin{bmatrix} x \\ \mathrm{NN}(x) \end{bmatrix} \right\|^{2} \quad (4)$$

    We apply this penalty by giving it a weight $\alpha \in [0, 1]$ in the loss function $\mathcal{L}$, which acts similarly to a Lagrange multiplier (a minimal code sketch of this loss follows Figure 2):

    $$\mathcal{L} \overset{\mathrm{def}}{=} \alpha\,\mathcal{P} + (1 - \alpha)\,\mathrm{MSE} \quad (5)$$
  2. Constraining the architecture: In this setting, we augment the simple network with conservation layers that enforce the conservation laws to numerical precision (Figure 2), while still calculating the loss over the entire output vector. The feed-forward network outputs an "unconstrained" vector whose size is only $(p - n)$, where $n$ is the number of constraints. We then calculate the remaining $n$ components of the output vector using the constraints. This defines $n$ constraint layers that ensure that the final output exactly respects the physical constraints (2). A possible construction of these layers solves the system of equations (2) row by row, from the bottom to the top, after writing it in row-echelon form. Note that the loss is propagated through the physical constraints.

    Figure 2: Architecture-constrained configuration
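The loss-constrained option can be expressed as a short TensorFlow sketch; the function names and the default weight `alpha=0.01` are illustrative assumptions, not the authors' exact implementation. Because the penalty depends on the input $x$, this loss would be wired into a custom training step rather than passed directly to Keras's `compile`:

```python
import tensorflow as tf

def physics_penalty(C, x, y_pred):
    """P (Equation 4): mean-squared residual of the conservation laws.

    C is a float32 tensor of shape (n, m + p).
    """
    xy = tf.concat([x, y_pred], axis=-1)           # shape (batch, m + p)
    residual = tf.matmul(xy, C, transpose_b=True)  # shape (batch, n)
    return tf.reduce_mean(tf.square(residual))

def constrained_loss(C, x, y_true, y_pred, alpha=0.01):
    """L (Equation 5): alpha * P + (1 - alpha) * MSE; alpha = 0 recovers NNU."""
    mse = tf.reduce_mean(tf.square(y_pred - y_true))
    return alpha * physics_penalty(C, x, y_pred) + (1.0 - alpha) * mse
```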

3 Application to Convective Parametrization for Climate Modeling

Table 1: Mean-squared error $\mathrm{MSE}$ (skill) and physical-constraints penalty $\mathcal{P}$ (violation of the energy/mass/radiation conservation laws) for the linear regression (MLR), the unconstrained network (NNU), the loss-constrained networks (NNL, with small and equal penalty weights $\alpha$), and the architecture-constrained network (NNA), evaluated on the baseline (+0K) and climate-change (+4K) validation datasets.

Figure 3: $R^{2}$ scores of the different neural networks simulating the outgoing longwave radiation field over the entire planet for the (+0K) dataset (first row) and the (+4K) dataset (second row).

We now implement the three neural networks and compare their performance in the particular case of convective parametrization, via emulation of the 8,192 cloud-resolving sub-domains embedded in the Super-Parametrized Community Atmosphere Model 3.0 [3, 8]. We simulate an "ocean world" where the surface temperatures are fixed with a realistic equator-to-pole gradient [1]. To facilitate the comparison, all networks have 5 hidden layers with 512 nodes each, and use leaky rectified linear-unit (LeakyReLU) activation functions to help capture the system's non-linearity. We use the RMSprop optimizer [15] to train each network for 20 epochs, using 3 months of climate simulation with 30-minute outputs as training data (see the configuration sketch below).
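The following Keras sketch reproduces this training configuration for the unconstrained network NNU; details the text does not specify (output activation, learning rate, batch size, LeakyReLU slope) are left at Keras defaults and should be treated as assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_nnu(m=304, p=218):
    """Unconstrained emulator NNU: 5 hidden layers of 512 LeakyReLU nodes."""
    model = models.Sequential()
    model.add(layers.Input(shape=(m,)))
    for _ in range(5):
        model.add(layers.Dense(512))
        model.add(layers.LeakyReLU())
    model.add(layers.Dense(p))  # linear output: convective/radiative tendencies
    model.compile(optimizer="rmsprop", loss="mse")
    return model

# Hypothetical training call on the 3 months of 30-minute outputs:
# build_nnu().fit(X_train, Y_train, epochs=20)
```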

The goal of the neural network is to predict an output vector $y$ of size 218 that represents the effect of cloud processes on climate (i.e. convective and radiative tendencies), based on an input vector $x$ of size 304 that represents the climate state (i.e. large-scale thermodynamic variables). The 4 conservation laws can be written as a sparse constraints matrix $C$ of size $4 \times (304 + 218) = 4 \times 522$ that acts on $x$ and $y$ to yield Equation 2.

Each row of the constraints matrix $C$ describes a different conservation law: the first row is the conservation of enthalpy, the second the conservation of mass, the third the conservation of terrestrial radiation, and the last the conservation of solar radiation. In the architecture-constrained case (NNA), we output an unconstrained vector of size $218 - 4 = 214$, and calculate the 4 remaining components of the output vector by solving the system of equations from bottom to top, as sketched below.
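A NumPy sketch of these constraint layers follows, assuming for illustration that the 4 constrained components are the last entries of the output vector (the actual ordering and solve order follow the row-echelon form of $C$):

```python
import numpy as np

def complete_output(C, x, y_u, n=4):
    """Fill in the n constrained components of y so that C @ [x; y] = 0.

    C   : constraints matrix, shape (n, m + p)
    x   : input vector, shape (m,)
    y_u : unconstrained network output, shape (p - n,)
    """
    C_known, C_free = C[:, :-n], C[:, -n:]     # split columns: known vs. solved-for
    rhs = -C_known @ np.concatenate([x, y_u])  # move known terms to the right side
    y_c = np.linalg.solve(C_free, rhs)         # unique if the n x n block is invertible
    return np.concatenate([y_u, y_c])          # full output satisfying Equation 2
```

Since these operations are linear, gradients propagate through them during training, consistent with the note in Section 2 that the loss is propagated through the physical constraints.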

We evaluate the performance of each network on two validation datasets:

(+0K) An “ocean world” similar to the training dataset.

(+4K) An "ocean world" where the surface temperature has been uniformly warmed by 4K, a proxy for the effects of climate change. We do not expect the neural networks to perform well in the Tropics, where this perturbation leads to temperatures outside of the training set.

4 Results

Table 1 compares the performance of each neural network and the degree to which it violates conservation laws, as measured by the mean-squared error $\mathrm{MSE}$ and the penalty $\mathcal{P}$, respectively.

All neural networks perform better than the multiple-linear-regression model (MLR), derived by replacing the leaky rectified linear units with the identity function and optimized independently. While the reference "unconstrained" network NNU performs well as measured by $\mathrm{MSE}$, it does so by breaking conservation laws, resulting in a large penalty $\mathcal{P}$. Enforcing conservation laws via architecture constraints (NNA) works to satisfactory numerical precision on both validation datasets, resulting in a very small penalty $\mathcal{P}$. Giving equal weight to $\mathcal{P}$ and $\mathrm{MSE}$ in the loss function leads to mediocre performance in all areas. In contrast, surprisingly, introducing the penalty $\mathcal{P}$ in the loss function with a very small weight leads to the best performance on the baseline (+0K) validation dataset. Both constrained networks (NNL and NNA) generalize better to unforeseen conditions than the "unconstrained" network, suggesting that physically constraining neural networks improves their representation abilities. This ability to generalize is confirmed by the high $R^{2}$ score when predicting the outgoing longwave radiation (Figure 3), which can be used as a direct measure of radiative forcing in climate-change scenarios.

Overall, our results suggest that (1) constraining the network's architecture is a powerful way to ensure energy conservation over a wide range of climates, and (2) introducing even a small amount of information about physical constraints in the loss function and/or the network's architecture can significantly improve the generalization abilities of neural-network emulators.

References