Predicting atmospheric optical properties for radiative transfer computations using neural networks

The radiative transfer equations are well known, but radiation parametrizations in atmospheric models are computationally expensive. A promising tool for accelerating such parametrizations is machine learning. In this study, we develop a machine learning-based parametrization for the gaseous optical properties by training neural networks to emulate the Rapid Radiative Transfer Model for General circulation model applications - Parallel (RRTMGP). To minimize computational costs, we reduce the range of atmospheric conditions for which the neural networks are applicable and use machine-specific optimised BLAS functions to accelerate matrix computations. To generate training data, we use a set of randomly perturbed atmospheric profiles and calculate optical properties using RRTMGP. The predicted optical properties are highly accurate and the resulting radiative fluxes have errors within 1.2 W m^-2 for longwave and 0.12 W m^-2 for shortwave radiation. Our parametrization is 3 to 7 times faster than RRTMGP, depending on the size of the neural networks. We further test the trade-off between speed and accuracy by training neural networks for a single LES case, showing that smaller and therefore faster networks can achieve a desired accuracy, especially for shortwave radiation. We conclude that our machine learning-based parametrization can speed up radiative transfer computations whilst retaining high accuracy.


1 Introduction

Accurate calculations of radiative fluxes are key to capturing the coupling between radiation, the atmosphere, and the surface. Unlike many parametrizations of subgrid processes, the radiative transfer equations are well known, and the accuracy of radiative transfer computations therefore depends mainly on the available input. However, fully computing the radiative fluxes requires solving the radiative transfer equation for all spectral lines and other absorption features in the solar and thermal spectrum. Direct integration across the spectrum requires enormous numbers of computations to resolve each line; this integration is therefore often parameterized with correlated k-distribution methods [1, 2, 3], which drastically reduce the number of quadrature points. Even with this approximation, radiative transfer schemes in weather and climate models remain a large computational burden. An important part of developing radiative transfer parametrizations is therefore to find approaches or approximations that further reduce the computational costs, for example by coarsening the spatial and temporal resolution of the radiative transfer computations [4, 5] or by random sampling in spectral space [6].

A promising and increasingly explored approach to accelerate or improve parametrizations is the use of machine-learning techniques [8]. The acceleration of expensive radiative transfer computations was one of the first applications of machine learning in the atmospheric sciences [9], and several studies have since applied machine learning to predict vertical profiles of longwave [9, 10, 11, 12, 13, 14] and shortwave [11, 12, 14] radiative fluxes in weather and climate models. This end-to-end approach, i.e. predicting radiative fluxes by fully emulating a radiative transfer scheme, may result in speed-ups of more than one order of magnitude [9, 10, 12]. However, such an approach is not very flexible with respect to changes in the vertical grid, as the neural networks are trained for a fixed number of vertical layers [9, 10, 11, 12]. Moreover, this approach does not fully respect the well-understood underlying physics, as it also replaces the two-stream equations [15] that converge to the well-known radiative transfer equations.

In this study we present a machine-learning approach for accelerating radiative transfer computations that respects the radiative transfer equation while emulating only the most ad hoc parts. Radiative transfer solvers require information on the optical properties of the atmosphere to compute upwelling and downwelling radiative fluxes. These atmospheric optical properties determine how much radiation is emitted (Planck source function), absorbed or scattered (optical depth, single scattering albedo), and the direction of scattering (asymmetry parameter). Our aim is to emulate the gaseous optical properties calculations of the Rapid Radiative Transfer Model for General circulation model applications - Parallel (RRTMGP) [16]. We use neural networks as a computationally efficient tool to replace the lookup tables used in RRTMGP and train the networks for a narrowed range of atmospheric conditions, e.g. by neglecting variations of many trace gases and limiting the range of temperatures and pressures. Constraining the range and number of inputs allows further optimisation of the neural networks, which helps to reduce the computational costs of our parametrization compared to RRTMGP. This approximation can be particularly suitable for limited-area models, such as large-eddy simulations (LES), in which the range of values of thermodynamic variables is smaller than in numerical weather prediction or climate simulations and the concentrations of many trace gases can often be assumed constant in time and space.

2 Training data generation

We train three sets of artificial neural networks to predict all optical properties:

  • One set (NWP) that is trained for a wide range of atmospheric conditions, roughly representing the variability expected in numerical weather prediction, but with all gases except water vapour and ozone kept constant;

  • Two sets (Cabauw, RCEMIP) that are trained for only one LES case each, in order to estimate the performance gains of this LES tuning.

Due to the relatively small variability of atmospheric conditions within one LES case, the LES-tuned networks may contain substantially fewer layers and nodes than the NWP-tuned networks and can therefore be faster. This may provide a large speed-up when the LES case is used for a large number of numerical experiments.

To generate the data required to train and validate the neural networks, we use the new radiation package RTE+RRTMGP [16]. For each grid cell in the domain, RRTMGP calculates all optical properties from temperature (T), pressure (p), and the concentrations of a wide range of gases. Given these optical properties, RTE (Radiative Transfer for Energetics) calculates the radiative fluxes throughout each column.

RRTMGP covers the spectral range of radiation relevant to atmospheric problems using a correlated k-distribution [3] with 14 shortwave bands, 16 longwave bands, and 16 g-points per band. We therefore need to predict 224 (14 × 16) values for the shortwave optical properties and 256 (16 × 16) for the longwave optical depth. For the Planck source function, we also need to predict the upward and downward emission at each layer interface, resulting in 768 (3 × 256) values.

The NWP-tuned neural networks are trained with only T, p, H2O and O3 as input, which are time-varying 3D variables in global weather prediction models. To create training data, we use the set of 100 atmospheric profiles of temperature, pressure and several gas concentrations from the Radiative Forcing Model Intercomparison Project (RFMIP) [17], but with all gas concentrations except H2O and O3 set constant to reduce the degrees of freedom. These 100 profiles were chosen such that the weighted sum of the radiative fluxes through the profiles represents the global mean radiative flux [17], but they do not represent the full diversity of atmospheric conditions on earth. We therefore generate extra data spanning a wider range of atmospheric conditions by creating permutations of this set of 100 profiles with random perturbations in T, p, H2O and O3. The perturbation factors are random numbers drawn independently for each layer and each variable, so the perturbations are uncorrelated. This gives a larger variation of atmospheric conditions in the training data and reduces the risk of overfitting, but may result in combinations of T, p, H2O and O3 that are unlikely to occur in reality. To reduce the risk of generating unrealistic data, we recompute H2O with another random number whenever H2O exceeds the saturation water vapour mixing ratio. The surface temperature is randomly chosen within a fixed interval around the temperature at the lowest pressure level.
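The exact perturbation formula is not reproduced in this text, so the following sketch is illustrative only: the multiplicative form, the ±20% range, and the helper name `perturb_profile` are assumptions, not the study's actual scheme.

```python
import numpy as np

def perturb_profile(base, factor=0.2, rng=None):
    """Hypothetical layer-wise multiplicative perturbation (the paper's
    exact formula and perturbation ranges are not reproduced here)."""
    rng = np.random.default_rng() if rng is None else rng
    # one independent uniform random factor per layer, so perturbations
    # of different layers (and of different variables) are uncorrelated
    eps = rng.uniform(-factor, factor, size=base.shape)
    return base * (1.0 + eps)

T = np.linspace(288.0, 210.0, 60)    # toy temperature profile [K], 60 layers
T_pert = perturb_profile(T, rng=np.random.default_rng(0))
```

Applying such a perturbation independently to each input variable of each profile yields the enlarged, decorrelated training set described above.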

For the two LES-tuned sets of neural networks, we compute O3 as a monotonic function of pressure following [18], with a fixed lower bound on the concentration. Consequently, we train these networks using only T, p and H2O. For the first LES-tuned set (Cabauw), we run a 10-hour LES simulation (07 UTC to 17 UTC) of a developing convective boundary layer over grassland near the Cabauw Experimental Site for Atmospheric Research (CESAR) in the Netherlands, with shallow cumulus clouds forming in the afternoon (see [19] and [20] for a detailed description of the case). The simulation is performed using the Dutch Atmospheric Large-Eddy Simulation (DALES) [21]. For the second LES-tuned set (RCEMIP), we run a 100-day simulation with MicroHH [22] following the specification for cloud-resolving models (see RCE_small300 in [18]) of the Radiative Convective Equilibrium Model Intercomparison Project (RCEMIP) [18]. This is a case with deep convection over a tropical ocean with an atmosphere in radiative convective equilibrium, meaning that radiative cooling is balanced by convective heating [18]; this simulation uses 72 vertical levels. For each LES-tuned set, we then determine the minimum and maximum H2O and T values of each vertical layer and subsequently generate 1000 random profiles of T and H2O to cover the full parameter space of the corresponding simulation. To deal with negative or unrealistically low water vapour concentrations in the simulations, we set a lower H2O limit in these profiles for the Cabauw and RCEMIP simulations, respectively.

For every combination of T, p, O3 and H2O (NWP) or T, p and H2O (Cabauw, RCEMIP), we then calculate the optical properties at each g-point using RRTMGP. To improve convergence of the neural networks, we log-scale the optical depths, Planck source function, H2O, (O3,) and p during both training and inference, using a fast approximation of the natural logarithm to keep the computational effort feasible:

(1)

This approximation follows from the fast approximation of the exponential we use during inference (Eq. 3). Subsequently, we normalize all variables to zero mean and unit variance, with separate means and standard deviations per variable for the upper and lower atmosphere. A random 90% of the dataset is used to train the neural networks and 5% is used for validation. The remaining 5% was reserved to test the trained networks, but is no longer used, as the tests presented in this study use a newly generated set of profiles.
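A fast natural logarithm of the kind referred to in Eq. 1 can be built from repeated square roots, using ln x ≈ n(x^(1/n) − 1) for large n = 2^k. The choice k = 8 below is an illustrative assumption, not necessarily the constant used in the study:

```python
import numpy as np

def fast_log(x, k=8):
    """Approximate ln(x) as n*(x**(1/n) - 1) with n = 2**k, where x**(1/n)
    is computed with k successive square roots. k=8 is an assumed value."""
    n = 2 ** k
    y = np.asarray(x, dtype=np.float64)
    for _ in range(k):
        y = np.sqrt(y)
    return n * (y - 1.0)

x = np.array([1e-3, 1.0, 10.0, 1e3])
err = np.abs(fast_log(x) - np.log(x))   # stays small over this range
```

With the same n, this construction is the exact inverse of the fast exponential (1 + x/n)^n used during inference, so log-scaled values round-trip without additional approximation error.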

3 Artificial neural networks

3.1 Network architecture and training

The neural networks are designed and trained with TensorFlow [23], version 1.11/1.12. We need to predict four different optical properties: the single scattering albedo (shortwave only), the shortwave and longwave optical depths, and the Planck source function. For each optical property we train two neural networks, one for the upper atmosphere and one for the lower atmosphere, because this distinction is also made in RRTMGP. By training separate neural networks for each optical property, we can reduce the complexity of each network. Furthermore, this allows us to (re-)train the networks for the different optical properties independently. The neural networks are trained to predict the optical properties of each grid cell or layer from the values of T, p, (O3,) and H2O of only that grid cell or layer. As such, each network has either 4 (NWP-tuned set) or 3 (LES-tuned sets) inputs and either 224 (shortwave optical depth and single scattering albedo), 256 (longwave optical depth) or 768 (Planck source function) outputs.

All networks are feed-forward multi-layer perceptrons with densely-connected layers. We use a leaky ReLU activation function [24] with a slope of 0.2 in all hidden layers and a linear activation function in the output layer. We train for 658 epochs, where one epoch is one iteration over the entire training dataset. During training, we use the Adam optimiser [25] to optimise the weights, with a batch size of 128 and an initial learning rate of 0.01 that decays every 10 epochs. As the loss function, we use the mean squared error (MSE):

\mathrm{MSE} = \frac{1}{N_b N_g} \sum_{i=1}^{N_b} \sum_{j=1}^{N_g} \left( \hat{y}_{ij} - y_{ij} \right)^2     (2)

where N_b and N_g are the batch size and the number of g-points, respectively, and \hat{y}_{ij} and y_{ij} are the optical properties predicted by the neural networks and calculated by RRTMGP, respectively.
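In plain NumPy, the loss over a batch can be sketched as follows (a hypothetical helper, not the authors' TensorFlow code):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error averaged over the batch and all g-points,
    i.e. the sum of squared differences divided by N_b * N_g."""
    y_pred = np.asarray(y_pred, dtype=np.float64)
    y_true = np.asarray(y_true, dtype=np.float64)
    n_batch, n_gpt = y_true.shape
    return np.sum((y_pred - y_true) ** 2) / (n_batch * n_gpt)
```

Because the optical properties are log-scaled and normalised before training, this single scalar loss weights all g-points comparably despite their very different magnitudes.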

Although several studies have shown successful attempts to optimise neural network architectures for accuracy with machine learning [26, 27], choosing the number of hidden layers and the number of nodes per layer is often still a matter of manual tuning. Wider and deeper networks are able to learn more complex functions, at the risk of overfitting, but are slower during both training and inference. We test several network sizes (Table 1) to investigate the trade-off between accuracy and performance. This includes a network without hidden layers (Linear) that performs only linear regression, as a reference.

Name Layer 1 Layer 2 Layer 3
Linear - - -
1L32 32 - -
1L64 64 - -
2L32_32 32 32 -
2L64_64 64 64 -
3L32_64_128 32 64 128
Table 1: Properties of hidden layers and number of nodes for tested neural networks

3.2 Implementation in radiative transfer solver

We use the trained neural networks as a parametrization for the gas optics in the RTE+RRTMGP framework. We replace the RRTMGP gas optics and source function routines and pair the new parametrization with the RTE radiative transfer solver. The new parametrization takes as input one or more columns of T, p, H2O and O3 (the latter only for NWP) and outputs, for each layer of each column, the log-scaled and normalised optical properties for all 224, 256 or 768 g-points. The workflow for a neural network with two hidden layers of 64 nodes is as follows:

  1. Initialise the weight matrices and bias vectors of the trained neural networks. Their dimensions follow from the number of inputs (3 or 4), the number of outputs (224, 256 or 768), and the number of nodes of the first and second hidden layers (both 64).

  2. Create the input matrix from the input columns, with one column per grid cell; the batch size is the number of grid cells computed simultaneously.

  3. Apply log-scaling to p, H2O and O3 (NWP only) and normalise all input variables.

  4. Calculate the first hidden layer:

    • Multiply the first weight matrix with the input matrix

    • Add the first bias vector to each column of the result

    • Apply the leaky ReLU activation function

  5. Calculate the second hidden layer:

    • Multiply the second weight matrix with the first hidden layer

    • Add the second bias vector to each column of the result

    • Apply the leaky ReLU activation function

  6. Calculate the output matrix:

    • Multiply the third weight matrix with the second hidden layer

    • Add the third bias vector to each column of the result

  7. Denormalise the output matrix and take the exponential of all values.
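The steps above can be sketched in NumPy as follows. The weights are random placeholders for the trained values, the sketch stores one grid cell per row (the text describes one grid cell per column), and `@` stands in for the BLAS sgemm call:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # slope 0.2 for negative inputs, as in the hidden layers above
    return np.where(x > 0.0, x, alpha * x)

def forward(x, params):
    """Inference pass of a 2x64-node network; params holds the (weight,
    bias) pairs of the two hidden layers and the linear output layer."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = leaky_relu(x @ w1 + b1)   # step 4: matrix product, bias, activation
    h2 = leaky_relu(h1 @ w2 + b2)  # step 5
    return h2 @ w3 + b3            # step 6: linear output layer

rng = np.random.default_rng(0)
n_in, n_out = 4, 256               # e.g. NWP inputs, longwave g-points
params = [(0.1 * rng.standard_normal((n_in, 64)), np.zeros(64)),
          (0.1 * rng.standard_normal((64, 64)), np.zeros(64)),
          (0.1 * rng.standard_normal((64, n_out)), np.zeros(n_out))]
y = forward(rng.standard_normal((8, n_in)), params)   # batch of 8 grid cells
```

Steps 3 and 7 (scaling and exponentiation) would wrap this forward pass in the full parametrization.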

The matrix products are the computationally most expensive parts of the neural network-solver. We therefore make use of the Level 3 functions of the BLAS library, for which machine-specific optimised versions exist. In our implementation, we use the C interface to the sgemm function of Intel's Math Kernel Library (MKL). For the exponentiation, we use a fast approximation in line with the natural logarithm approximation (Eq. 1):

(3)

The logarithm and exponentiation in steps 3 and 7 are omitted for the single scattering albedo.
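A fast exponential of this kind can be written as e^x ≈ (1 + x/n)^n with n = 2^k, evaluated with k successive squarings; the choice k = 8 below is again an illustrative assumption, not necessarily the study's constant:

```python
import numpy as np

def fast_exp(x, k=8):
    """Approximate e**x as (1 + x/n)**n with n = 2**k, using k successive
    squarings instead of a call to exp. k=8 is an assumed value."""
    y = 1.0 + np.asarray(x, dtype=np.float64) / 2 ** k
    for _ in range(k):
        y = y * y
    return y
```

With the same n, this exactly inverts the square-root-based logarithm approximation, so values log-scaled during training are recovered during inference without extra approximation error.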

4 Results & Discussion

4.1 NWP-tuned networks

4.1.1 Prediction skill

Figure 1: Mean squared errors, as a function of the number of epochs, of the longwave optical depth (a), Planck source function (b), shortwave optical depth (c) and single scattering albedo (d) for the six different network sizes (Table 1). The mean squared errors shown here are based on the full evaluation dataset.

Since our loss function is the mean squared error (MSE) of the predicted optical properties with respect to the optical properties calculated by RRTMGP, a useful indication of the accuracy of the neural networks is the evolution of the MSE as training progresses (Fig. 1). For each of the four optical properties, the MSE of the Linear networks remains approximately constant throughout the training. It must be noted that Figure 1 only shows the MSE of the evaluations, which are performed every 50,000 training steps (6.6 epochs) using the full evaluation dataset, meaning that the Linear networks have mostly converged before the first evaluation step. However, the MSE of the Linear networks is over an order of magnitude larger than that of the other networks, which suggests that a linear approach is not suitable here.

Comparing the neural networks with one or more hidden layers, we generally see that as the size of the networks increases, the number of epochs needed to reach convergence increases and the MSE at the end of the training decreases. The relatively large difference between the MSEs of the 2L64_64 and 3L32_64_128 networks suggests that we may still be able to reduce the MSE substantially by increasing the network size. This was not done, however, because the higher computational costs of larger networks would also limit the achievable speed-up. Interestingly, the 1L64 networks perform slightly better than the 2L32_32 networks despite having the same total number of hidden nodes: due to the larger number of output nodes (224/256), the 1L64 networks have more connections between nodes. This means that more complex functions can be learned, but it may also lead to higher computational costs.

To give a more intuitive measure of the predictive skill of the networks, we generate a new set of 100 randomly perturbed profiles and calculate R-squared values (R²) between the optical properties computed by the neural networks and by RRTMGP. The R² values are determined for each g-point separately and subsequently averaged over all 224 or 256 g-points to represent the overall performance. The networks with one or more hidden layers typically have very high R² for the optical depths and the Planck source function, and slightly lower R² for the single scattering albedo. These high correlation coefficients give us confidence that the neural networks are able to predict the optical properties with very high accuracy. Averaged over the four optical properties, the Linear networks have a clearly lower R² and are thus less accurate than the other networks. Although their R² may still seem very high, this accuracy is already insufficient to obtain accurate radiative fluxes (Section 4.1.2).
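The per-g-point R² and its average over g-points can be sketched as follows (a hypothetical NumPy helper, not the authors' evaluation code):

```python
import numpy as np

def mean_r2(y_pred, y_true):
    """R-squared computed per g-point (one column per g-point), then
    averaged over all g-points to give a single overall skill score."""
    y_pred = np.asarray(y_pred, dtype=np.float64)
    y_true = np.asarray(y_true, dtype=np.float64)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)              # per g-point
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))
```

Averaging per-g-point scores, rather than pooling all g-points together, prevents the large-magnitude g-points from dominating the metric.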

4.1.2 Towards radiative fluxes

Figure 2: For all network sizes, vertical profiles of the errors of the radiative fluxes and heating rates based on the neural network-predicted optical properties with respect to the radiative fluxes based on the optical properties from RRTMGP. Shown are the mean (solid) and twice the standard deviation (dashed) of the error, for the downwelling (a) and upwelling (b) longwave radiation, longwave heating rates (c), the downwelling (d) and upwelling (e) shortwave radiation and shortwave heating rates (f).

Although the neural networks have high predictive skill, for atmospheric modelling applications we are mainly interested in the radiative fluxes through the atmosphere and at the surface. The errors of the radiative fluxes that follow from the predicted optical properties therefore help to assess whether the achieved accuracy of the neural networks is sufficient. To this end, we use the implementation of the neural networks in RTE+RRTMGP to calculate radiative fluxes based on the optical properties predicted by the neural networks.

The optical properties of the Linear networks result in flux errors (not shown) that are significantly larger than the flux errors of the other networks and not acceptable for application in NWP or LES. For this reason, the Linear networks are not included in further analyses. For the other network sizes (Table 1), the errors generally decrease as the complexity of the neural networks increases (Fig. 2). Interestingly, the mean flux errors differ only slightly, whereas the spread of the flux errors is clearly smaller for the more complex networks. This indicates that reducing the size of the neural networks used to predict the optical properties does not introduce a significant bias in the radiative fluxes, but results in larger flux errors for individual atmospheric profiles, because smaller networks predict less accurate optical properties. Nonetheless, the flux errors are more than two orders of magnitude smaller than the mean downwelling and upwelling longwave and shortwave fluxes at the surface. Errors of net radiative fluxes computed by radiative transfer parametrizations in climate models at the surface and the top of the atmosphere [28] can, depending on atmospheric conditions, be larger than the range of our radiative flux errors with respect to RRTMGP, illustrating the accuracy of our neural network-based parametrization.

Additionally, our errors in the downwelling surface fluxes and upwelling top-of-atmosphere fluxes with respect to RRTMGP are on average similar to or smaller than the average errors of RRTMGP with respect to the highly accurate, but computationally very expensive, line-by-line radiative transfer model (LBLRTM) [29]. Since the neural networks are trained against RRTMGP, this suggests that further increasing the predictive skill of the networks will not improve the radiative fluxes much, as the flux errors compared to the "ground truth" will then be dominated by the errors of RRTMGP with respect to LBLRTM. The errors of the radiative heating rates, which are proportional to the divergence of the radiative fluxes, are small in the shortwave for all network sizes, except at the top of the atmosphere. The longwave radiative heating rates have small errors over most of the profile, although the smaller networks give larger errors at the surface. However, given the strong dependency of the near-surface longwave heating rates on the surface temperature, we may expect these heating rate errors to damp out quickly due to adjustments of the surface temperature.

Figure 3: For both longwave (a) and shortwave (b) radiation and for all network sizes, box plots of the mean absolute error per spectral band of the downwelling surface fluxes with respect to RRTMGP. The radiative flux per spectral band is the integral of the radiative fluxes over the 16 g-points of each band. For each network size, the minimum error in (b) corresponds to the spectral band with the shortest wavelengths, which is almost completely absorbed in the stratosphere.

Although in atmospheric models we are mainly interested in the total, spectrally-integrated longwave and shortwave fluxes, radiative fluxes per spectral band may be of interest for accurate predictions of the UV index [30] or photosynthetically active radiation (PAR). Therefore, we also determine the radiative fluxes for each spectral band and subsequently the mean absolute errors per band of the downwelling surface fluxes with respect to RRTMGP. The maximum error of the flux per band depends on the network size, in both the longwave and the shortwave spectrum (Figure 3). Using the neural network-predicted optical properties, we can thus calculate the radiative fluxes of each spectral band with small errors. Moreover, the minimum errors of the radiative flux per band are several orders of magnitude smaller than the maximum errors.

4.1.3 Computational performance

Figure 4: For all network sizes, the speed-up of the neural networks-solver against the mean absolute errors of the radiative heating rates (a), the upwelling top-of-atmosphere flux (b) and the downwelling surface flux (c), for shortwave (blue) and longwave (red) radiation.

Besides the accuracy of the predicted optical properties and the resulting radiative fluxes, the computational costs of the neural networks are of great interest since our main goal is to accelerate radiation computations. To evaluate the difference in runtime between RRTMGP and the neural network-solver, we generate 11 sets of 100 profiles with 1020 vertical layers from the RFMIP profiles. The optical properties of these sets are evaluated sequentially and the runtimes of the last 10 sets are averaged. The runtime of the first set is neglected to allow some spin-up time, mainly for the initialization of BLAS.

The neural network-solver is approximately 3 to 7 times faster than RRTMGP, depending on the size of the networks (Fig. 4). The speed-up of the neural network-solver generally decreases with increasing network complexity. This is as expected, because the number of matrix multiplications scales with the number of layers, and because the size of the matrices, and thus the computational effort required for the matrix multiplications, scales with the number of nodes per layer. The choice of a network size thus depends on the acceptable accuracy for a particular application.
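A minimal sketch of such a timing procedure, averaging repeated runs and discarding the first as spin-up (a generic harness, not the study's benchmark code):

```python
import time

def benchmark(fn, n_sets=11, discard=1):
    """Run fn n_sets times and return the mean runtime of the runs after
    the first `discard` ones (spin-up, e.g. BLAS initialisation)."""
    times = []
    for _ in range(n_sets):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    kept = times[discard:]
    return sum(kept) / len(kept)

# toy workload standing in for one set of optical-property evaluations
avg = benchmark(lambda: sum(i * i for i in range(10000)))
```

Discarding the spin-up run matters whenever lazily initialised libraries (such as BLAS) would otherwise inflate the first measurement.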

Although our neural network-solver significantly reduces the computational costs of the optical properties calculations, end-to-end machine learning approaches that directly predict radiative fluxes [9, 10, 11, 12] have achieved speed-ups of up to 80 times for full radiation schemes. However, these end-to-end approaches also replace the governing radiative transfer equations and offer little flexibility with respect to changes in grid resolution. With our approach we keep the radiative transfer equations intact, but this means that we still have to perform the spectral integration by predicting optical properties and calculating fluxes for all g-points.

4.2 LES-tuned networks

The predicted optical properties of the LES-tuned neural networks (Cabauw, RCEMIP), which are trained only for the range of atmospheric conditions expected in one LES case, are also very accurate. To further test the LES-tuned neural networks, we sample 1000 random atmospheric profiles each from the Cabauw and RCEMIP simulations and calculate profiles of optical properties with the Cabauw and RCEMIP sets of neural networks, respectively. For the Cabauw networks, we generally observe an improvement in the accuracy of the radiative heating rates, the downwelling surface fluxes and the upwelling top-of-atmosphere fluxes (Figure 5) due to the LES tuning. This improvement is especially large in the shortwave spectrum, where for some network sizes the mean absolute errors of the surface and top-of-atmosphere fluxes are over an order of magnitude lower with the Cabauw networks than with the NWP networks. The lower errors of the Cabauw neural networks show that with LES tuning, we can use relatively small networks (e.g. 1L32) to achieve similar or even higher accuracy than the more complex networks (2L64_64, 3L32_64_128) of the NWP set, which results in a larger speed-up compared to RRTMGP (Figure 4).

For the RCEMIP neural networks, the accuracy improvement in the shortwave spectrum achieved with LES tuning (Figure 6) is similar to that of the Cabauw networks. In the longwave spectrum, the accuracy of the radiative heating rates is higher with the RCEMIP networks than with the NWP networks. However, for some network sizes the accuracy of the radiative surface and top-of-atmosphere fluxes is lower with the RCEMIP networks. This suggests that not all RCEMIP networks have learned to predict the optical properties for all combinations of T, p and H2O in the RCEMIP simulation as well as the NWP networks.

The mean absolute errors of the NWP networks on the profiles of the Cabauw (Figure 5) and RCEMIP (Figure 6) simulations are frequently larger than the errors of the NWP networks on the RFMIP-based profiles (Figure 2). On the one hand, we may attribute the higher mean absolute errors to differences in the mean radiative fluxes, assuming the relative error remains approximately the same. On the other hand, the lower errors of the NWP networks on the RFMIP-based profiles may also be a sign of overfitting due to insufficiently independent training and testing data. Nevertheless, given that the mean absolute errors remain small, we are still confident that the NWP neural networks can be applied accurately to a relatively wide range of atmospheric conditions.

Figure 5: For all network sizes of the NWP (blue) and Cabauw (red) sets of neural networks, the mean absolute errors with respect to RRTMGP of the radiative heating rates (a,d), upwelling radiative fluxes at the top of the atmosphere (b,e) and downwelling radiative fluxes at the surface (c,f) for the longwave (a,b,c) and shortwave (d,e,f) spectrum. Radiative fluxes and heating rates are based on 1000 random profiles of the Cabauw simulation.
Figure 6: For all network sizes of the NWP (blue) and RCEMIP (red) sets of neural networks, the mean absolute errors with respect to RRTMGP of the radiative heating rates (a,d), upwelling radiative fluxes at the top of the atmosphere (b,e) and downwelling radiative fluxes at the surface (c,f) for the longwave (a,b,c) and shortwave (d,e,f) spectrum. Radiative fluxes and heating rates are based on 1000 random profiles of the RCEMIP simulation.

5 Conclusions

We developed a new parametrization for the gas optics by training multiple neural networks to emulate the gaseous optical properties calculations of RRTMGP [16]. The neural networks are able to predict the optical properties with high accuracy, and the errors of the radiative fluxes based on the predicted optical properties are generally within 1.2 W m^-2 for longwave and 0.12 W m^-2 for shortwave radiation. The resulting radiative heating rates are also accurate, especially in the shortwave spectrum. Radiative heating rate errors are largest in the longwave spectrum, mainly near the surface, but we expect these errors to decrease rapidly after adjustment of the surface and air temperatures.

The neural networks tested in this study are approximately 3 to 7 times faster than RRTMGP, depending on network size. The larger networks achieve lower speed-ups than the small networks, but result in more accurate radiative fluxes and heating rates, clearly showing a trade-off between accuracy and computational speed. To further investigate this trade-off, we trained two additional sets of neural networks that are each tuned for a single LES case (Cabauw, RCEMIP). In general, these LES-tuned networks are more accurate on profiles of their respective simulations than the NWP networks, especially for shortwave radiation. This indicates that with LES tuning, smaller and therefore faster neural networks suffice to achieve a desired accuracy.

Given that RRTMGP uses linear interpolation from look-up tables to compute optical properties [16], the computational efficiency of our neural network-based parametrization may be surprising. We attribute the speed-ups achieved by our parametrization largely to case-specific tuning, i.e. considering only a few gases or greatly limiting the range of atmospheric conditions (Cabauw and RCEMIP only), which reduces the problem size for which the neural networks have to be trained. Furthermore, the matrix computations required to evaluate the neural networks allow the use of machine-specific optimised BLAS libraries and reduce memory use at the expense of floating point operations.
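The evaluation described above reduces each network layer to a dense matrix-matrix product, which is what lets an optimised BLAS carry most of the work. As a minimal sketch (not the authors' implementation; the layer sizes, leaky-ReLU slope, and random parameter initialisation are illustrative assumptions), a feed-forward emulator can be batched over many atmospheric samples so that each layer becomes a single GEMM call:

```python
import numpy as np

def dense_layer(x, W, b, alpha=0.2):
    """One hidden layer: a matrix-matrix product (dispatched by NumPy
    to an optimised BLAS *gemm routine) followed by a leaky ReLU."""
    z = x @ W + b                       # shape (n_samples, n_out)
    return np.where(z > 0.0, z, alpha * z)

def predict_optical_properties(x, params):
    """Evaluate a small MLP emulator batched over many grid columns,
    so each layer is one large GEMM rather than many small products."""
    h = x
    for W, b in params[:-1]:
        h = dense_layer(h, W, b)
    W_out, b_out = params[-1]
    return h @ W_out + b_out            # linear output layer

# Toy example: 1000 atmospheric samples, 7 inputs,
# two hidden layers of 64 neurons, 256 outputs (all sizes hypothetical).
rng = np.random.default_rng(0)
sizes = [7, 64, 64, 256]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((1000, sizes[0]))
y = predict_optical_properties(x, params)
print(y.shape)  # (1000, 256)
```

Batching all samples into one GEMM per layer is the key design choice: BLAS libraries reach near-peak throughput only on large matrix products, so evaluating the network column by column would forfeit most of the speed-up.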

The speed-ups we achieve are smaller than those achieved by end-to-end approaches that emulate full radiative transfer parametrizations [9, 10, 11, 12]. An advantage of our machine learning approach is that it still respects the governing radiative transfer equations. A promising direction for future work is the application of machine learning to optimise the spectral integration: the radiative transfer equations would still be solved, while the number of quadrature points may be reduced.

MV carried out the experiments and analyses. CvH and RP developed the ideas that led to the study. MV and CvH designed the study. RS provided feedback on data generation and network design. CvL and DP provided expert knowledge on optimising network design and hardware-specific tuning. RP provided expert knowledge on radiative transfer. All authors read and approved the manuscript.

The author(s) declare that they have no competing interests.

This study was funded by the SURF Open Innovation Lab, project no. SOIL.DL4HPC.03 and the Dutch Research Council (NWO), project no. VI.Vidi.192.068.

The authors thank Axel Berg and the SURF Open Innovation Lab for introducing us into the world of machine learning, Peter Ukkonen for an interesting exchange of ideas, and Jordi Vilà-Guerau de Arellano for valuable discussions. RP is grateful to the conference organisers for the invitation to speak and motivation to think through the problem.

References

  • [1] Goody R, West R, Chen L, Crisp D. 1989 The correlated-k method for radiation calculations in nonhomogeneous atmospheres. Journal of Quantitative Spectroscopy and Radiative Transfer 42, 539–550. (doi:10.1016/0022-4073(89)90044-7)
  • [2] Lacis AA, Oinas V. 1991 A description of the correlated k distribution method for modeling nongray gaseous absorption, thermal emission, and multiple scattering in vertically inhomogeneous atmospheres. Journal of Geophysical Research: Atmospheres 96, 9027–9063. (doi:10.1029/90JD01945)
  • [3] Fu Q, Liou KN. 1992 On the correlated k-distribution method for radiative transfer in nonhomogeneous atmospheres. Journal of the Atmospheric Sciences 49, 2139–2156.
  • [4] Morcrette, J. 2000 On the effects of temporal and spatial sampling of radiation fields on the ECMWF forecasts and analyses. Monthly Weather Review 128, 876–887 (doi:10.1175/1520-0493(2000)128<0876:OTEOTT>2.0.CO;2 )
  • [5] Hogan RJ, Bozzo A. 2018 A flexible and efficient radiation scheme for the ECMWF model. Journal of Advances in Modeling Earth Systems 10, 1990–2008. (doi:10.1029/2018MS001364)
  • [6] Pincus R, Stevens B. 2009 Monte Carlo spectral integration: a consistent approximation for radiative transfer in large eddy simulations. Journal of Advances in Modeling Earth Systems 1. (doi:10.3894/JAMES.2009.1.1)
  • [7] Pincus R, Barker HW, Morcrette JJ. 2003 A fast, flexible, approximate technique for computing radiative transfer in inhomogeneous cloud fields. Journal of Geophysical Research: Atmospheres 108. (doi:10.1029/2002JD003322)
  • [8] Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, Prabhat. 2019 Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204.
  • [9] Chevallier F, Chéruy F, Scott NA, Chédin A. 1998 A neural network approach for a fast and accurate computation of a longwave radiative budget. Journal of Applied Meteorology 37, 1385–1397. (doi:10.1175/1520-0450(1998)037<1385:ANNAFA>2.0.CO;2)
  • [10] Krasnopolsky VM, Fox-Rabinovitz MS, Chalikov DV. 2005 New approach to calculation of atmospheric model physics: Accurate and fast neural network emulation of longwave radiation in a climate model. Monthly Weather Review 133, 1370–1383. (doi:10.1175/MWR2923.1)
  • [11] Krasnopolsky VM, Fox-Rabinovitz MS. 2006 Complex hybrid models combining deterministic and machine learning components for numerical climate modeling and weather prediction. Neural Networks 19, 122–134. (doi:10.1016/j.neunet.2006.01.002)
  • [12] Krasnopolsky VM, Fox-Rabinovitz MS, Hou YT, Lord SJ, Belochitski AA. 2010 Accurate and fast neural network emulations of model radiation for the NCEP Coupled Climate Forecast System: climate simulations and seasonal predictions. Monthly Weather Review 138, 1822–1842. (doi:10.1175/2009MWR3149.1)
  • [13] Belochitski A, Binev P, DeVore R, Fox-Rabinovitz M, Krasnopolsky V, Lamby P. 2011 Tree approximation of the long wave radiation parameterization in the NCAR CAM global climate model. Journal of Computational and Applied Mathematics 236, 447–460. (doi:10.1016/j.cam.2011.07.013)
  • [14] Pal A, Mahajan S, Norman MR. 2019 Using deep neural networks as cost-effective surrogate models for super-parameterized E3SM radiative transfer. Geophysical Research Letters 46, 6069–6079. (doi:10.1029/2018GL081646)
  • [15] Meador WE, Weaver WR. 1980 Two-stream approximations to radiative transfer in planetary atmospheres: A unified description of existing methods and a new improvement. Journal of the Atmospheric Sciences 37, 630–643. (doi:10.1175/1520-0469(1980)037<0630:TSATRT>2.0.CO;2)
  • [16] Pincus R, Mlawer EJ, Delamere JS. 2019 Balancing accuracy, efficiency, and flexibility in radiation calculations for dynamical models. Journal of Advances in Modeling Earth Systems 11, 3074–3089. (doi:10.1029/2019MS001621)
  • [17] Pincus R, Forster PM, Stevens B. 2016 The Radiative Forcing Model Intercomparison Project (RFMIP): experimental protocol for CMIP6. Geoscientific Model Development 9. (doi:10.5194/gmd-9-3447-2016)
  • [18] Wing AA, Reed KA, Satoh M, Stevens B, Bony S, Ohno T. 2018 Radiative–convective equilibrium model intercomparison project. Geoscientific Model Development 11, 793–813. (doi:10.5194/gmd-11-793-2018)
  • [19] Vilà-Guerau de Arellano J, Ouwersloot HG, Baldocchi D, Jacobs CMJ. 2014 Shallow cumulus rooted in photosynthesis. Geophysical Research Letters 41, 1796–1802. (doi:10.1002/2014GL059279)
  • [20] Pedruzo-Bagazgoitia X, Ouwersloot HG, Sikma M, van Heerwaarden CC, Jacobs CMJ, Vilà-Guerau de Arellano J. 2017 Direct and diffuse radiation in the shallow-cumulus vegetation system: Enhanced and decreased evapotranspiration regimes. Journal of Hydrometeorology 18, 1731–1748. (doi:10.1175/JHM-D-16-0279.1)
  • [21] Heus T, van Heerwaarden CC, Jonker HJJ, Pier Siebesma A, Axelsen S, van den Dries K, Geoffroy O, Moene AF, Pino D, de Roode SR, Vilà-Guerau de Arellano J. 2010 Formulation of the Dutch Atmospheric Large-Eddy Simulation (DALES) and overview of its applications. Geoscientific Model Development 3, 415–444. (doi:10.5194/gmd-3-415-2010)
  • [22] van Heerwaarden CC, van Stratum BJ, Heus T, Gibbs JA, Fedorovich E, Mellado JP. 2017 MicroHH 1.0: a computational fluid dynamics code for direct numerical simulation and large-eddy simulation of atmospheric boundary layer flows. Geoscientific Model Development 10, 3145–3165.
  • [23] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow IJ, Harp A, Irving G, Isard M, Jia Y, Józefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray DG, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker PA, Vanhoucke V, Vasudevan V, Viégas FB, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. 2016 Tensorflow: Large-scale machine learning on heterogeneous distributed systems. CoRR abs/1603.04467.
  • [24] Maas AL, Hannun AY, Ng AY. 2013 Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, p. 3.
  • [25] Kingma DP, Ba J. 2014 Adam: A Method for Stochastic Optimization. arXiv e-prints p. arXiv:1412.6980.
  • [26] Baker B, Gupta O, Naik N, Raskar R. 2016 Designing neural network architectures using reinforcement learning. CoRR abs/1611.02167.
  • [27] Zoph B, Le QV. 2016 Neural architecture search with reinforcement learning. CoRR abs/1611.01578.
  • [28] Pincus R, Mlawer EJ, Oreopoulos L, Ackerman AS, Baek S, Brath M, Buehler SA, Cady-Pereira KE, Cole JNS, Dufresne JL, Kelley M, Li J, Manners J, Paynter DJ, Roehrig R, Sekiguchi M, Schwarzkopf DM. 2015 Radiative flux and forcing parameterization error in aerosol-free clear skies. Geophysical Research Letters 42, 5485–5492. (doi:10.1002/2015GL064291)
  • [29] Clough S, Shephard M, Mlawer E, Delamere J, Iacono M, Cady-Pereira K, Boukabara S, Brown P. 2005 Atmospheric radiative transfer modeling: a summary of the aer codes. Journal of Quantitative Spectroscopy and Radiative Transfer 91, 233–244. (doi:10.1016/j.jqsrt.2004.05.058)
  • [30] WHO. 2002 Global solar UV index: a practical guide. Geneva, Switzerland: World Health Organization.