I Motivation
Boxed in by computational limits, many of the details of our atmosphere remain too minute to explicitly resolve in climate models [1, 2, 3]. Key physics driving convection and cloud formation occur on the scale of meters to a few kilometers, while typical modern climate models operate at far coarser horizontal resolution, meaning important subgrid processes must be parameterized. Computational capabilities are advancing, and climate models with three-dimensional explicit resolution of cloud systems are increasingly common. However, running these models for the 100-year timescales needed is often impractical [4, 5, 6], and the information they generate about the details of cloud and storm organization is frequently overwhelming to analyze at its native scale. This has left significant gaps in our knowledge of cloud-climate feedbacks and of the relationship between storm organization and its thermodynamic environment [1, 6]. However, deep learning, and in particular generative modeling, may provide a path to a better understanding of these phenomena and their role in driving the weather and climate of our world.
The application of machine learning in the physical sciences has increased exponentially in recent years, but important avenues remain largely unexplored. In climate modeling, deep neural networks have been repurposed to emulate the large-scale consequences of storm-level heating and moistening over the atmospheric column, replicating the mean climate as well as expected precipitation patterns and extremes [7, 8, 6, 9, 10]. However, much of this work has been confined to deterministic neural networks that ignore the interesting stochastic details of eddy and storm organization. The recent application of Generative Adversarial Networks (GANs, [11]) to the Lorenz '96 model suggests a potential, underexplored role for generative models in the atmospheric sciences, particularly for stochastic parameterizations [12, 13]. There have also been initial successes using various GAN architectures to generate plausible Rayleigh-Bénard convection. In particular, adding informed physical constraints to GAN loss functions seems to improve the generation of these nonlinear fluid flow systems
[14, 15, 16, 17]. While promising, such techniques have thus far been restricted to reduced-dimension, reduced-complexity settings of idealized turbulence; there is ample room to explore generative modeling methods for representing convective details amidst realistic geographic complexity. Meanwhile, generative models besides GANs have not been as thoroughly considered for turbulent flow emulation and could potentially power climate models down the line. VAEs may prove more appropriate than GANs for these climate applications given their design, which contains both a generative and a representational model, their often superior log-likelihoods and reconstructions, and practical advantages including more stable training, easier performance benchmarking, and more interpretable latent manifold representations [18, 19, 20]. Modified VAEs can reconstruct plausible two-dimensional laminar flow with computational efficiency beyond what is common when numerically solving linear differential equations [21]. There has also been preliminary work using VAEs for the clustering of atmospheric dynamics, again relying on simplified Lorenz '96 model data as well as potential vorticity fields and geopotential heights [22, 23]. This application of representation learning across a variety of simplified simulations suggests VAEs offer great potential both as an engineering tool to help escape computational limits on the generative side and as a means to learn and extract high-level features of atmospheric dynamics on the representation side. However, to the best of our knowledge, this is the first study to use a VAE for representation learning on the details of convective organization and associated gravity wave radiation as revealed by spatial snapshots of vertical velocity, an inherently chaotic and bimodal variable [24], across a dataset large enough to encompass the spatiotemporal diversity of turbulence regimes in the atmosphere. As far as we know, this is also the first study to constrain a VAE's output statistics by adding a soft constraint term to its loss function to improve representation and capture variance details at small spatial scales in the turbulent atmospheric boundary layer, which can be considered one of the most difficult regions for climate models. Our results demonstrate the power of VAEs to accurately reconstruct high-resolution climate data, even when capturing stochasticity and variance is a prerequisite to useful representation learning, as well as their ability to leverage dimensionality reduction for high-level feature learning and anomaly detection. This has direct implications for a potential future of VAE-assisted dynamical analysis or VAE-based stochastic parameterization methods informed by cloud-resolving model data.
II Method
In this section, we discuss the architecture of the three machine-learning models used here, the design of our custom VAE loss function, and the generation and preprocessing of the atmospheric simulation data.
II-A Architecture
Table I: Encoder architecture.

| Layer       | Filters | Kernel | Stride | Activation |
|-------------|---------|--------|--------|------------|
| 2D Conv     | 64      | 3x3    | 2      | relu       |
| 2D Conv     | 128     | 3x3    | 2      | relu       |
| 2D Conv     | 512     | 3x3    | 2      | relu       |
| 2D Conv (μ) | 64      | 3x3    | 2      | relu       |
| 2D Conv (σ) | 64      | 3x3    | 2      | relu       |
Table II: Decoder architecture.

| Layer       | Filters | Kernel | Stride | Activation |
|-------------|---------|--------|--------|------------|
| 2D ConvT    | 1024    | 3x3    | 2      | relu       |
| 2D ConvT    | 256     | 3x3    | 2      | relu       |
| 2D ConvT    | 64      | 3x3    | 2      | relu       |
| 2D Conv (μ) | 1       | 3x3    | 2      | sigmoid    |
| 2D Conv (σ) | 1       | 3x3    | 2      | linear     |
Our VAE takes vertical velocity fields formatted as 30×128 2D images. We adopt a fully convolutional design¹ to preserve local information, which is essential in atmospheric convection modeling (Tables I and II). We obtain meaningful reconstruction performance by ensuring that the information bottleneck in the VAE is not too severe, i.e. that the latent space is wide enough to preserve the fine features of the vertical velocity fields (in our case of dimension 1024), and by implementing the annealing techniques outlined in [26, 27]. Here, we analyze two successful VAEs: one with a traditional negative ELBO loss, and one with an additional covariance constraint in the loss. As a baseline, we also implemented an autoencoder of the same design as above, with two key differences: all activations were replaced with the identity function, and our custom loss was replaced with the mean-squared error. We refer to this model as the "linear" model, and use it to better quantify the added value of VAEs for modeling atmospheric convection.

¹ Earlier experiments used an architecture similar to models used for CIFAR-10 data [25], with fully connected dense layers separating the encoder and the decoder from the latent space, but these led to discouraging reconstructions plagued by posterior collapse and an inability to represent the spatial patterns of convection.
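As a quick sanity check on the stated latent dimension, the downsampling arithmetic of the stride-2 convolutions can be sketched in a few lines. The "same" padding mode is our assumption, since the padding is not stated; under it, each stride-2 layer halves the spatial dimensions (rounding up):

```python
import math

def same_pad_out(h, w, stride=2):
    """Output spatial dims of a stride-2 conv with 'same' padding."""
    return math.ceil(h / stride), math.ceil(w / stride)

h, w = 30, 128          # vertical levels x CRM columns
for _ in range(4):      # three feature convs, then the mu/sigma conv, all stride 2
    h, w = same_pad_out(h, w)

latent_dim = h * w * 64  # 64 filters in the final encoder conv pair
print(h, w, latent_dim)  # 2 8 1024
```

Under this assumption, the 30×128 input collapses to a 2×8 map with 64 channels, matching the latent dimension of 1024 quoted above.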
II-B VAE Loss Implementation
The total loss is the sum of two terms: the negative of the Evidence Lower Bound (ELBO), commonly used as the total VAE loss, and a soft constraint loss term ([21, 16, 28]) on the covariance matrix that we weight by a coefficient λ:

\mathcal{L} = -\mathrm{ELBO} + \lambda \left\| \mathrm{Cov}(x) - \mathrm{Cov}(\hat{x}) \right\|_F \quad (1)
Unconstrained VAEs (λ = 0), henceforth referred to as "VAE" for short, maximize the ELBO, defined as the expected log-likelihood of the data under the approximate posterior q_φ(z|x), minus the KL divergence between q_φ(z|x) and the prior p(z):

\mathrm{ELBO} = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z|x) \,\|\, p(z)\right) \quad (2)
where, in our model, x refers to the observed vertical velocity fields, θ denotes the model parameters, which are learned jointly with the variational parameters φ, and z denotes the hidden (latent) variables. Minimizing the KL term regularizes the variational parameters and pulls the VAE posterior toward the VAE prior, leading to a less deterministic design. Maximizing the log-likelihood enables the VAE to produce vertical velocity fields that are more deterministic, as the output will be more closely aligned with the latent variable of the model. Following [29], we assume that the prior over the parameters and the prior over the hidden variables are both centered isotropic Gaussians, and calculate the ELBO using equation (24) of [29].
To control the ratedistortion tradeoff [27], we implement linear annealing to the KL loss term following [30], where the KL term is multiplied by an annealing factor linearly scaled from 0 to 1 over the course of training. In our VAE, linear annealing results in significantly lower KL losses and a more separated and interpretable latent space.
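The annealing schedule can be sketched in a few lines; the length of the ramp and the behavior after it saturates are implementation choices, shown here as a minimal sketch:

```python
def kl_anneal_weight(step, anneal_steps):
    """Linear KL annealing factor: ramps from 0 to 1 over anneal_steps,
    then stays at 1 for the rest of training."""
    return min(1.0, step / float(anneal_steps))

def annealed_loss(recon_loss, kl_loss, step, anneal_steps):
    """Per-step negative ELBO with the annealed KL term."""
    return recon_loss + kl_anneal_weight(step, anneal_steps) * kl_loss
```

Early in training the loss is dominated by reconstruction, which helps avoid the posterior collapse noted in our earlier dense-bottleneck experiments.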
Finally, to generate vertical velocity fields with realistic spatial scale variability, we additionally implement covariance-constrained VAEs. Following [15], the soft constraint is defined as the Frobenius norm of the covariance matrix error, which we estimate over each batch during optimization. We choose the prefactor λ so that the magnitude of the soft constraint matches that of the reconstruction loss, resulting in a covariance-constrained VAE ("CC-VAE") that generates more faithful covariance matrices.
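A minimal NumPy sketch of the soft constraint term in Eq. (1), estimated over a batch; the flattening of each field into a feature vector is our assumption about how the batch is arranged:

```python
import numpy as np

def covariance_penalty(x, x_hat, lam=1.0):
    """Frobenius norm of the covariance-matrix error between a batch of
    true fields x and their reconstructions x_hat, weighted by lam.

    x, x_hat: (batch, features) arrays of flattened vertical velocity fields.
    """
    # batch-estimated covariance matrices (features x features)
    cov_err = np.cov(x, rowvar=False) - np.cov(x_hat, rowvar=False)
    return lam * np.linalg.norm(cov_err, ord="fro")
```

In training this term would be computed with the framework's differentiable ops rather than NumPy, but the statistic is the same.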
II-C Data
II-C1 Cloud-Resolving Data
To train and test our VAE, we rely on snapshots of vertical motions with explicitly-resolved moist convection and gravity wave radiation, obtained from 15k instances of a Cloud-Resolving Model (CRM) [31, 32] embedded within a host Global Climate Model (GCM). The CRM operates at a 20 s native timestep, and we extract state snapshots from it every 15 minutes, the frequency with which its horizontal average state is permitted to interact with its host GCM. We perform a 100-day multiscale climate simulation to generate data showing details of atmospheric convection within a tropical belt from 20N to 20S latitude. Specifically, at each horizontal grid cell of the Super-Parameterized Community Atmosphere Model (SPCAM5), we embed a 128-column SAM micro-model with kilometer-scale horizontal resolution; both the host and embedded models use 30 vertical levels. The entire dataset comes to 1.3 TB. For our purposes, there is a 30-level by 128-CRM-column "snapshot" or "image" of a convective-scale vertical velocity field at each latitude-longitude grid cell that we feed into the encoder of our neural network. We train our VAEs on subsamples of this data staged on UC Irvine's GreenPlanet supercomputing node, and our machine learning experiments are powered by two NVIDIA Tesla V100 and one NVIDIA Tesla T4 GPUs.
II-C2 Preprocessing
To reduce data volume for efficient training and to ensure our VAE is exposed to a plethora of convective motion, we selectively sample from the initial 1.3 TB SAM dataset. We restrict our initial data volume to the 144 latitude/longitude coordinates with a detectable diurnal cycle of precipitation, i.e. whose amplitude of daily precipitation is greater than two times its standard deviation within the larger-scale host model. This precipitation filtering ensures samples of strong convection get placed into the training dataset, as a persistent diurnal cycle of precipitation often indicates deep convection and the presence of mesoscale convective systems [33]. Within these selected grid cells, the vertical velocity values are scaled to [0, 1] by subtracting the mean and dividing by the range. Data are shuffled in the spatial and temporal dimensions prior to training. An 80/20 training/test split is used for all models. To ensure a balanced dataset of different convective types, we apply K-means clustering with two centroids to group data with active and inactive vertical velocity fields. We then sample equally from both clusters without replacement to design a balanced dataset for the VAE. This new 4.3 GB dataset has a 111206/27802 training/test split. Since the horizontal domain is doubly-periodic, two vertical velocity updrafts of equal magnitude and size located at different horizontal locations are physically identical. To prevent the VAE from treating them as different at the expense of reconstruction magnitude and variance, we preprocess all samples so that the center of the vertical velocity field is the location of strongest convection present in the sample. We define this as the largest absolute value of spatially-averaged vertical velocity, from 400 hPa to 600 hPa in the vertical and using a moving average of 10 km horizontally.
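A sketch of this recentering step. The index range approximating the 400-600 hPa layer (here levels 10-17 of 30) and the 10-column window approximating the 10 km moving average are illustrative assumptions, not the exact indices used:

```python
import numpy as np

def center_strongest_convection(field, level_slice=slice(10, 18), window=10):
    """Circularly shift a (levels x columns) vertical velocity field so the
    strongest convection sits at the horizontal center of the domain.

    The CRM domain is doubly periodic, so np.roll is physically harmless.
    """
    # column-wise convective strength, averaged over the mid-troposphere layer
    col_strength = np.abs(field[level_slice].mean(axis=0))
    # circular moving average over `window` columns
    padded = np.concatenate([col_strength, col_strength[: window - 1]])
    smoothed = np.convolve(padded, np.ones(window) / window, mode="valid")
    shift = field.shape[1] // 2 - int(np.argmax(smoothed))
    return np.roll(field, shift, axis=1)
```

Because the shift is a pure circular permutation, the sample's values (and hence its mean and variance) are unchanged; only the phase is normalized.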
II-D Quantifying Performance
We quantify the quality of the reconstructions from our final VAE and CC-VAE, as well as from our linear baseline, using the following metrics:
II-D1 Hellinger Distance
We calculate the Hellinger distance H between discrete distributions to gauge their similarity [34]:

H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2} \quad (3)

where P is the distribution of the original vertical velocity fields and Q is the distribution of the corresponding reconstructions.
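Using this definition, a minimal implementation over discrete (histogram) distributions might look like:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p, q: arrays of non-negative probabilities that each sum to 1.
    Returns a value in [0, 1]; 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))
```

In practice the two distributions would come from histogramming the original and reconstructed vertical velocity fields over a common set of bins.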
II-D2 Mean Squared Error (MSE)
To provide an overall skill of the reconstruction, the MSE is calculated between each original sample and its corresponding reconstruction.
II-D3 Spectral Analysis
To better understand the skill of the VAE reconstructions from a spatial perspective, we perform one-dimensional spectral analysis on each sample and reconstruction at all 30 vertical levels. We examine four levels commonly used in meteorology: 850 hPa (top of the boundary layer), 700 hPa (lower troposphere), 500 hPa (mid-troposphere), and 250 hPa (upper troposphere), to see how our VAEs capture the spatially-resolved vertical velocity variance throughout the atmosphere. We calculate the power spectral density using:
S_w(f) = \frac{T}{N} \left| \sum_{n=0}^{N-1} w_k[n] \, e^{-2\pi i f n} \right|^2 \quad (4)

where N is the length of the horizontal (x) dimension, w_k[n] is the sample or reconstruction at horizontal index n, T is 1/length, and k is the vertical level of interest in hPa (850, 700, 500, or 250) [35].
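A periodogram-style estimate consistent with Eq. (4), sketched with NumPy's real FFT; the one-sided convention and the name `spacing` (playing the role of T) are our choices:

```python
import numpy as np

def power_spectral_density(w_level, spacing=1.0):
    """One-sided power spectral density of vertical velocity at one level.

    w_level: 1D array along the CRM columns (length N); spacing plays the
    role of T in Eq. (4). Returns (spatial frequencies, PSD estimate).
    """
    n = w_level.size
    spectrum = np.fft.rfft(w_level)          # one-sided DFT
    psd = (spacing / n) * np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(n, d=spacing)
    return freqs, psd
```

Applied to a 30×128 sample, this is evaluated row by row at the four pressure levels of interest and compared between each sample and its reconstruction.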
III Results
Table III: Reconstruction metrics for the three models.

| Model  | MSE    | Hellinger Distance | Frobenius Norm |
|--------|--------|--------------------|----------------|
| Linear | 4.2e-6 | 2.0e-3             | 8.0e-3         |
| VAE    | 1.1e-5 | 3.1e-4             | 3.2e-4         |
| CC-VAE | 4.5e-6 | 2.0e-3             | 8.0e-6         |
This first application of a VAE to more realistic climate data, with a novel covariance error constraint added to the loss function, is successful by our qualitative visualizations (reconstructions and latent space projections) as well as the quantitative benchmarks outlined in the previous section. With the proper training dataset and convolutional architecture, both the VAE and the CC-VAE learn remarkably skillful reconstructions of every type of convection in the test dataset. The VAEs capture the magnitude, height, and structure of the vertical velocity field across deep convective regimes, shallow convective regimes, and locations with little convective activity to detect (Figure 4). When the covariance-constraining term is added to the loss, the CC-VAE's performance improves enough to approximately match our linear baseline (Table III). But unlike many of the image recognition tasks generative models perform, reconstructing the mean is necessary but not sufficient here: we care about the variance and correlation in the vertical velocity fields. The CC-VAE captures the variance better than the linear baseline, both quantitatively, as evidenced by a lower Frobenius norm of its covariance error (Table III), and visually, by the power spectra of the vertical velocities at different levels in the atmospheric column (Figure 2). Comparing the power spectra of the three models, the CC-VAE is the overall best across atmospheric levels and spatial scales (Figure 2). This suggests an ability to capture the structure of convection both in areas of high stochasticity near the atmospheric boundary layer, characteristic of shallow convection, and towards the upper troposphere, where deep convective regimes dominate. Capturing not only the magnitude of convection but also the variance of the overall pattern of convective plumes, and the details that compose them, is where the CC-VAE begins to show benefits a linear model does not provide.
Furthermore, we have confirmed the latent space of the VAEs to be a credible and meaningful vehicle for representation learning, via dimensionality reduction and feature extraction, to learn the details of convective organization. A 2D projection of the latent space shows clustering and organization of different convective types in its structure (Figure 1). In particular, there is excellent separation of deep and shallow convective samples from non-convective samples (Figure 1; please visit this link for a complete animation of the 2D projection of the latent space). The physical knowledge gained by the VAE and represented in the latent space stands in stark contrast to other baseline forms of dimensionality reduction, where there is no comparably clear separation by convective classification. Not only do these CC-VAE predictions of convective type organize into a distinguishable pattern in the latent space, they also map back in a physically sensible pattern over the tropics for boreal winter, with deep convection concentrated on land over the Amazon and African rainforests as well as over the Pacific Warm Pool (Figure 3). When the test dataset is restricted to an Amazon diurnal composite, the known coherent transition from shallow to deep convection that occurs over tropical rainforest in response to solar heating corresponds to monotonic trajectories in the latent space projection (Figure 5). This physically meaningful latent space is supported further by the sensitivity of the classification of convection type over tropical rainforests to the time of day when the diurnal composite is analyzed (Figure 3; please visit this link for a complete animation of the tropical diurnal cycle). It is not yet entirely clear whether other, more complex convective transitions would map to a physically meaningful and interpretable latent space representation, but these initial positive results suggest great potential for VAEs as a tool in atmospheric dynamics to uncover information about convective transitions, storm morphology, and propagation. Our VAE's ELBO also provides a versatile method of organizing the types of convection present in the data.
This natural ability to detect anomalies in the vertical velocity data proves to be an elegant way to identify deep convection, more thorough than traditional vertical velocity thresholding. One example of such an anomaly is shown in Figure 6: an instance of two moderate storms developing in one CRM array, a phenomenon that would be harder to find through traditional methods. The VAE's capacity for anomaly detection enables us to learn the characteristics of the data instead of naively thresholding based on priors that may not reflect the data contents. This provides the potential to identify interesting and unexpected weather phenomena amidst noise, signals that might otherwise never be found in overwhelmingly large and complex datasets.
Beyond the proven benefits of the VAE's understanding of complex convective patterns, the immaculate reconstructions our models produce from a complex and diverse dataset suggest the other side of the VAE, the generative model, could have utility for computationally efficient and accurate stochastic parameterization (Figure 4). But there is much work to be done before a VAE could be implemented to power stochastic parameterizations in a climate model; the VAE architecture would likely need to be upgraded to a conditional structure. Although the ability to quickly and efficiently generate synthetic, detailed vertical velocity fields would be a valuable resource for the climate and meteorology communities, any improvements in the generative capabilities would likely come at the expense of the representation learning [27, 26] and of the VAE's diagnosis of the physics of convection, which we feel is currently a promising avenue for the broader application of generative modeling to advance the field of atmospheric dynamics.
References
 [1] D. Randall, M. Khairoutdinov, A. Arakawa, and W. Grabowski, “Breaking the cloud parameterization deadlock,” Bulletin of the American Meteorological Society, vol. 84, no. 11, pp. 1547–1564, 2003.
 [2] T. R. Jones, D. A. Randall, and M. D. Branson, “Multiple-instance superparameterization: 1. Concept, and predictability of precipitation,” Journal of Advances in Modeling Earth Systems, vol. 11, no. 11, pp. 3497–3520, 2019.
 [3] T. Schneider, J. Teixeira, C. Bretherton, F. Brient, K. Pressel, C. Schär, and A. Siebesma, “Climate goals and computing the future of clouds,” Nature Climate Change, vol. 7, pp. 3–5, 01 2017.
 [4] E. J. Jensen, G. Diskin, R. P. Lawson, S. Lance, T. P. Bui, D. Hlavka, M. McGill, L. Pfister, O. B. Toon, and R. Gao, “Ice nucleation and dehydration in the tropical tropopause layer,” Proceedings of the National Academy of Sciences, vol. 110, no. 6, pp. 2041–2046, 2013.
 [5] H. Kalesse and P. Kollias, “Climatology of high cloud dynamics using profiling arm doppler radar observations,” Journal of Climate, vol. 26, no. 17, pp. 6340–6359, 2013.
 [6] B. Medeiros, B. Stevens, and S. Bony, “Using aquaplanets to understand the robust responses of comprehensive climate models to forcing,” Climate Dynamics, vol. 44, no. 7, pp. 1957–1977, 2015.
 [7] S. Rasp, M. S. Pritchard, and P. Gentine, “Deep learning to represent subgrid processes in climate models,” Proceedings of the National Academy of Sciences, vol. 115, no. 39, pp. 9684–9689, 2018.
 [8] P. Gentine, M. Pritchard, S. Rasp, G. Reinaudi, and G. Yacalis, “Could machine learning break the convection parameterization deadlock?,” Geophysical Research Letters, vol. 45, no. 11, pp. 5742–5751, 2018.
 [9] S. K. Müller, E. Manzini, M. Giorgetta, K. Sato, and T. Nasuno, “Convectively generated gravity waves in high resolution models of tropical dynamics,” Journal of Advances in Modeling Earth Systems, vol. 10, no. 10, pp. 2564–2588, 2018.
 [10] P. A. O’Gorman and J. G. Dwyer, “Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change, and extreme events,” Journal of Advances in Modeling Earth Systems, vol. 10, no. 10, pp. 2548–2563, 2018.
 [11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
 [12] D. J. Gagne II, H. M. Christensen, A. C. Subramanian, and A. H. Monahan, “Machine learning for stochastic parameterization: Generative adversarial networks in the Lorenz ’96 model,” Journal of Advances in Modeling Earth Systems, vol. 12, no. 3, p. e2019MS001896, 2020.
 [13] D. Crommelin and W. Edeling, “Resampling with neural networks for stochastic parameterization in multiscale systems,” 04 2020.
 [14] L. Yang, D. Zhang, and G. E. Karniadakis, “Physics-informed generative adversarial networks for stochastic differential equations,” SIAM Journal on Scientific Computing, vol. 42, pp. A292–A317, 2020.
 [15] J.L. Wu, K. Kashinath, A. Albert, D. Chirila, Prabhat, and H. Xiao, “Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems,” Journal of Computational Physics, vol. 406, p. 109209, 2020.

 [16] P. Stinis, T. Hagge, A. Tartakovsky, and E. Yeung, “Enforcing constraints for interpolation and extrapolation in generative adversarial networks,” Journal of Computational Physics, 03 2018.
 [17] Z. Yang and H. Xiao, “Enforcing deterministic constraints on generative adversarial networks for emulating physical systems,” 11 2019.
 [18] Y. Wu, Y. Burda, R. Salakhutdinov, and R. Grosse, “On the quantitative analysis of decoderbased generative models,” 11 2016.
 [19] L. Mescheder, S. Nowozin, and A. Geiger, “Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning  Volume 70, ICML’17, p. 2391–2400, JMLR.org, 2017.
 [20] H. Huang, Z. Li, R. He, Z. Sun, and T. Tan, “IntroVAE: Introspective variational autoencoders for photographic image synthesis,” 07 2018.
 [21] S. Eismann, S. Bartzsch, and S. Ermon, “Shape optimization in laminar flow with a labelguided variational autoencoder,” 12 2017.
 [22] X.-A. Tibau Alberdi, C. Requena-Mesa, C. Reimers, J. Denzler, V. Eyring, M. Reichstein, and J. Runge, “SupernoVAE: VAE based kernel PCA for analysis of spatio-temporal Earth data,” 01 2018.
 [23] M. Krinitskiy, Y. Zyulyaeva, and S. Gulev, “Clustering of polar vortex states using convolutional autoencoders,” 09 2019.
 [24] V. P. Ghate, B. A. Albrecht, and P. Kollias, “Vertical velocity structure of nonprecipitating continental boundary layer stratocumulus clouds,” Journal of Geophysical Research: Atmospheres, vol. 115, no. D13, 2010.
 [25] A. Krizhevsky, V. Nair, and G. Hinton, “CIFAR-10 (Canadian Institute for Advanced Research).”
 [26] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner, “beta-VAE: Learning basic visual concepts with a constrained variational framework,” in ICLR, 2017.
 [27] A. A. Alemi, B. Poole, I. S. Fischer, J. V. Dillon, R. A. Saurous, and K. Murphy, “Fixing a broken ELBO,” in ICML, 2018.
 [28] B. Lusch, J. N. Kutz, and S. L. Brunton, “Deep learning for universal linear embeddings of nonlinear dynamics,” Nature Communications, vol. 9, no. 1, p. 4950, 2018.
 [29] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” CoRR, vol. abs/1312.6114, 2014.
 [30] S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Józefowicz, and S. Bengio, “Generating sentences from a continuous space,” in CoNLL, 2016.
 [31] M. Khairoutdinov and D. Randall, “Cloud resolving modeling of the ARM summer 1997 IOP: Model formulation, results, uncertainties, and sensitivities,” Journal of the Atmospheric Sciences, vol. 60, pp. 607–625, 02 2003.
 [32] M. Khairoutdinov and Y. L. Kogan, “A large eddy simulation model with explicit microphysics: Validation against aircraft observations of a stratocumulus-topped boundary layer,” 1999.
 [33] A. Clark, W. Gallus, and T.-C. Chen, “Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models,” Monthly Weather Review, vol. 135, 10 2007.
 [34] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
 [35] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” 1965.
 [36] L. van der Maaten and G. E. Hinton, “Visualizing data using t-SNE,” 2008.