A comparative study of stochastic and deep generative models for multisite precipitation synthesis

by Jorge Guevara, et al.

Future climate change scenarios are usually hypothesized using simulations from weather generators. However, there are only a few works comparing and evaluating promising deep learning models for weather generation against classical approaches. This study presents preliminary results of such an evaluation for the multisite precipitation synthesis task. We compared two open-source weather generators, IBMWeathergen (an extension of the Weathergen library) and RGeneratePrec, and two deep generative models, GAN and VAE, on a variety of metrics. Our preliminary results can serve as a guide for improving the design of deep learning architectures and algorithms for the multisite precipitation synthesis task.




1 Introduction

Stochastic weather generators are common statistical downscaling tools that explicitly utilize the probabilistic nature of physical phenomena to model the marginal, temporal and sometimes spatial aspects of meteorological variables. They were first conceptualized by Richardson (1981) and have since become widely used to produce long surrogate time series and downscale future climate projections for climate impact assessments (e.g., Kilsby et al. (2007)). They remain in wide use today (e.g., Vesely and others (2019)).

Stochastic weather generation poses a number of unique challenges and has received recent attention from the machine learning community (e.g., Li et al. (2021), Puchko et al. (2020)). For example, the data being modeled can be highly imbalanced, contain spatio-temporal dependencies and exhibit various anomalies (e.g., extreme weather events) exacerbated by anthropogenic climate change.

Motivated by the absence of work comparing and evaluating stochastic and deep generative weather generators, we perform a systematic evaluation of four weather generators for multisite precipitation synthesis: two open-source stochastic weather generators, IBMWeathergen (an extension of the weathergen library) and RGeneratePrec, and two deep generative models based on GAN and VAE architectures. The four weather generators are evaluated for Palghar, India, which experiences heavy rainfall during the southwestern summer monsoons from July through September. This provides a challenging, highly imbalanced precipitation dataset for synthetic generation. We used several metrics commonly used in the literature to compare the empirical distributions of the simulations and observations, as well as different patterns found in the data, such as dry and wet counts, dry and wet spell lengths, and total annual/monthly precipitation (Mehan et al., 2017; Tseng et al., 2020; Mehrotra et al., 2006; Semenov et al., 1998).

2 Data and Methods

2.1 Palghar Monsoon Dataset

Daily precipitation data for Palghar, India, is from the Climate Hazards Group Infrared Precipitation with Stations v2.0 (CHIRPS) dataset. It contains global interpolated daily precipitation values at a spatial resolution of 0.05°. We constructed a dataset for training the weather generators by gathering the daily precipitation data from CHIRPS for the period 01/01/1981 to 31/12/2009 within a bounding box defined by the latitude-longitude pairs 19°N, 72°E and 20°N, 73°E. The bounding box contains 400 latitude-longitude pairs (sites) with precipitation values.
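The count of 400 sites follows from the grid spacing: a 1° by 1° box sampled every 0.05° gives 20 × 20 cells. The sketch below enumerates such a grid, assuming (this is not stated in the text) that each site sits at the centre of its grid cell; the variable names are illustrative only.

```python
import numpy as np

RES = 0.05                      # grid spacing in degrees
LAT_MIN, LAT_MAX = 19.0, 20.0   # bounding box from the text
LON_MIN, LON_MAX = 72.0, 73.0

# Number of cells along each axis of the bounding box.
n_lat = round((LAT_MAX - LAT_MIN) / RES)
n_lon = round((LON_MAX - LON_MIN) / RES)

# Assumption: sites are cell centres, offset by half a cell from the box edge.
lats = LAT_MIN + RES * (np.arange(n_lat) + 0.5)
lons = LON_MIN + RES * (np.arange(n_lon) + 0.5)

# One (lat, lon) row per site.
lon_grid, lat_grid = np.meshgrid(lons, lats)
sites = np.column_stack([lat_grid.ravel(), lon_grid.ravel()])

print(sites.shape)
```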

2.2 Weather generators

2.2.1 IBMWeathergen

We customized the weathergen single-site library to perform multisite precipitation generation. Our implementation follows the methodology described in Apipattanavis et al. (2007) and includes an ARIMA forecasting component as in Steinschneider and Brown (2012). The occurrence model uses a first-order homogeneous Markov chain per month with three sequence states (dry, wet, and extreme). An ARIMA model captures the low-frequency trend of the interannual variability of the annual precipitation. For the precipitation model, IBMWeathergen uses a KNN 1-lag bootstrap resampler and a KDE estimator (1-lag refers to the resampling process being constrained to the sequence of two consecutive days given by the first-order Markov chain). The model has extrapolation capabilities given by the ARIMA component, and spatial coherence is guaranteed through the use of the resampling technique (Apipattanavis et al., 2007).
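To make the occurrence model concrete, the following is a minimal sketch of estimating a three-state, first-order transition matrix from a daily series. It is not the IBMWeathergen code: the function names, the wet threshold of 0.3 mm and the 0.8 quantile used to separate wet from extreme days are all hypothetical choices for illustration. In a per-month setting, one such matrix would be estimated from the days of each calendar month.

```python
import numpy as np

def classify_states(precip, wet_threshold=0.3, extreme_quantile=0.8):
    """Map daily precipitation (mm) to states 0=dry, 1=wet, 2=extreme.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    precip = np.asarray(precip, dtype=float)
    wet = precip > wet_threshold
    states = np.zeros(precip.shape, dtype=int)
    if wet.any():
        extreme_cut = np.quantile(precip[wet], extreme_quantile)
        states[wet] = 1
        states[wet & (precip > extreme_cut)] = 2
    return states

def transition_matrix(states, n_states=3):
    """Row-normalised counts of transitions between consecutive days."""
    counts = np.zeros((n_states, n_states))
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev, curr] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # States never observed as a predecessor fall back to a uniform row.
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / n_states)

# Example with synthetic data standing in for one month of observations.
rng = np.random.default_rng(0)
daily = rng.gamma(shape=0.5, scale=10.0, size=300) * (rng.random(300) < 0.6)
P = transition_matrix(classify_states(daily))
print(P.round(2))
```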

2.2.2 RGeneratePrec

RGeneratePrec models temporal occurrence using a heterogeneous Markov chain per month, with a probability transition matrix estimated through generalized linear models with the logit link function. The multisite precipitation occurrence follows Wilks' approach (Wilks, 1998), which estimates binary states of precipitation amounts for each site as a function of the probability integral transform of Gaussian random numbers constrained to the probability transition matrix of the temporal occurrence model. The precipitation amount is generated for the corresponding states using a copula model based on a non-parametric distribution of the monthly observed samples (Cordano et al., 2016).
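The occurrence step of a Wilks-style generator can be sketched as follows. This is a simplified illustration rather than the RGeneratePrec implementation: it uses fixed dry-to-wet (`p01`) and wet-to-wet (`p11`) probabilities per site instead of GLM-estimated ones, and the correlation matrix `corr` of the Gaussian numbers is assumed to be given. All names are hypothetical.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF applied element-wise (probability integral transform)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def simulate_occurrence(p01, p11, corr, n_days, seed=0):
    """Simulate wet/dry states at several sites with spatially correlated noise.

    p01[k]: probability of a wet day at site k after a dry day.
    p11[k]: probability of a wet day at site k after a wet day.
    corr:   correlation matrix of the driving Gaussian numbers.
    """
    rng = np.random.default_rng(seed)
    p01 = np.asarray(p01, dtype=float)
    p11 = np.asarray(p11, dtype=float)
    n_sites = p01.size
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))
    wet = np.zeros((n_days, n_sites), dtype=bool)
    prev = np.zeros(n_sites, dtype=bool)  # assumption: every site starts dry
    for t in range(n_days):
        w = chol @ rng.standard_normal(n_sites)  # correlated N(0, 1) draws
        u = normal_cdf(w)                        # uniform on (0, 1) per site
        p_wet = np.where(prev, p11, p01)         # pick the transition probability
        wet[t] = u <= p_wet
        prev = wet[t]
    return wet

corr = np.array([[1.0, 0.8], [0.8, 1.0]])
occ = simulate_occurrence([0.3, 0.3], [0.7, 0.7], corr, n_days=1000)
print(occ.mean(axis=0))  # fraction of wet days per site
```

Because the same correlated draw drives all sites on a given day, nearby sites tend to be wet or dry together, while each site still follows its own Markov chain in time.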

2.2.3 VAE (Kingma and Welling, 2014)

The encoder passes the input data through two convolution blocks, followed by a bottleneck dense layer and two dense layers that hold the parameters of the latent space (in the standard VAE formulation, its mean and variance), from which a normally distributed latent vector is sampled. Each convolutional block includes a down-sampling stage, so the input dimension is reduced by a factor of four before the outcome is submitted to the bottleneck dense layer. We applied ReLU after the convolutional and dense layers. The sampled latent vector goes into the decoder, where a dense layer reshapes it into 256 activation maps. These maps are inputs to consecutive transposed convolution layers that up-sample the data to the original size. A final convolution using one filter is applied to get the outcome.
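The sampling step between encoder and decoder can be illustrated with the reparameterization trick of Kingma and Welling (2014). The sketch below is a generic NumPy illustration, not the architecture used in this study; it assumes the two dense layers output a mean `mu` and a log-variance `log_var`, which is a common convention but is not confirmed by the text.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(sigma^2)) and N(0, I), per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 16))        # batch of 4, hypothetical latent size 16
log_var = np.zeros((4, 16))
z = sample_latent(mu, log_var, rng)
print(z.shape, kl_to_standard_normal(mu, log_var))
```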

2.2.4 GAN (Goodfellow et al., 2014)

We used an architecture similar to that of the VAE. The generator's encoder receives input data and applies two convolution blocks followed by a bottleneck dense layer. The decoder receives the encoder output and feeds it to a dense layer to be reshaped into 256 activation maps. These maps serve as input to consecutive transposed convolution layers that up-sample the data to the original size. The discriminator network uses an encoder architecture with a classification layer to implement the discrimination loss used to train the generator network.
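The text does not specify the exact form of the discrimination loss. As a point of reference only, the sketch below shows the binary cross-entropy losses commonly used for GAN training (with the non-saturating generator loss); it is an assumption that the study used something of this form, and the function names are illustrative.

```python
import numpy as np

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=float)
    return float(-np.mean(targets * np.log(probs)
                          + (1.0 - targets) * np.log(1.0 - probs)))

def discriminator_loss(d_real, d_fake):
    """Discriminator should output 1 on observed fields and 0 on generated ones."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Non-saturating loss: the generator wants its samples classified as real."""
    return bce(d_fake, np.ones_like(d_fake))

d_real = np.array([0.9, 0.8, 0.95])  # discriminator outputs on observed data
d_fake = np.array([0.2, 0.1, 0.3])   # discriminator outputs on generated data
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```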

3 Preliminary results

We used IBMWeathergen and RGeneratePrec to generate 50 simulations for each of the 29 years of the dataset within the described bounding box. For the VAE and the GAN models, we generated 32 representative days of the monsoon period for the bounding box under analysis. (We generated only 32 days to represent the monsoon period because of the scarcity of data for training this kind of model.)

Figure 1: QQ-plot of observed vs. simulated precipitation values.

Figure 1 shows a comparison of the empirical distributions of observed and simulated values in terms of QQ-plots, without considering the spatial locations and time of the year. We observed that up to 100 mm/day, the IBMWeathergen and the RGeneratePrec models perform similarly. On the deep learning (DL) side, the VAE follows the diagonal line closely, whereas the GAN fails to represent the distribution well. Also, on observed dry days (0 mm/day), both VAE and GAN overestimate the wet days.
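A QQ-plot of this kind pairs the quantiles of the pooled observed values with the same quantiles of the pooled simulated values. A minimal sketch, with hypothetical names and synthetic data in place of the real series:

```python
import numpy as np

def qq_points(observed, simulated, n_quantiles=101):
    """Return matching quantiles of two samples, pooled over sites and days."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    obs_q = np.quantile(np.ravel(observed), probs)
    sim_q = np.quantile(np.ravel(simulated), probs)
    return obs_q, sim_q

rng = np.random.default_rng(0)
obs = rng.gamma(shape=0.6, scale=12.0, size=(365, 400))  # days x sites, synthetic
sim = rng.gamma(shape=0.6, scale=12.0, size=(365, 400))
obs_q, sim_q = qq_points(obs, sim)
# Points close to the line sim_q == obs_q indicate similar distributions.
print(np.max(np.abs(obs_q[:-1] - sim_q[:-1])))
```

The two samples need not have the same length, which is convenient when the number of simulated days differs from the number of observed days.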

Figure 2: Comparison of observed and simulated precipitation values in terms of several quantitative measurements.

We investigated the weather generators' simulated distribution in more detail as a function of several quantitative measurements, without considering the spatial locations and time of the year. Figure 2 shows this comparison in terms of the moments (mean, standard deviation, skewness and kurtosis) and four quantitative measurements (coefficient of variation, wet counts, dry counts, and maximum values). (We use the standard deviation instead of the variance because of interpretability.) In these results, the IBMWeathergen and the RGeneratePrec simulations represent the observed moments and quantitative measurements (dashed blue line). The GAN and the VAE models approximate the skewness well; however, they overestimate the mean, kurtosis, wet counts, and maximum values and underestimate the coefficient of variation and the dry counts.
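These measurements are straightforward to compute from the pooled precipitation values. The sketch below is illustrative: the wet-day threshold (here any value above 0 mm), the use of population (not sample) moments, and the non-excess definition of kurtosis are assumptions, since the text does not state which conventions were used.

```python
import numpy as np

def summary_measures(precip, wet_threshold=0.0):
    """Moments and simple counts of pooled daily precipitation values (mm)."""
    x = np.asarray(precip, dtype=float).ravel()
    mean = x.mean()
    std = x.std()            # population standard deviation (ddof=0)
    z = (x - mean) / std
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": float(np.mean(z**3)),
        "kurtosis": float(np.mean(z**4)),   # non-excess kurtosis
        "cv": float(std / mean),            # coefficient of variation
        "wet_count": int(np.sum(x > wet_threshold)),
        "dry_count": int(np.sum(x <= wet_threshold)),
        "max": float(x.max()),
    }

stats = summary_measures([0.0, 0.0, 2.0, 2.0])
print(stats)
```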

Figure 3: Comparison among empirical distributions of quantitative measures per simulation

In the following experiment, we performed the same analysis as above, but with the moments and quantitative measurements computed per simulation. Each point in Fig. 3 corresponds to a moment or quantitative measurement estimated from the precipitation values of an individual simulation (without considering the spatial information and the time of occurrence). The dashed blue line represents the quantitative measures of the observed precipitation values. The results show that IBMWeathergen and RGeneratePrec represent those metrics better than the DL models. IBMWeathergen underestimates the maximum values, and it has more spread in representing the skewness and kurtosis than RGeneratePrec. On the other hand, RGeneratePrec slightly underestimates the observed mean and standard deviation. GAN and VAE overestimate or underestimate all the metrics. VAE has a wider spread for skewness, kurtosis, and maximum values.

Figure 4: QQ-plots of dry and wet spell lengths of observed and simulated precipitation values.

Another experiment investigated whether the weather generators could simulate the dry and wet spell length frequencies of the observed data. Figure 4 shows this comparison in terms of QQ-plots. The results show that IBMWeathergen and RGeneratePrec can reproduce the dry spells of up to forty consecutive days found in the observed data. These two stochastic generators can also properly simulate the number of consecutive wet days found in the observations. On the other hand, GAN and VAE models fail to reproduce this information in the simulations.
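Spell lengths are the lengths of runs of consecutive dry or wet days in a site's series. A minimal sketch of extracting them, assuming a day counts as wet when precipitation exceeds a threshold (the default of 0 mm is an assumption, and the function name is illustrative):

```python
import numpy as np

def spell_lengths(precip, wet_threshold=0.0):
    """Return (dry_spells, wet_spells): lengths of consecutive dry/wet runs."""
    wet = np.asarray(precip, dtype=float) > wet_threshold
    if wet.size == 0:
        return [], []
    # Indices where the state differs from the previous day start a new run.
    change = np.flatnonzero(wet[1:] != wet[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [wet.size]))
    lengths = ends - starts
    run_is_wet = wet[starts]
    return lengths[~run_is_wet].tolist(), lengths[run_is_wet].tolist()

dry, wet = spell_lengths([0, 0, 5, 1, 0, 3, 3, 3])
print(dry, wet)  # [2, 1] [2, 3]
```

The resulting lists for observations and simulations can then be compared quantile by quantile, as in the QQ-plots of Figure 4.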

Figure 5: QQ-plots of the mean, standard deviation and maximum value per day: observed vs simulated.

One way to validate the simulations' temporal coherence is to analyze the simulated data at the day, month, and annual levels. Figure 5 compares the distributions of the means, standard deviations, and maximum values per simulated day with those of the observed values. The results indicate that IBMWeathergen represents those metrics best, followed by the VAE approach, while RGeneratePrec and GAN fail to simulate these metrics per day.

Figure 6: Means of the monthly total precipitation across the sites.

Figure 7: Means of the monthly wet counts across the sites.

We explored the means of the monthly total precipitation and wet counts across the sites. Black points and lines in Figure 6 represent the means of the monthly total precipitation of observed values across the sites. Similarly, the blue points and lines are the medians of the monthly total simulated precipitation means. The limits of the gray area are the maximum and minimum of the monthly total simulated precipitation means. We observed that IBMWeathergen and RGeneratePrec simulations follow the observed monthly totals, with IBMWeathergen showing more variability. GAN overestimates the monthly total precipitation means. However, VAE shows promising results: it follows the monthly total precipitation means closely (except for May), and it even presents more variability, represented by a wider shaded area, than the classical stochastic weather generators. Figure 7 shows a similar experiment but in terms of the percentage of wet counts instead of precipitation. IBMWeathergen and RGeneratePrec successfully simulate this information, whereas GAN and VAE overestimate the monthly wet counts.
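The curves and shaded band described for Figure 6 can be computed as sketched below. For simplicity the sketch assumes each array covers a single year, so that a month label identifies one calendar month; the array layout (days by sites) and all names are hypothetical.

```python
import numpy as np

def monthly_mean_totals(precip, months):
    """Monthly total precipitation per site, averaged across sites.

    precip: array (n_days, n_sites); months: array (n_days,) with values 1..12.
    Returns an array of 12 values.
    """
    precip = np.asarray(precip, dtype=float)
    months = np.asarray(months, dtype=int)
    totals = np.zeros((12, precip.shape[1]))
    np.add.at(totals, months - 1, precip)  # accumulate days into their month
    return totals.mean(axis=1)

def simulation_envelope(sims, months):
    """Minimum, median and maximum across simulations of the monthly means."""
    per_sim = np.stack([monthly_mean_totals(s, months) for s in sims])
    return per_sim.min(axis=0), np.median(per_sim, axis=0), per_sim.max(axis=0)

# Tiny example: 4 days, 2 sites, 1 mm of rain everywhere.
months = np.array([1, 1, 2, 12])
observed = monthly_mean_totals(np.ones((4, 2)), months)
low, mid, high = simulation_envelope([np.ones((4, 2)), 3 * np.ones((4, 2))], months)
print(observed[:2], low[:2], mid[:2], high[:2])
```

Replacing the precipitation values with a 0/1 wet indicator gives the analogous wet-count curves of Figure 7.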

Figure 8: Means of the annual total precipitation across the sites.

Figure 9: Means of the annual wet counts across the sites.

Finally, we explored whether the weather generators can reproduce the total annual precipitation and wet counts. Black points and lines in Fig. 8 display the means of the observed total annual precipitation across all sites. Blue points and lines represent the medians of the simulated values, and the limits of the gray area are the maximum and minimum of the means of the simulated total annual precipitation across the sites. In this experiment, only IBMWeathergen can simulate the interannual variability, while RGeneratePrec follows a linear trend pattern. As the GAN and VAE models were not trained on specific years, they cannot reproduce the variability of the annual totals. Figure 8 shows the annual totals for GAN and VAE as a reference; both overestimate the observed total annual precipitation. Figure 9 shows a similar experiment but in terms of the percentage of wet counts.

4 Discussion

In this preliminary study, the IBMWeathergen model was consistently the best simulator for capturing different aspects of the observed precipitation values during the monsoon period in Palghar, India. However, there are other aspects we did not validate, including the super-resolution capability of these generators for generating weather fields. (We hypothesize that the DL models can be better in this aspect, and we leave it as future research.) Deep learning applications in this realm are still immature. We hypothesize that it is possible to improve the design of weather generators based on deep learning methodologies by considering the metrics presented in this paper and others reported in the literature in the creation of loss functions, architectures, and algorithms. (Research in stochastic weather generators is about 40 years old. The literature reports several methodologies for constructing them. However, there is still a lack of open-source libraries and APIs ready for customization.) For instance, open research questions include: How to constrain DL models to follow specific patterns found in data (e.g., dry/wet spell statistics)? How to couple DL models with temporal modeling concerning the annual and monthly variability? How to add control capability to deep learning models for generating extreme scenarios (extreme rainfalls, long dry/wet spells, etc.)? How to condition the models on forecast values?


  • S. Apipattanavis, G. Podestá, B. Rajagopalan, and R. W. Katz (2007) A semiparametric multivariate and multisite weather generator. Water Resources Research 43 (11). Cited by: §2.2.1.
  • E. Cordano, E. Eccel, et al. (2016) Tools for stochastic weather series generation in R environment. Ital J Agrometeorol 21, pp. 31–42. Cited by: §2.2.2.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger (Eds.), Vol. 27. Cited by: §2.2.4.
  • C. Kilsby, P. Jones, A. Burton, A. Ford, H. Fowler, C. Harpham, P. James, A. Smith, and R. Wilby (2007) A daily weather generator for use in climate change studies. Environ. Model. Softw. 22, pp. 1705–1719. Cited by: §1.
  • D. P. Kingma and M. Welling (2014) Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, External Links: http://arxiv.org/abs/1312.6114v10 Cited by: §2.2.3.
  • X. Li, K. Kou, and B. Zhao (2021) Weather GAN: multi-domain weather translation using generative adversarial networks. arXiv:2103.05422. Cited by: §1.
  • S. Mehan, T. Guo, M. W. Gitau, and D. C. Flanagan (2017) Comparative study of different stochastic weather generators for long-term climate data simulation. Climate 5 (2), pp. 26. Cited by: §1.
  • R. Mehrotra, R. Srikanthan, and A. Sharma (2006) A comparison of three stochastic multi-site precipitation occurrence generators. Journal of Hydrology 331 (1-2), pp. 280–292. Cited by: §1.
  • A. Puchko, R. Link, B. Hutchinson, B. Kravitz, and A. Snyder (2020) DeepClimGAN: A high-resolution climate data generator. CoRR abs/2011.11705. Cited by: §1.
  • C. W. Richardson (1981) Stochastic simulation of daily precipitation, temperature, and solar radiation. Water Resources Research 17 (1), pp. 182–190. Cited by: §1.
  • M. A. Semenov, R. J. Brooks, E. M. Barrow, and C. W. Richardson (1998) Comparison of the WGEN and LARS-WG stochastic weather generators for diverse climates. Climate Research 10 (2), pp. 95–107. Cited by: §1.
  • S. Steinschneider and C. Brown (2012) A semiparametric multivariate and multi-site weather generator with a low-frequency variability component for use in bottom-up, risk-based climate change assessments. In AGU Fall Meeting Abstracts, Vol. 2012, pp. GC41B–0973. Cited by: §2.2.1.
  • S. Tseng, C. Chen, and S. U. Senarath (2020) Evaluation of multi-site precipitation generators across scales. International Journal of Climatology 40 (10), pp. 4622–4637. Cited by: §1.
  • F. Vesely et al. (2019) Quantifying uncertainty due to stochastic weather generators in climate change impact studies. Sci. Rep. 9, pp. 9258. Cited by: §1.
  • D. Wilks (1998) Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology 210 (1-4), pp. 178–191. Cited by: §2.2.2.