Filling the gap between GRACE- and GRACE-FO-derived terrestrial water storage anomalies with Bayesian convolutional neural networks

by   Shaoxing Mo, et al.
Nanjing University

There is an approximately one-year observation gap of terrestrial water storage anomalies (TWSAs) between the Gravity Recovery and Climate Experiment (GRACE) satellite and its successor GRACE Follow-On (GRACE-FO). This poses a challenge for water resources management, as discontinuity in the TWSA observations may introduce significant biases and uncertainties in hydrological model predictions and consequently mislead decision making. To tackle this challenge, a Bayesian convolutional neural network (BCNN) is proposed in this study to bridge this gap using climatic data as inputs. Enhanced by integrating recent advances in deep learning, BCNN can efficiently extract important features for TWSA predictions from multi-source input data. The predicted TWSAs are compared to the hydrological model outputs and three recent TWSA prediction products. Results suggest the superior performance of BCNN in bridging the gap. The extreme dry and wet events during the gap period are also successfully identified by BCNN.


page 6

page 8

page 9

page 17

page 18

page 19

page 21


Combining Physically-Based Modeling and Deep Learning for Fusing GRACE Satellite Data: Can We Learn from Mismatch?

Global hydrological and land surface models are increasingly used for tr...

A Deep and Wide Neural Network-based Model for Rajasthan Summer Monsoon Rainfall (RSMR) Prediction

Importance of monsoon rainfall cannot be ignored as it affects round the...

Separation of Water and Fat Magnetic Resonance Imaging Signals Using Deep Learning with Convolutional Neural Networks

Purpose: A new method for magnetic resonance (MR) imaging water-fat sepa...

Identificación y Registro Catastral de Cuerpos de Agua mediante Técnicas de Procesamiento Digital de Imagenes

The effects of global climate change on Peruvian glaciers have brought a...

Data-driven Flood Emulation: Speeding up Urban Flood Predictions by Deep Convolutional Neural Networks

Computational complexity has been the bottleneck of applying physically-...

Prediction of Rainfall in Rajasthan, India using Deep and Wide Neural Network

Rainfall is a natural process which is of utmost importance in various a...

HydroNets: Leveraging River Structure for Hydrologic Modeling

Accurate and scalable hydrologic models are essential building blocks of...

Plain Language Summary

The remote sensing satellites Gravity Recovery and Climate Experiment (GRACE) provide valuable and accurate observations of terrestrial water storage changes at a global scale. These observations have been used widely for sustainable water resources management and understanding water cycle. However, there is an about one-year gap that the GRACE observations are missing due to a break-in period between two satellites. Filling this gap is thus of crucial significance for practical hydrological, agricultural, and ecological applications. We propose in this work a Bayesian convolutional neural network (BCNN) by leveraging recent advances in deep learning to bridge this gap based on climatic data. Results show that our BCNN achieves a state-of-the-art performance in terms of filling accuracy compared to previous studies.

1 Introduction

The Gravity Recovery and Climate Experiment (GRACE) satellite and its successor GRACE Follow-On (GRACE-FO) provide unprecedentedly accurate observations of the spatio-temporal dynamics of terrestrial water storage anomalies (TWSAs) at a global scale. These TWSA observations have been widely utilized together with hydrological models to assess water cycle, droughts and floods, the impacts of changing climate on terrestrial water storage, etc [Fengrs10050674, Gentine_2019, Rateb2020, Richey2015, Rodell2018Natur, SOLTANI2021, Tapley2019]. Particularly, their applications in groundwater-related studies (e.g., the groundwater depletion and land subsidence problems in North China Plain and California’s Central Valley) are more extensive due to the difficult-to-observe nature of groundwater systems, substantially augmenting our knowledge toward the systems and thus benefiting groundwater resources management [<]e.g.,¿Chang2020wrr,CHEN2014130,Famiglietti2011,Feng2013WRR,Fengrs10050674,Lietal2019WRR,Smith2020,Zhong_2018.

However, there is an approximately one-year gap of TWSA observations between the GRACE and GRACE-FO missions. Considering that the TWSA observations are usually assimilated in the hydrological models for higher reliability [Lietal2019WRR, SOLTANI2021, YIN2020125348, Zaitchik], discontinuity in the time series observations may introduce significant biases and uncertainties in model predictions and consequently mislead decision making [AlexSun2020]. Bridging this gap is thus of crucial importance for practical applications. There have been many efforts undertaken to predict/reconstruct GRACE TWSAs at regional or global scales using data-driven methods [<]e.g.,¿[]Ahmed_2019RS,Forootan2014,Forootanrs12101639,Humphrey2017GRL,Humphrey-2019-REC,Jing2020,LietalWRR2020,LONG2014145,Richter2021,AlexSun2019,AlexSun2020,SunZL2020WRR,WANG2021125972. While these studies have generally obtained relatively good performances in humid regions, the performance in arid and semiarid regions remains relatively poor (a global aridity index map is shown in Figure 4), calling for innovative solutions. The poor performance in arid/semiarid regions is mainly due to the difficulty in modeling the long-term TWSA trends caused by anthropogenic activities [Humphrey-2019-REC, LietalWRR2020, SunZL2020WRR].

Recent years have witnessed a rapid development of deep learning and its impressive performance in various applications [GU2018354, Reichstein2019Natur, Shen2018]. In this study, we propose a Bayesian convolutional neural network (BCNN) driven by climatic inputs to bridge the TWSA observation gap between GRACE and GRACE-FO at a global scale (excluding Antarctica). In this framework, the input climatic data and the target GRACE TWSAs are treated as images to leverage the excellent capability of convolutional neural networks (CNNs) in image processing [GU2018354, Mo2019_co2, Mo2019inverse, Mo2020, AlexSun2019]

. AlexSun2019 applied CNNs for prediction of TWSAs in India and the CNN outperformed the hydrological models in providing more accurate TWSA estimates. The development of our BCNN leverages and combines the recent advances in deep learning, including the residual 

[He_2016_CVPR] and dense connection [Huang_2017_CVPR] modules, channel and spatial attention mechanisms [Woo_2018_ECCV], Bayesian training strategy [LiuSVGD2016, ZHU2018415] and other [GU2018354]. By comparing with the hydorlogical model outputs and three recent TWSA prediction products, we will show that the combination of these strategies enables the network to automatically extract informative features from multi-source driving data, offer predictive uncertainties, and consequently achieve state-of-the-art performances in filling the gap.

2 Data and Processing


The monthly GRACE Mascon product released by the Jet Propulsion Laboratory (JPL) (available at, which has a spatial resolution of  [Watkins2015]

, is used in this study. The JPL GRACE TWSA data are provided as anomalies with respect to the 2004 to 2009 average. The observations cover two periods, that is, April 2002-June 2017 (i.e. the GRACE mission) and June 2018-present (i.e. the GRACE-FO mission), with a 11-month gap in between. In addition, there are some intermittent one or two months with missing TWSA data within each mission, these missing months are interpolated using the data of neighboring months. Our aim is to fill the 11-month gap (i.e. July 2017-May 2018) for the land areas with the BCNN method. To facilitate the comparison with previous GRACE prediction studies, we resampled averagely the data to


2.2 ERA5-Land Driving Data

The driving data used to predict the GRACE TWSAs are extracted from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5-Land (ERA5L) dataset (available at The ERA5L dataset contains an improved version of the land component of the ERA5 climate reanalysis, making it more accurate for land applications. The data are provided at a spatial resolution of

and cover a period from January 1981 to near present. The driving data contain four predictor variables, including the monthly precipitation, temperature, cumulative water storage changes (CWSCs), and ERA5L-derived TWSAs. The spatial resolution of these data is averagely resampled into to

to be consistent with GRACE TWSAs.

The water storage change (WSC) is calculated as the difference between the inflow (i.e. precipitation ) and outflows (i.e., evapotranspiration and runoff ) based on the water balance:


The CWSC is thus written as follows:


where denotes the month index. CWSC correlates well to GRACE TWSA in most regions as shown in Figure 5. Therefore, it is used as an additional predictor of GRACE TWSAs.

The ERA5L dataset includes water storage in soil moisture (in four layers spanning from 0 to 289 cm), snow, and canopy. Thus, the ERA5L TWSAs are calculated by summing these components and then subtracting the long-term mean between 2004 and 2009 to be consistent with the GRACE TWSAs, as represented by:


where SMS, SWS, and CWS are soil moisture storage, snow water storage, and canopy water storage, respectively, denotes mean of the three components during 2004-2009.

2.3 Time Series Data detrending

The GRACE TWSA time series in arid and semiarid regions often exhibits a long-term declining/rising trend caused by the human intervention and/or changing climate. This poses challenges for the TWSA prediction tasks, as the driving data (e.g., the climatic data or outputs of hydrological models) may not be able to well reflect the influences of these factors [Humphrey-2019-REC, LietalWRR2020, SunZL2020WRR]. For the 11-month gap (July 2017-May 2018) filling task considered here, fortunately, the GRACE TWSA data before (April 2002-June 2017) and after (June 2018-) the gap are available. Thus, we can obtain directly the long-term trend signals for the missing interval from existing data. Then we predict in the gap filling task the detrended TWSA signals instead, which are generally less challenging relative to predicting the original TWSA signals [Humphrey-2019-REC, LietalWRR2020]. Mathematically, the GRACE TWSAs are decomposed via linear detrending into two components:


where and are the detrended data and a linear trend term, respectively. Correspondingly, the driving data described in section 2.2 are also detrended. In the prediction task, the BCNN model learns to predict the signals. Finally, the predictions for the original TWSAs are obtained by adding the GRACE trend:


3 Methods

3.1 BCNN Deep Learning Models

We propose a BCNN method to learn the underlying relationship between and its predictors. Without loss of generality, here we use and to denote the network inputs (i.e. predictors) and outputs (i.e. ), respectively. In BCNN, the inputs and outputs are both organized as images (matrices) and the learning task becomes an image regression problem:


where is a BCNN model, with denoting all trainable network parameters, including the weights and biases. The inputs and outputs become and images, respectively, all with pixels (grids).

The network predictions are inevitably associated with epistemic uncertainties induced by lack of training data. Failing to estimate the predictive uncertainty may lead to overconfident results. This is especially the case for the GRACE TWSA prediction task considered here, as the available training data are limited. Contrary to vanilla CNNs which treat the network parameters as deterministic unknowns and thus fails to offer predictive uncertainties, in BCNN

are treated as random variables. Given a set of training data

, the network training is to infer the posterior distribution of , . Consequently, one can obtain the prediction distribution of the target variables : , , and in particular the mean

and standard deviation


In BCNN, the Bayesian training strategy proposed in ZHU2018415 is employed. Mathematically, the BCNN model is expressed as follows:



is an additive Gaussian noise term modeling the aleatoric uncertainty. A variational Bayesian inference algorithm called stein variational gradient descent (SVGD) 

[LiuSVGD2016], which is similar to standard gradient descent while maintaining the particle methods’ high efficiency, is utilized to estimate the posterior distribution . In implementation, we use samples of to approximate the posterior distribution . The samples are respectively optimized using the Adam optimizer [KingmaB14] whose gradient derives from SVGD. The predictive mean and uncertainty of BCNN for an arbitrary input are then calculated based on the predictions (). For more details, one can refer to LiuSVGD2016 and  ZHU2018415.

3.2 BCNN Architecture Design and Training

The BCNN network architecture is depicted in Figure 6. The convolutional block attention module (CBAM) proposed in Woo_2018_ECCV is used as the basic block. Given images with a resolution of as inputs to the network, they are passed through an alternating cascade of convolutional/transposed convolutional layers and CBAMs, each of which produces feature maps, to extract multi-scale features to finally predict images for the targets. The CBAM block contains two attention modules, namely the channel and spatial attention modules (Figures 6 and 7). More specifically, the channel module outputs weights between 0 and 1 assigning to the feature maps to tell the network ‘what’ (i.e. which maps) to attend; the spatial module outputs a weight matrix assigning to the pixel feature maps to tell the network ‘where’ (i.e. which regions) to emphasize or suppress. As such, the network is able to automatically focus on important features and suppress unnecessary ones [Woo_2018_ECCV].

The residual connection 

[He_2016_CVPR] and dense connection [Huang_2017_CVPR] strategies have been shown to be effective in enhancing information flow through the network and thus substantially improving the performance. Therefore, they are implemented in our BCNN model. In particular, in the residual connection structure, the feature maps with the same shape () but at different layers are connected by applying element-wise addition [He_2016_CVPR]. In the dense connection structure, the feature maps with the same size () but at different layers are cascaded together and subsequently fed as inputs into the next layer [Huang_2017_CVPR] (Figure 6). The Mish function [Misra2019Mish]

is employed in BCNN as the activation function of hidden layers unless otherwise stated.

We use twelve years of monthly GRACE TWSA data from April 2002 to March 2014 (i.e. 144 months, 69) to train the BCNN network, and those from April 2014 to June 2017 and June 2018 to August 2020 (i.e. 66 months, 31) to test the performance. We set the number of lags for predictors to 2. That is, for month , the inputs to BCNN are , , CWSC, and TWSA with . Thus, each sample contains input images and output image (TWSA). The region spanning from S to N and W to E represented by a image is considered. For non-land pixels (grids), the pixel values are set to a constant of 0. During network training, we use samples of

in the SVGD algorithm to approximate the posterior distribution, as suggested in ZHU2018415. The network is trained for 200 epochs, with a mean squared error loss function quantifying the predictive accuracy, an initial learning rate of 0.0025 in the Adam optimizer and a batch size of 12. The training is performed on a single GPU (NVIDIA Tesla V100) and takes

80 minutes. The network performance is evaluated based on the test dataset using three commonly used metrics: correlation coefficient (), Nash-Sutcliffe efficiency coefficient (NSE), and root mean squared error (RMSE), where or NSE values closer to 1.0 or lower RMSE values indicate better performances.

4 Results and Discussion

4.1 Accuracy Assessment

To illustrate the performance of BCNN, the three metrics (i.e., , NSE, and RMSE) are also computed for the ERA5L TWSAs and the hydrological model Noah-derived TWSAs (, version 2.1) [Noah].

Figure 1: Spatial maps of (row 1), NSE (row 2), and RMSE (row 3) values between the GRACE TWSAs and Noah- (column 1), ERA5L- (column 2), and BCNN-derived (column 3) TWSAs during the test period (April 2014-June 2017, June 2018-August 2020). (j-m) The density scatter plots between the GRACE and modeled TWSAs.

Figure 1 shows the spatial distribution of the accuracy metrics obtained by Noah, ERA5L, and BCNN. While Noah and ERA5L TWSAs both show relatively good correlations with GRACE TWSAs in most regions except Greenland and the extremely arid areas like Sahara, Gobi, and Arabian (Figures 1a and 1b), BCNN TWSAs exhibit clearly better correlations with GRACE TWSAs in almost all regions (Figure 1c). For the NSE metric, which measures directly the matching quality between the predicted and observed values, both Noah and ERA5L obtain relatively low values (0) in most regions except in some humid regions like Amazon and Southeastern United States (Figures 1d and 1e). In contrast, BCNN achieves relatively high values (0.5) in most regions (Figure 1f). Although BCNN provides a higher accuracy than Noah and ERA5L in the desert regions, the NSE value is still low relative to other regions. This is due to the fact that the variability of GRACE TWSAs in these regions is dominated by noise [Humphrey2016]. The improved performance of BCNN over Noah and ERA5L can be also illustrated by the RMSE maps (Figure 1(g-i)) and the density scatter plots between the modeled and GRACE TWSAs (Figure 1(j-m)), where our BCNN reduces significantly the RMSE errors and its predicted TWSAs display as expected a good consistency with GRACE TWSAs with an overall NSE value of 0.990, much higher than those of Noah (-0.028) and ERA5L (0.005).

The poor performance of Noah/ERA5L in predicting TWSAs is mainly due to their underestimation of the long-term trends of GRACE TWSAs [ScanlonE1080]. Considering the GRACE trends are employed in BCNN (section 2.3), to illustrate that the outperformance of BCNN is not only obtained by simply utilizing the GRACE trend information, but more importantly attributed to its ability to discover informative features for TWSA predictions from multi-source data, we correct the Noah and ERA5L TWSAs with the GRACE trends. That is, TWSA=detrend(TWSA)+trend, where detrend() denotes the linear detrending operation, TWSA represents the Noah or ERA5L TWSAs. Figure 8 shows the same metrics as in Figure 1 for the corrected Noah and ERA5L TWSAs. The comparison between Figures 1(c,f,i,m) and 8 again clearly indicates BCNN’s higher performance, although the consistency between the corrected Noah/ERA5L TWSAs and GRACE TWSAs has been improved significantly as expected.

Figure 2: The GRACE (a-c) and BCNN (d-f) TWSAs in three test months (June 2014, June 2017, and June 2020). (g-i) The BCNN predicted error (i.e. TWSA - TWSA) and (j-m) standard deviation (Std). (n-y): The region/basin-averaged GRACE, Noah, ERA5L and BCNN TWSA time series in different basins/regions. The red shaded area denotes the confidence interval (CI) of BCNN predictions.

Figure 2 depicts BCNN’s TWSA predictions for three test months in June 2014, June 2017, and June 2020. The three months cover the early-, mid-, and late-term of the test period, with June 2017 being the last month before the missing gap. For comparison, we also show the reference GRACE TWSAs. It can be seen that BCNN successfully captures the spatial patterns of GRACE TWSAs and provides close predictions in the three months. The predicted errors in regions with high-TWSA signals (e.g., Amazon, Central Africa, South Asia, and Greenland) are generally larger compared to other regions with relatively low-TWSA signals (Figure 2(g-i)). This is consistent with the spatial distributions of BCNN’s predictive uncertainties shown in Figure 2(j-m), where the regions with high-TWSA signals generally have larger predictive uncertainties. Note the predicted accuracy in these high-TWSA regions is still relatively good as the high-TWSA signals result in a high percentage accuracy. This can also be revealed by the NSE map shown in Figure 1f.

The performance of BCNN in providing reliable predictions for TWSAs is further demonstrated in Figures 2(n-y) and 10, which depict the basin/region-averaged TWSA time series derived from GRACE, Noah, ERA5L, and BCNN in different basins/regions (Locations of these basins/regions are shown in Figure 9). For completeness, we show the time series in both training and test periods. Again, the BCNN TWSAs fit obviously better with the reference GRACE TWSAs than Noah and ERA5L TWSAs. While BCNN may slightly underestimate/overestimate some peak/valley values of GRACE TWSAs during the test period, the GRACE TWSA curves are almost completely enveloped within the prediction interval.

4.2 Detection of Extreme Dry and Wet Events during the Gap

The GRACE TWSA time series are effective indicators for detection of extreme dry/wet events, which cause unusual decreases/increases in the TWSA signals, and quantifying the water loss/gain during the events [Fengrs10050674, Humphrey2016, LietalWRR2020]. For example, the GRACE successfully identified the 2016/2017 and 2018/2019 droughts in Central Europe (Figure 2o) [Boergens_etal_2020], the droughts from 2012 to 2016 in Central Valley (Figure 2p) [Xiao_etal_2017], and the flood in Summer 2020 in Yangtze River Basin (Figure 2w) [WEI2020100038]. As shown in Figure 2(n-y), the BCNN TWSAs agree well with GRACE TWSAs during these extreme events, suggesting that BCNN is capable of detecting the drought- and flood-induced abnormal TWSA signals.

Figure 3: (a-k) The standardized signals of the detrended and deseasonalized BCNN TWSAs (sTWSA) in the missing gap (July 2017-May 2018). The four regions within the rectangles (labeled l-o in h) have significant abnormal signals during this period. (l-o) The region-averaged time series of sTWSA, sTWSA, standardized precipitation anomalies (s), and 6-month SPEI in the four regions. The shaded area denotes the gap period.

We further analyze BCNN’s performance in detecting dry and wet events during the gap. To this end, the trend and seasonal signals are removed from the original BCNN TWSAs, which is done by fitting a linear trend via unweighted least squares, plus annual and semiannual sinusoids to the TWSA time series [CHEN2014130, Zhong_2018]. The detrended and deseasonalized TWSAs at each grid point is then standardized using the -score formula. The resulting TWSA is denoted as sTWSA and its spatial maps in the eleven missing months are shown in Figure 3(a-k). It is observed that there are four regions (labeled with (l-o) in Figure 3h) exhibiting abnormal signals, with extremely negative and positive signals indicating dry (regions l and m) and wet events (regions n and o), respectively, where the dry/wet events in regions m and n were also reported by European State of the Climate ( To examine this, we plot in Figure 3(l-o) the region-averaged times series of standardized precipitation anomalies (s) and 6-month standardized precipitation-evapotranspiration index (SPEI). When calculating , we first apply moving average on the precipitation time series () to smooth out short-term fluctuations, and then compute the anomalies for each month with respect to its respective average during 2002 and 2018. The SPEI dataset, which covers a period from 2002 to 2018, is downloaded from For comparison, the sTWSA time series are also displayed in the plot. It is observed that the two sTWSA lines agree well with the s and SPEI lines over the period 2002-2018. This demonstrates the capability of sTWSA in detecting extreme climate events. More notably, sTWSA successfully identifies the extreme dry events in regions l and m (Figures 3l and 3m) and the extreme wet conditions in regions n and o (Figures 3n and 3o) during the 11-month gap (July 2017-May 2018).

The results presented in section 4.1 and this section indicate that BCNN can successfully captures the complex spatio-temporal behaviors of GRACE TWSAs as well as the abnormal signals caused by extreme climate conditions. It is worth noting that the test data from GRACE (April 2014-June 2017) and GRACE-FO (June 2018-Aug 2020) cover the periods before and after the missing gap (i.e. July 2017-May 2018). The good agreement between BCNN and GRACE TWSAs suggests the reliability of BCNN in bridging the gap between GRACE and GRACE-FO at a global scale.

4.3 Comparison with Previous Studies

As presented in section 1, there have been many efforts undertaken to model GRACE TWSAs using data-driven methods. Here we restrict the comparison with Humphrey-2019-REC, LietalWRR2020, and SunZL2020WRR who provided publicly accessible global-scale TWSA prediction products. Note that the predicted TWSA product released by Humphrey-2019-REC is known as GRACE-REC. For a fair comparison, the GRACE TWSA data and the training periods used for BCNN network training are respectively the same as those employed in the three studies. The detailed descriptions of the three TWSA products are summarized in Table 1.

The comparison results are shown in Figures 11-13, which illustrate the , NSE, and (normalized) RMSE maps and the density scatter plots between the predicted and GRACE TWSAs. Note that the original GRACE-REC dataset provides the detrended and deseasonalized TWSAs. The trend and seasonal signals obtained from the GRACE TWSAs and Humphrey2017GRL, respectively, have been added to the original GRACE-REC TWSAs for consistency. It can be observed that BCNN obtains better metric values in the vast majority of regions (especially in the arid and semiarid regions) and shows higher agreements with GRACE in comparison to the three previous studies. Recall that the GRACE trend information has been added to the original GRACE-REC TWSAs and was also utilized in LietalWRR2020 when predicting TWSAs. The results suggest the superior capability of BCNN in feature mining and thus providing reliable TWSA predictions to bridge the gap between GRACE and GRACE-FO. Two additional noteworthy advantages of BCNN over existing methods are the relatively few assumptions and pre-processing involved and its ability to handle simultaneously the global scale.

5 Conclusions

The GRACE/GRACE-FO TWSA observations, together with the hydrological models, have been vital tools for water-related studies at regional or even global scales. However, the approximately one-year gap of TWSA observations between the two GRACE missions may introduce significant biases and uncertainties in models and consequently lead to misleading predictions. In this study, we propose a deep learning-based BCNN model driven by climatic data to fill this gap. By leveraging and implementing recent advances of deep learning (e.g., the residual and dense connections, attention mechanisms, and Bayesian training strategy), the BCNN model is able to effectively and efficiently extract important information for TWSA predictions from multi-source input data. Results show that BCNN can successfully capture the complex spatio-temporal behaviors of TWSAs and identify the extreme dry/wet events during the gap. The comparisons with hydrological model outputs and previous studies further suggest that BCNN obtains a state-of-the-art performance in bridging the gap at a global scale.

The outperformance of BCNN is mainly attributed to the use of TWSA trends, which are derived from the available GRACE/GRACE-FO data before and after the gap, and its outstanding performance in feature extraction. The long-term TWSA trends are induced by anthropogenic and/or natural factors and usually challenging-to-learn 

[Humphrey-2019-REC, LietalWRR2020, SunZL2020WRR]. The utilization of this trend information makes full use of the existing data and essentially eases the learning task for BCNN. The BCNN’s robust capability in extracting key features for TWSA predictions is especially illustrated by the comparison with the TWSA prediction products in which the trend information was also employed. Note that we are concerned with bridging the gap between GRACE and GRACE-FO in the current work. For the TWSA reconstruction task, which aims to reconstruct the TWSAs for pre-2002 and is beyond the scope of this study, the trend information is unavailable. The performance of BCNN for such a task remains to be investigated.

The JPL GRACE Mascon data used in this study are available at; ERA5-Land data are available at; Noah TWSA dataset is downloaded from This work was funded by the National Key Research and Development Program of China (2018YFC1800600), National Natural Science Foundation of China (41730856, 41874095, 41977157, 42002248, 42004073), China Postdoctoral Science Foundation (2020M681550), Jiangsu Planned Projects for Postdoctoral Research Funds (2020Z133), and Fundamental Research Funds for the Central Universities (020614380106). The authors acknowledge Dr. Yinhao Zhu from Qualcomm AI Research for his valuable suggestions on implementing the Bayesian training strategy. The BCNN codes and the predicted TWSA dataset generated in this work will be made available at upon publication of this manuscript.


Appendix A Supporting Information

Authors Study area GRACE data Training period Test period
Humphrey-2019-REC Global JPL mascon RL06 - Apr. 2014- June 2017; June 2018- July 2019
LietalWRR2020 Global CSR mascon RL06 Apr. 2002- June 2017 June 2018- Dec. 2018
SunZL2020WRR 26 basins JPL mascon RL06 Apr. 2002- Jan. 2014 Feb. 2014- June 2017
Table 1: Summaries of Experimental Settings in Previous Studies.
Figure 4: A global map of the aridity index, with -1 and 1 indicating the most arid and humid conditions, respectively. The data is available at
Figure 5: Spatial distribution of the absolute correlation coefficient () between the cumulative water storage changes (CWSCs) and GRACE TWSAs (April 2002-June 2017, June 2018-August 2020), where denotes -month lag.
Figure 6: Illustration of the Bayesian convolutional neural network (BCNN) architecture. BCNN takes images with a size of as inputs and generates images with the same size. It is an alternating cascade of convolutional (Conv)/transposed convolutional (ConvT) layers and convolutional block attention modules (CBAM), each of which outputs feature maps. The size of feature maps is sequentially halved in each Conv layer from to so as to extract multi-scale features, and then sequentially recovered to using the ConvT layers. The symbols $⃝+$, c⃝, and $⃝\times$ denote the addition (i.e. residual connection), concatenation (i.e. dense connection), and multiplication (i.e. attention connection) operations, respectively. The Mish activation (Act) function [Misra2019Mish]

is used in the network. BN=batch normalization.

Figure 7: Diagrams of the channel and spatial attention modules. The inputs to each module are feature maps with a size of

. The channel attention module utilizes both max-pooling (MaxPool) outputs (

) and average-pooling (AvgPool) outputs (

) with a shared multi-layer perceptron (MLP) to produce a channel attention map. The spatial attention module utilizes similar two outputs (each with a shape of

) to produce a spatial attention map. The sigmoid activation is used to guarantee the values in the attention maps are between 0 and 1. Conv=Convolution; Act=Activation.
Figure 8: Spatial maps of (row 1), NSE (row 2), and RMSE (row 3) between the GRACE TWSAs and Noah- (left) and ERA5L-derived (right) TWSAs during the test period (April 2014-June 2017, June 2018-August 2020). Row 4: The density scatter plots between the GRACE and modeled TWSAs. The Noah and ERA5L TWSAs have been corrected with GRACE TWSAs’ trend (trend). That is, TWSA=detrend(TWSA)+trend, where detrend() denotes the linear detrending operation, and TWSA represents the Noah or ERA5L TWSAs.
Figure 9: Locations of the regions and river basins considered for result analysis.
Figure 10: The basin-averaged GRACE, Noah, ERA5L, and BCNN TWSA time serials in different river basins. Shaded areas represent the 95% condence interval (CI) of BCNN predictions.
Figure 11: The predictive (row 1), NSE (row 2), and RMSE (row 2) accuracy of GRACE-REC (left) and our BCNN (right) for the GRACE TWSAs (test period: April 2014-June 2017, June 2018-July 2019). Row 4: The density scatter plots between the GRACE and modeled TWSAs. The trending and seasonal signals extracted from GRACE TWSAs and Humphrey2017GRL, respectively, have been added to the original GRACE-REC TWSAs for consistency.
Figure 12: The predictive (row 1), NSE (row 2), and RMSE (row 3) accuracy for the GRACE TWSAs obtained by LietalWRR2020 (left) and our BCNN (right). Row 4: The density scatter plots between the GRACE and modeled TWSAs. The test period is June 2018-December 2018. LietalWRR2020 provided the predicted TWSAs for 26 major river basins over the world.
Figure 13: The predictive NSE (column 1), CC (i.e.

; column 2), and normalized RMSE (NRMSE; column 3) accuracy for the GRACE TWSAs obtained by SunZL2020WRR (rows 1-3 with deep neural network (DNN), multiple linear regression (MLR), and seasonal autoregressive integrated moving average with exogenous variables (SARIMAX) methods, respectively) and our BCNN (row 4). The test period is February 2014-June 2017. (a-i) were extracted from Figure S7 of  SunZL2020WRR.