Prolongation of SMAP to Spatio-temporally Seamless Coverage of Continental US Using a Deep Learning Neural Network

by   Kuai Fang, et al.
Penn State University

The Soil Moisture Active Passive (SMAP) mission has delivered valuable sensing of surface soil moisture since 2015. However, it has a short time span and irregular revisit schedule. Utilizing a state-of-the-art time-series deep learning neural network, Long Short-Term Memory (LSTM), we created a system that predicts SMAP level-3 soil moisture data with atmospheric forcing, model-simulated moisture, and static physiographic attributes as inputs. The system removes most of the bias with model simulations and improves predicted moisture climatology, achieving small test root-mean-squared error (<0.035) and high correlation coefficient >0.87 for over 75% of Continental United States, including the forested Southeast. As the first application of LSTM in hydrology, we show the proposed network avoids overfitting and is robust for both temporal and spatial extrapolation tests. LSTM generalizes well across regions with distinct climates and physiography. With high fidelity to SMAP, LSTM shows great potential for hindcasting, data assimilation, and weather forecasting.



There are no comments yet.


page 11

page 18


Spatio-temporal Stacked LSTM for Temperature Prediction in Weather Forecasting

Long Short-Term Memory (LSTM) is a well-known method used widely on sequ...

Ensemble long short-term memory (EnLSTM) network

In this study, we propose an ensemble long short-term memory (EnLSTM) ne...

Stream-Flow Forecasting of Small Rivers Based on LSTM

Stream-flow forecasting for small rivers has always been of great import...

Predicting Future Sales of Retail Products using Machine Learning

Techniques for making future predictions based upon the present and past...

Enhancing streamflow forecast and extracting insights using long-short term memory networks with data integration at continental scales

Recent observations with varied schedules and types (moving average, sna...

Distanced LSTM: Time-Distanced Gates in Long Short-Term Memory Models for Lung Cancer Detection

The field of lung nodule detection and cancer prediction has been rapidl...

Interpreting LSTM Prediction on Solar Flare Eruption with Time-series Clustering

We conduct a post hoc analysis of solar flare predictions made by a Long...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Soil moisture is a key variable that controls various hydrologic processes, including infiltration, evapotranspiration and subsurface flow. It is of central importance to drought monitoring (Narasimhan and Srinivasan, 2005), floods prediction (Norbiato et al., 2008), weather forecasting (Koster, 2004), irrigation planning and many other scientifically- and socially-important applications. Launched in 2015, NASA’s Soil Moisture Active Passive (SMAP) satellite mission (Entekhabi et al., 2010)

is designed to measure top 5 cm soil moisture globally with a standard deviation of

cm/cm volumetric ratio when vegetation water content (VWC) kg/m (O’Neill et al., 2012). It achieved this goal in most core evaluation sites (Colliander et al., 2017; Jackson et al., 2016). Notwithstanding its great value, SMAP passive radiometer-based observations only have a short time span (since April 2015) with an irregular revisit time of 2-3 days, which makes it difficult to observe soil moisture responses immediately after storms or snowmelt.

Compared to SMAP’s limited resolution and time span, land surface models (LSMs), e.g., VIC (Nijssen et al., 2001), Noah (Ek et al., 2003), CLSM (Koster et al., 2000) and MOS (Koster and Suarez, 1994), simulate soil moisture seamlessly over much longer time spans. Despite their frequent use, these models may differ significantly from observations (Leeper et al., 2017; Yuan and Quiring, 2017; Dirmeyer et al., 2016; Xia et al., 2015). Biases (mean difference from the observed) are notable in all models evaluated in Xia et al. (2015)

. Their error patterns generally vary by region, model, season, and soil depths, yet there are systematic patterns in them. For example, moisture tends to be over-estimated in the arid western CONUS and under-estimated in wetter eastern US

(Yuan and Quiring, 2017); the Noah model tends to under-estimate moisture in wet seasons and over-estimate in dry seasons (Xia et al., 2015). These systematic error patterns could be exploited to improve predictions.

To correct systematic model errors, we turn to deep learning (DL), a rebranding of artificial neural network. DL has made revolutionary strides in recent years and helped to solve problems that have resisted artificial intelligence for decades

(LeCun et al., 2015)

. With earlier-generation machine learning methods, human experts extract features from data that are strongly correlated with dependent variables. DL, on the other hand, automatically extracts abstract features through their hidden layers. Two highly successful network structures are convolutional neural networks (CNN) for image-domain tasks

(Krizhevsky et al., 2012), and Long Short-Term Memory (LSTM) (Hochreiter et al., 1997; Greff et al., 2015) for time-domain tasks, although the separation is not absolute. No study, to the best of the authors’ knowledge, has employed time series deep learning in hydrology. Given DL’s success in other scientific disciplines (Voosen, 2017), it is plausible that DL can capture model error patterns that humans have yet come to explicitly formulate.

The parameter space of deep networks is substantially large in order to provide the flexibility in mapping diverse, complex functions. Thus one might be concerned about overfitting, which means coefficients are fitted to noise rather than meaningful information. However, there are recent breakthroughs in regularization techniques, e.g., Dropout (Srivastava et al., 2014), which penalize overfitting and reduce mutually dependent coefficients. Nevertheless, since LSTM has not been applied to hydrology, it is important to examine its robustness compared to conventional statistical methods.

The central hypothesis of this work is that with two years of SMAP data, LSTM can learn patterns in soil moisture dynamics and LSMs errors, and by utilizing them, can SMAP data over long time spans. Our objectives are: (1) to produce a seamless top-surface soil moisture dataset for continental United States (CONUS) with high fidelity to SMAP data; (2) to provide an initial investigation of LSTM’s capability in correcting process-based model errors; (3) to compare LSTM’s generalization capability to conventional methods in spatial and temporal extrapolation tests. Here, by ”high fidelity”, we mean a high consistency with the target data resulting in its faithful reproduction. SMAP’s retrieval algorithm for the passive product derives soil moisture from brightness temperature readings using radiative transfer and soil dielectric models (O’Neill et al., 2012), thus it also incurs biases (Colliander et al., 2017). Nevertheless, a high-fidelity hindcast product has a wide range of applications, e.g., data mining of past fire hazards, calibration of hydrologic models, or benchmarking satellite product with historical in-situ data.

2 Methods and Datasets

As an overview, we trained an LSTM network to predict SMAP L3 product with, as inputs, atmospheric forcing time series, LSM-simulated surface soil moisture and static physiographic attributes. We compared LSTM to regularized multiple linear regression, auto-regressive models, and a simple one-layer feedforward neural network. Their performances were tested by (i) temporal generalization test: training over one year and testing over another; (ii) regular spatial generalization test: training over a uniformly down-selected subset of SMAP pixels and testing over other cells; and (iii) regional holdout test: training on some sub-regions of CONUS and test on the rest. All data sources are aggregated to a daily time scale and interpolated to SMAP L3 grid. Each SMAP pixel is an


2.1 Data sources and inputs

For the learning target, we focus on the L3 passive radiometer product (L3_SM_P) which combines swaths available in each day. The spatial resolution of L3_SM_P is  36 km. For inputs, we obtained atmospheric forcing data including precipitation, temperature, radiation, humidity and wind speed from North-American Land Data Assimilation System phase II (NLDAS-2) (Xia et al., 2015). NLDAS-2 also provides, from 1979 to present, simulations of land surface states and fluxes by several LSMs. We chose Noah’s (and also compared with MOS’s) outputs (Ek et al., 2003) because it ranks in the middle among models (Xia et al., 2015) and is not as extensively calibrated as some other models, e.g., SAC. Our work does not require the best LSM, as we can observe how LSTM and other methods correct LSM errors. Noah has 4 soil layers which are of depths 0-10, 10-40, 40-100 and 100-200 cm, respectively. To match with the 0-5 cm sensed by SMAP, we tested: (i) directly using 0-10 cm data; (ii) polynomial interpolation; and (iii) integral interpolation where we find polynomials whose integrals agree with Noah-simulated values.

Static physiographic attributes (Table S1 in SI) include sand, silt and clay percentages, bulk density and soil capacity from ISRIC-WISE (Batjes, 1995). County-level annual-average irrigation data for 2010 (USGS, 2016) was overlaid with landuse data to assign irrigation in each county to agricultural land uses. The values are then aggregated to SMAP grid. Also among inputs are SMAP product flags that indicate mountainous terrain, land cover classes, VWC, urban area, water body fraction and data quality (time-averaged). SMAP product flags indicate lower data quality in dense vegetated or forest area. However, instead of removing all regions labeled as low-quality, we hypothesize that including the flags as inputs allows LSTM to implicitly assign less focus to high-uncertainty regions.

2.2 LSTM setup

As a type of Recurrent Neural Network (RNN), LSTM makes use of sequential information by updating states based on both inputs of the current time step (

) and network states of previous time steps, as illustrated in Figure S1 in Supporting information (SI). Following the notation in Lipton et al. (2015), we can write an LSTM as LSTM : . The update formula are:

(input node) (1)
(input gate) (2)
(forget gate) (3)
(output gate) (4)
(cell state) (5)
(hidden gate) (6)
(output layer) (7)


is the sigmoidal function,

is element-wise multiplication,

is the input vector (forcings and static attributes) for the time step

, ’s are the network weights, ’s are bias parameters, is the output to be compared to observations, is the hidden states, and is called the cell states of memory cells, which is unique to LSTM. Readers are referred to the literature for the detailed functionality of these units. Summarized briefly, , , control, respectively, when the input is significant enough to use, how long the past states should be remembered for, and how much the value in memory is used to compute the output. During training, ’s and ’s are adjusted using back-propagation through time (BPTT). In BPTT, the network is first unrolled over a prescribed length before the difference between the output and target propagates into the network. We used the LSTM implemented in Recurrent Neural Network library for Torch7 (Léonard et al., 2015), which is a scientific computing framework for the programming language Lua. We employed Dropout regularization, which randomly sets a fraction (dropout rate, ) of its operand to

. Dropout prevents the co-adaptation of neurons and thus reduces overfitting. We used dropout regularization to non-recurrent links as in

Zaremba et al. (2015), a constant dropout mask to recurrent connections as in Gal and Ghahramani (2015). We also implemented dropout for the memory cell as described in Semeniuta et al. (2016).

At each time step, the network outputs one scalar value (

), which is compared to SMAP L3 passive product. The loss function to be minimized is the mean-squared error calculated for the time series:


where is when time step has SMAP observation and is 0 otherwise, is the length of the time series, and is SMAP observation. For computational efficiency and stability reasons, the training is done through ”mini-batches”: for each batch, a number of instances, or SMAP pixels, are randomly collected from the training set. The loss function is then averaged over all the instances in a batch.

2.3 Tests, Conventional Algorithms, and Evaluation Metrics

In our temporal generalization test, the training set is SMAP data from April 2015 to March 2016. For computational efficiency, we picked 1 pixel from every 4 x 4 patch, resulting in a 1/16 coverage of CONUS. The test set is SMAP data for the same pixels, but for the period from April 2016 to March 2017. In the regular spatial generalization test, the training set is the same as described above, but the test set is the neighboring cells for the same period. In the regional holdout test, we trained models over 4 of the 18 2-digit Hydrologic Cataloging Units (HUC2s) and tested on others. This test challenges the ability of different methods to generalize across characteristically different climates and physiographic conditions. There are a large number of such 4-HUC2 combinations. As an initial investigation, we chose 4 of such combinations (C1-C4). Two combinations have a broad coverage of the range of Noah’s bias over CONUS, while the other two cover only part of that range. These tests inform us the effect of biased training sets on generalization.

LSTM predictions are compared to three conventional methods: the least absolute shrinkage and selection operator (lasso), auto-regressive model (AR), and a single-layer feedforward Neural Network (NN), given the same inputs. Lasso, shorthanded as LR here, is multiple linear regression with a regularization term that penalizes large regression coefficients (Tibshirani, 1994). NN can construct nonlinear transformations of inputs, but does not have memory, and therefore cannot retain time dependencies. LR and NN are operated in (1) the CONUS-scale mode, where a single model is trained for the entire training set; and (2) point-by-point mode, with subscript ””, where a separate model is trained for each pixel. AR models are trained only point-by-point (AR). More details are provided in SI Text S1. Three statistical metrics, bias (the time-averaged difference), root-mean-squared error (RMSE) and Pearson’s correlation () are calculated between between predicted and SMAP-observed soil moisture on training and test sets separately. measures the agreement between simulated and observed climatology.

While short-term forecast employs observations to continuously update solution, long-term hindcast has no observations to use. As a ”proof-of-concept” test of LSTM’s appropriateness for long-term hindcast, we trained LSTM and AR using 2 years of Noah-simulated soil moisture as the target, to hindcast to 10 years back.

3 Results

3.1 Overall test performance

For the temporal generalization test, we note substantial improvement with respect to both bias and compared to Noah (Figure 1). We report results from directly using the 0-10 cm Noah layer, although other choices are similar, as will be discussed later. The Noah solutions have a significant, spatially-varying bias in many parts of CONUS, as shown in Figure S3a in SI, especially in southeastern coastal plains (annotated in Figure S2 in SI). The LSTM correction reduces the bias by an order of magnitude, and mostly removed the spatial pattern of bias (Figure 1a). We note there is a CONUS-scale spatial trend of larger reduction of absolute bias in the Eastern CONUS, except the southeast coast (Figure 1b). The gradient appears to be related to the CONUS annual precipitation map, as it corrects Noah’s bias to over-estimate in the arid west while over-estimate in the humid east (Figure S3a, also noted in (Xia et al., 2015)).

LSTM does not only reduce bias but more noticeably improve the climatology, according to (Figure 1c,d), which only concerns the comparison in temporal fluctuations. LSTM is mostly above and 50% pixels are over 0.9, significantly above Noah. In most CONUS the improvement is greater than , while it can be in the Eastern half CONUS, especially the agricultural regions in central lowland and on the Appalachian Plateau (Figure 1d). We note this map is no longer similar to the bias map of Noah, suggesting mechanisms that correct seasonality are different from those correcting bias. We hypothesize LSTM significantly improves soil moisture dynamics in agricultural regions, e.g., irrigation, and the influence of shallow soils on the Appalachian highlands (Fang and Shen, 2017). On the other hand, over the majority of CONUS, the RMSE of LSTM is lower than 0.035 (Figure 1e). A continental-scale west-to-east increasing trend in RMSE(LSTM) is apparent. The higher errors in the East may result from higher annual precipitation, which results in (i) higher annual-mean soil moisture, and (ii) high VWC, which reduces SMAP data quality. However, the Southeast regions facing the Atlantic has a rather low RMSE(LSTM). Figure 1f suggests the improvement of LSTM over the one-layer NN is obvious, especially in the central lowland region and coastal plains.

(a) (b) (c) (d) (e) (f)
Figure 1: Performance of LSTM predictions in the test set of the temporal generalization test. All metrics are evaluated against SMAP. (a) . Each pixel in this figure is patch of 4x4 SMAP pixels. Bias in most parts of CONUS is between -0.02 and 0.015; (b) Change of absolute bias due to LSTM correction. LSTM reduces the absolute bias significantly over CONUS; (c) LSTM anomaly correlation (); (d) change of due to LSTM correction; (e) ; (f) Since the RMSE improvement over Noah looks similar to panel b, here we show the difference in RMSE between LSTM and NN predictions. Maps of Noah’s performance is provided in Figure S3 in SI.
Figure 2: Comparisons between SMAP observations and soil moisture predicted by LSTM, Noah, and AR at 5 locations. We chose sites around 10-th, 25-th, 50-th, 75-th and 90-th percentiles as ranked by .
Figure 3: Boxplots comparing LSTM, Noah, NN, AR, LR, and in the temporal generalization test and the regular spatial generalization test. Each box and whisker element summarizes SMAP pixels over CONUS with percentiles annotated in the panel. Y-axis limits truncate Noah boxes to focus on the central part of other boxes. The left column is the temporal generalization test, and the right column is the regular spatial generalization test. The three rows are for RMSE, Bias and , respectively.
Figure 4: (a) Test biases for cases C1-C4 (distinguished by different colors), which are four combinations of 4 HUC2s. Each group consists of 5 boxes, which are, respectively, LSTM with Noah among inputs, LSTM without Noah, NN with Noah in inputs, Noah in the training set, and self-assessed test bias. (b) test coefficient of determination (). We note significantly better seasonality captured by LSTM in cases C1-C3, but not necessarily in C4; (c) Distributions of Noah’s biases in the training sets for C1-C4.

3.2 Comparison of generalization capability with other methods

In the temporal generalization test, time series prolonged by LSTM compares favorably against AR across a wide range of LSTM performance levels (Figure 2). For the 10-th to even 75-th percentile pixel, LSTM is able to closely follow SMAP, except that peaks are under-estimated in the 50-th percentile pixel in 2016. The frequent rain events in April-May 2016 in Figure 2a and their recessions are well captured. For the 75-th percentile pixel, all peaks are captured, but we notice some over-estimation near the troughs in August 2016. In contrast, while AR is also behavioral, we notice it often noticeably under-estimates the troughs, over-estimates the seasonal rising limbs and overshoots some peaks. In the 10-th percentile cell, AR performs poorly between October 2016 and early 2017.

Summarized over CONUS, LSTM shows the lowest test RMSE and bias, and the highest (Figure 3), followed by NN, LR, AR, LR, NN and lastly Noah. Neither the vertical interpolation procedure nor the choice of LSM (MOS or Noah) has much impact on LSTM’s prolongation performance (see Figures S4 and S5 in SI). The test RMSEs of LSTM are 0.022, 0.027, 0.036 and 0.057 for the , , and percentile pixel, respectively (Figure 3a). With lasso regularization, LR has similar training and test RMSEs, but its 25-th percentile test RMSE is similar to the -th percentile of LSTM’s. Therefore, the more complex relationships permitted by LSTM are beneficial. The LR improves from LR as it specializes in each pixel, and the lasso regularization appears to have prevented overfitting, but its error is still larger than the CONUS-scale LSTM. NN and AR appear more overfitted than LR. LSTM’s test bias is only moderately smaller than that of NN, AR and LR, but is much higher. 75% and 50% of (LSTM) are greater than 0.80 and 0.87, respectively.

Note AR has sub-par performance in both training and test periods. The test RMSE box for AR is wider, suggesting its formulation works well for some pixels but not so well in others. Furthermore, the extended proof-of-concept long-term hindcast experiment shows a similar contrast. LSTM has robust prolongation performance at a 10-year hindcast scale while AR generates larger errors. Errors for both methods are independent of hindcast lengths, i.e., 10-year-prior hindcast error is not much different from 2-year hindcast (Text S2 and Figure S6 in SI). Meanwhile, in the regular spatial generalization test, LSTM again exhibits the smallest RMSE and bias (Figure 3c-d). The contrast in bias is smaller than the temporal test, but the comparisons are similar.

In 4-HUC2 combinations 1 and 2 (C1 and C2), the Noah bias covers a wide range from -0.25 to 0.15 cm/cm, which appears to be the whole range of the Noah biases we see over CONUS. In both cases, LSTM is able to greatly reduce the bias and improve soil moisture climatology (much higher ) compared to both NN and Noah (Figure 4a-b). The boxes corresponding to LSTM bias are very narrow, and its centers are nearly 0. For the case C3, we note that it has few points with bias ¡-0.2, so for this HUC2 combination, the training set under-samples the Noah errors that lead to strong negative biases. As a result, LSTM’s bias is no longer near 0, although still much better than NN’s and Noah’s. On the other hand, for C4, the training set is strongly biased. It lacks any basin with a Noah bias of ¡-0.1. We note the narrow box corresponding to Noah’s bias in the training set for C4. Unsurprisingly, LSTM’s performance deteriorates: LSTM is no longer able to correct the bias, and its range of bias is large. NN, similarly, also fails to correct the bias. We obtained LSTM’s self-assessment of Noah bias by subtracting Noah’s solution from LSTM’s prediction. The self-assessed bias (Figure 4a) has a range of bias which has little overlap with the Noah bias in the training set. This may be a signal we can utilize in the future to identify biased training sample and large predictive uncertainty.

4 Discussion

In many parts of CONUS, LSTM’s RMSE is smaller than SMAP’s design measurement accuracy. It appears even with 1 year of data over CONUS, when grouped together, has enough information to train an LSTM to hindcast SMAP data. A factor that contributes to such performance is the short memory length of soil moisture, which was found to range between 5 to 40 days (Orth and Seneviratne, 2012) in previous work. As a result, two years of data, when grouped together, contain many complete wet-dry cycles. The hindcast quality should improve as SMAP data increases. On a side note, because true random noise cannot be predicted in the test set, it follows that SMAP L3’s RMSE could be below 0.027 in 50% of CONUS. Also, the official SMAP data quality flag labels the forested Southeast Coastal Plains and South Atlantic regions as ”not recommended” quality (Figure S3). Our LSTM has a RMSE of 0.02-0.035 there, which suggest SMAP may be functional in these regions, but the impacts of the retrieval algorithm should be carefully examined.

It seems non-recurrent NN can already remove a large part of bias by capturing how environmental factors lead to certain type of biases. However, NN cannot maintain time dependencies, which may explain its performance difference from LSTM. Therefore, we argue an advantage of LSTM originates from its recurrent nature. Meanwhile, alternative recurrent models, e.g., AR and moisture loss functions (Koster et al., 2017), are profoundly useful due to their interpretability, parsimony and great value in ”nowcasting” or short-term forecasting (see Koster et al. (2017) for a solid application), but they require continuous updates by observations to avoid drift from true dynamics. At one-year scale, injected data already has little effects on hindcast solutions. For longer-term hindcast, pattern-based methods like LSTM appear to be more suitable.

Previous soil moisture comparisons mainly focused on anomalies, but the prevalent bias with Noah’s surface moisture simulations can cause large errors in downstream users such as weather modeling (Massey et al., 2016). The continental-scale bias pattern suggests some systematic errors with Noah’s model structure/parameters. Some hypotheses include (i) Noah’s soil pedo-transfer functions are fundamentally inadequate in resolving regionally heterogeneous soil responses to rainfall, which could explain the need for calibration in most large-scale flood forecasting systems; or (ii) groundwater flow, which is important in thick-soiled, high-infiltrating capacity regions like the southeast (Fang et al., 2016), is not properly simulated in LSMs (Clark et al., 2015). However, LSTM appears to be able to integrate information from raw data and compensate for the inadequate representation uniformly over CONUS.

Conventional statistical wisdom suggests that simpler models are more robust and models with high degrees of freedom may be easily overfitted. However, the present work shows the CONUS-scale deep learning networks have smaller test errors than three alternative methods trained point-by-point. In fact, we hypothesize an important strength of LSTM originates from its flexibility to simultaneously learn from a large and heterogeneous collection of data and identify commonalities and differences. Its generalization capability stems from building internal models (in the attribute space) to capture biases and temporal fluctuations. For our regional holdout test, creating these internal models does not seem to require having all combinations of climates and physical attributes in the training set, as the HUC2s have distinct climates, topography, landcover and soils.

5 Conclusion

We have trained a CONUS-scale LSTM network to predict SMAP data. This network is capable of correcting spatially-heterogeneous model bias as well as climatological errors between Noah-simulated and SMAP-observed top-surface soil moisture, creating a CONUS-scale seamless moisture product that has high fidelity to SMAP. Despite having high degrees of freedom, when properly regularized, LSTM exhibits better generalization capability, both in space and time, than linear regression, auto-regressive models, and a one-layer neural network. Its test error approaches the instrument accuracy limit with SMAP. LSTM will be helpful in long-range soil moisture hindcasting or forecasting, weather modeling, and data assimilation. Its generalization capability arises from building internal models from physical attributes and synthesis of climate forcing. It does not necessarily require similar examples in the training set. Unless the training set is strongly biased, LSTM has a good chance of success.

6 Limitations and Future Work

As a first paper using LSTM in hydrology, this work is by no means a thorough investigation. Optimization is certainly possible. Our work does not address the question about the accuracy of SMAP data, which is addressed by other studies. The hindcast performance with respect to capturing soil moisture during drought should be further examined with in-situ data. We should further assess LSTM’s performance in comparison with regionally-trained simpler models. The implications of low LSTM RMSEs in forested region warrants further investigations.

7 Acknowledgment

Data for SMAP can be downloaded from SMAP’s data repository. Work is supported by seed grants from Penn State College of Engineering and Institute for CyberScience. Shen is partially supported by Office of Biological and Environmental Research of the US Department of Energy under contract DE-SC0010620. We thanks NVIDIA Corporation for donating a Tesla K40 GPU for this research.

SMAP Soil Moisture Active Passive DL Deep learning LSTM Long Short-Term Memory LR Linear Regression ANN Artificial Neural Network


  • Batjes (1995) Batjes, N. H. (1995), A Homogenized Soil Data File for Global Environmental Research: A Subset of FAO, ISRIC, and NRCS profiles (Version 1.0), Tech. Rep. July, ISRIC.
  • Clark et al. (2015) Clark, M. P., Y. Fan, D. M. Lawrence, J. C. Adam, D. Bolster, D. J. Gochis, R. P. Hooper, M. Kumar, L. R. Leung, D. S. Mackay, R. M. Maxwell, C. Shen, S. C. Swenson, and X. Zeng (2015), Improving the representation of hydrologic processes in Earth System Models, Water Resources Research, 51(8), 5929–5956, doi:10.1002/2015WR017096.
  • Colliander et al. (2017) Colliander, A., T. Jackson, R. Bindlish, S. Chan, N. Das, S. Kim, M. Cosh, R. Dunbar, L. Dang, L. Pashaian, J. Asanuma, K. Aida, A. Berg, T. Rowlandson, D. Bosch, T. Caldwell, K. Caylor, D. Goodrich, H. al Jassar, E. Lopez-Baeza, J. Martínez-Fernández, A. González-Zamora, S. Livingston, H. McNairn, A. Pacheco, M. Moghaddam, C. Montzka, C. Notarnicola, G. Niedrist, T. Pellarin, J. Prueger, J. Pulliainen, K. Rautiainen, J. Ramos, M. Seyfried, P. Starks, Z. Su, Y. Zeng, R. van der Velde, M. Thibeault, W. Dorigo, M. Vreugdenhil, J. Walker, X. Wu, A. Monerris, P. O’Neill, D. Entekhabi, E. Njoku, and S. Yueh (2017), Validation of SMAP surface soil moisture products with core validation sites, Remote Sensing of Environment, 191, 215–231, doi:10.1016/j.rse.2017.01.021.
  • Dirmeyer et al. (2016) Dirmeyer, P. A., J. Wu, H. E. Norton, W. A. Dorigo, S. M. Quiring, T. W. Ford, J. A. Santanello, M. G. Bosilovich, M. B. Ek, R. D. Koster, G. Balsamo, D. M. Lawrence, P. A. Dirmeyer, J. Wu, H. E. Norton, W. A. Dorigo, S. M. Quiring, T. W. Ford, J. A. S. Jr., M. G. Bosilovich, M. B. Ek, R. D. Koster, G. Balsamo, and D. M. Lawrence (2016), Confronting Weather and Climate Models with Observational Data from Soil Moisture Networks over the United States, Journal of Hydrometeorology, 17(4), 1049–1067, doi:10.1175/JHM-D-15-0196.1.
  • Ek et al. (2003) Ek, M. B., K. E. Mitchell, Y. Lin, E. Rogers, P. Grunmann, V. Koren, G. Gayno, and J. D. Tarpley (2003), Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model, Journal of Geophysical Research: Atmospheres, 108(D22), doi:10.1029/2002JD003296.
  • Entekhabi et al. (2010) Entekhabi, D., E. G. Njoku, P. E. O’Neill, K. H. Kellogg, W. T. Crow, W. N. Edelstein, J. K. Entin, S. D. Goodman, T. J. Jackson, J. Johnson, J. Kimball, J. R. Piepmeier, R. D. Koster, N. Martin, K. C. McDonald, M. Moghaddam, S. Moran, R. Reichle, J. C. Shi, M. W. Spencer, S. W. Thurman, L. Tsang, and J. Van Zyl (2010), The soil moisture active passive (SMAP) mission, Proceedings of the IEEE, 98(5), 704–716, doi:10.1109/JPROC.2010.2043918.
  • Fang and Shen (2017) Fang, K., and C. Shen (2017), Full-flow-regime storage-streamflow correlation patterns provide insights into hydrologic functioning over the continental US, Water Resources Research, doi:10.1002/2016WR020283.
  • Fang et al. (2016) Fang, K., C. Shen, J. B. Fisher, and J. Niu (2016), Improving Budyko curve-based estimates of long-term water partitioning using hydrologic signatures fromGRACE, Water Resources Research, pp. 1–20, doi:10.1002/2014WR015716.
  • Gal and Ghahramani (2015) Gal, Y., and Z. Ghahramani (2015), A Theoretically Grounded Application of Dropout in Recurrent Neural Networks,
  • Greff et al. (2015) Greff, K., R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber (2015), LSTM: A Search Space Odyssey,
  • Hochreiter et al. (1997) Hochreiter, S., S. Hochreiter, J. Schmidhuber, and J. Schmidhuber (1997), Long short-term memory., Neural computation, 9(8), 1735–80, doi:10.1162/neco.1997.9.8.1735.
  • Jackson et al. (2016) Jackson, T., P. O’Neill, E. Njoku, S. Chan, R. Bindlish, A. Colliander, F. Chen, M. Burgin, S. Dunbar, J. Piepmeier, M. Cosh, T. Caldwell, J. Walker, X. Wu, A. Berg, T. Rowlandson, A. Pacheco, H. McNairn, M. Thibeault, J. Martínez-Fernández9, Á. González-Zamora, M. Seyfried10, D. Bosch, P. Starks, D. Goodrich, J. Prueger, Z. Su, R. van der Velde, J. Asanuma, M. Palecki, E. Small, M. Zreda, J. Calvet, W. Crow, Y. Kerr, S. Yueh, and D. Entekhabi (2016), Soil Moisture Active Passive (SMAP) Project Calibration and Validation for the L2/3_SM_P Version 3 Data Products.
  • Koster (2004) Koster, R. D. (2004), Regions of Strong Coupling Between Soil Moisture and Precipitation, Science, 305(5687), 1138–1140, doi:10.1126/science.1100217.
  • Koster and Suarez (1994) Koster, R. D., and M. J. Suarez (1994), The components of a ‘SVAT’ scheme and their effects on a GCM’s hydrological cycle, Advances in Water Resources, 17(1-2), 61–78, doi:10.1016/0309-1708(94)90024-8.
  • Koster et al. (2000) Koster, R. D., M. J. Suarez, A. Ducharne, M. Stieglitz, and P. Kumar (2000), A catchment-based approach to modeling land surface processes in a general circulation model 1. Model structure, Journal of Geophysical Research D: Atmospheres, 105(D20), 24,809–24,822.
  • Koster et al. (2017) Koster, R. D., R. H. Reichle, S. P. P. Mahanama, R. D. Koster, R. H. Reichle, and S. P. P. Mahanama (2017), A Data-Driven Approach for Daily Real-Time Estimates and Forecasts of Near-Surface Soil Moisture, Journal of Hydrometeorology, doi:10.1175/JHM-D-16-0285.1.
  • Krizhevsky et al. (2012)

    Krizhevsky, A., I. Sutskever, and G. E. Hinton (2012), ImageNet Classification with Deep Convolutional Neural Networks,

    Advances In Neural Information Processing Systems, pp. 1–9, doi:
  • LeCun et al. (2015) LeCun, Y., Y. Bengio, and G. Hinton (2015), Deep learning, Nature, 521(7553), 436–444, doi:10.1038/nature14539.
  • Leeper et al. (2017) Leeper, R. D., J. E. Bell, C. Vines, M. Palecki, R. D. Leeper, J. E. Bell, C. Vines, and M. Palecki (2017), An Evaluation of the North American Regional Reanalysis Simulated Soil Moisture Conditions during the 2011–13 Drought Period, Journal of Hydrometeorology, 18(2), 515–527, doi:10.1175/JHM-D-16-0132.1.
  • Léonard et al. (2015)

    Léonard, N., S. Waghmare, Y. Wang, and J.-H. Kim (2015), rnn : Recurrent Library for Torch,
  • Lipton et al. (2015) Lipton, Z. C., J. Berkowitz, C. Elkan, Zachary C. Lipton, Z. C. Lipton, Zachary C. Lipton, J. Berkowitz, and C. Elkan (2015), A Critical Review of Recurrent Neural Networks for Sequence Learning,, pp. 1–35, doi:10.1145/2647868.2654889.
  • Massey et al. (2016) Massey, J. D., W. J. Steenburgh, J. C. Knievel, W. Y. Y. Cheng, J. D. Massey, W. J. Steenburgh, J. C. Knievel, and W. Y. Y. Cheng (2016), Regional Soil Moisture Biases and Their Influence on WRF Model Temperature Forecasts over the Intermountain West, Weather and Forecasting, 31(1), 197–216, doi:10.1175/WAF-D-15-0073.1.
  • Narasimhan and Srinivasan (2005) Narasimhan, B., and R. Srinivasan (2005), Development and evaluation of Soil Moisture Deficit Index (SMDI) and Evapotranspiration Deficit Index (ETDI) for agricultural drought monitoring, Agricultural and Forest Meteorology, 133(1-4), 69–88, doi:10.1016/j.agrformet.2005.07.012.
  • Nijssen et al. (2001) Nijssen, B., R. Schnur, and D. P. Lettenmaier (2001), Global Retrospective Estimation of Soil Moisture Using the Variable Infiltration Capacity Land Surface Model, 1980–93, Journal of Climate, 14(8), 1790–1808, doi:10.1175/1520-0442(2001)014¡1790:GREOSM¿2.0.CO;2.
  • Norbiato et al. (2008) Norbiato, D., M. Borga, S. Degli Esposti, E. Gaume, and S. Anquetin (2008), Flash flood warning based on rainfall thresholds and soil moisture conditions: An assessment for gauged and ungauged basins, Journal of Hydrology, pp. 274–290, doi:10.1016/j.jhydrol.2008.08.023.
  • O’Neill et al. (2012) O’Neill, P., S. Chan, E. Njoku, and T. J. R. Bindlish (2012), Algorithm Theoretical Basis Document (ATBD) SMAP Level 2 & 3 Soil Moisture (Passive) (L2_SM_P, L3_SM_P),
  • Orth and Seneviratne (2012) Orth, R., and S. I. Seneviratne (2012), Analysis of soil moisture memory from observations in Europe, Journal of Geophysical Research, 117(D15), D15,115, doi:10.1029/2011JD017366.
  • Semeniuta et al. (2016) Semeniuta, S., A. Severyn, and E. Barth (2016), Recurrent Dropout without Memory Loss,
  • Srivastava et al. (2014) Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014), Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research, 15, 1929–1958.
  • Tibshirani (1994) Tibshirani, R. (1994), Regression Shrinkage and Selection Via the Lasso, Journal of the Royal Statistical Society, Series B, 58, 267—-288.
  • USGS (2016) USGS (2016), Estimated Use of Water in the United States County-Level Data for 2010.
  • Voosen (2017) Voosen, P. (2017), The AI detectives, Science, 357(6346).
  • Xia et al. (2015) Xia, Y., M. B. Ek, Y. Wu, T. Ford, S. M. Quiring, Y. Xia, M. B. Ek, Y. Wu, T. Ford, and S. M. Quiring (2015), Comparison of NLDAS-2 Simulated and NASMD Observed Daily Soil Moisture. Part I: Comparison and Analysis, Journal of Hydrometeorology, 16(5), 1962–1980, doi:10.1175/JHM-D-14-0096.1.
  • Yuan and Quiring (2017) Yuan, S., and S. M. Quiring (2017), Evaluation of soil moisture in CMIP5 simulations over the contiguous United States using in situ and satellite observations, Hydrology and Earth System Sciences, 21(4), 2203–2218, doi:10.5194/hess-21-2203-2017.
  • Zaremba et al. (2015) Zaremba, W., I. Sutskever, and O. Vinyals (2015), Recurrent Neural Network Regularization,


  1. Text S1. Technical details about conventional statistical methods.

  2. Figure S1. Comparison between the structures of Long Short-Term Memory (LSTM) and simple recurrent neural network (RNN).

  3. Figure S2. Map of SMAP’s data quality flag, with annotations for geographic regions on the continental US

  4. Figure S3. Noah bias and RMSE evaluated against SMAP over CONUS.

  5. Figure S4. Performance of training using Noah soil moisture interpolated to 5 cm depth.

  6. Figure S5. Comparing LSTM models created with Noah or MOS models as inputs.

  7. Text S2. Proof-of-concept test for the potential of LSTM for long-term hindcast.

  8. Figure S6. Performance of LSTM and AR for the synthetic long-term hindcast experiment

  9. Table S1. Predictors employed by LSTM, lasso-regularized linear regression and one-layer feedforward neural network

Text S1. Technical Details about Conventional Methods

We compared the Long-Short Term Memory (LSTM) network to the least absolute shrinkage and selection operator (lasso), auto-regressive moving average model (AR), and a single-layer feedforward Neural Network (NN), given the same inputs. Because lasso is essentially a regularized linear regression, it is shorthanded as LR in our paper. The equation for estimating the parameters for LR is:


where is the SMAP soil moisture product, are coefficients for the LR model, is a regularization parameter that determines how much penalty is applied on large coefficients, and contains exogenous inputs including temperature, precipitation, wind, downward shortwave and long wave radiations, specific humidity, and Noah-simulated potential evapotranspiration, evaporation, and runoff. In alternative models that we examined, we also tested removing the list of Noah outputs. The regularization parameter () is determined experimentally to minimize the test error, and a value of 0.002 is found to be appropriate for LR, and point-by-point LR (LR).

We have added point-by-point auto-regressive model with exogenous inputs into the comparisons, meaning a separate model is trained for each SMAP pixel. We did not consider moving average models because our focus is on the potential of the method for long-term forecast, while moving-average models require observations to calculate residuals. The equation for the AR is: θ_t = c + ϵ_t + ∑_i=1^p α_iθ_t-i + ∑_k=1^r γ_kx_k,t where is a constant, is the time step, ’s are soil moisture observations, is the order of the auto-regression, and are coefficients that will be estimated for each SMAP pixel, and are forcing inputs as indicated above. For our long-term hindcast test. We could include static attributes in this equation but since they are static in time they will be absorbed by the constant , and because we are training point-by-point there is no reason to consider them. During parameter estimation (training) stage, observations are used to update the past states (). In the long-term hindcast (testing) stage, because there is no observation, are the AR-predicted values. The model has to recursively apply the forecast equation to proceed in time. We varied from 0 to 5 and identified the value that gave the smallest testing error for each site.

The one-layer feedforward neural network (NN) is simply a linear combination of inputs and a transformation: θ^NN(t)=f(W_NNx+b) where is the weights of the neural network, b is a constant coeffient and is a nonlinear transformation, in this case tan-sigmoid (). We regularized NN using early stopping and L2-norm regularization. A regularization parameter of 0.002 was found to be give the smallest test root-mean-squared error (RMSE). NN and its point-by-point version, NN, have a linear hidden layer of size 100 and 30, respectively, as larger hidden size results in more over-fitting.

Figure S1: Comparing an Long Short-Term Memory (LSTM) unit with simple recurrent neural network (RNN). The transformations from inputs to , , are sigmoidal functions. From inputs to and from to the transformation is . means multiplication by weights. Main point: the conventional design of RNN only iteratively update the hidden state. The design of gates in LSTM allows it to learn when to forget past states, and when to output, thus addressing the issue of slow training of front node with RNN. Figure is modified from [Greff et al., 2015].
Figure S2: SMAP data quality with geographic regions annotated on the map for reference. The values shown is the time-averaged SMAP ”recommended quality” flag. We notice that Applachian Highlands and Southeast Coastal Plains are both mostly flagged as having bad quality, but LSTM’s root-mean-squared-error from SMAP L3, RMSE(LSTM) are in the range of 0.02-0.035 in the South Appalachian and Coastal Plains according to Figure 1. Because random error cannot be captured in the test, it suggests SMAP quality may be not as bad as thought. However, this finding and the potential influence from models in the retrieval algorithm need to be thoroughly evaluated.
(a) (b)
Figure S3: Performance of Noah evaluated against SMAP in the testing set of the temporal generalization test. (a) Bias: the time-averaged value of Noah-predicted soil moisture, interpolated to 5 cm, and SMAP L3 product; (b) anomaly correlation coefficient between Noah-predicted soil moisture and SMAP L3 product.
Figure S4: Same as Figure 3 but the Noah-simulated values are linearly interpolated to 0-5 cm. We tested several interpolation methods: (i) directly using 0-10 cm data; (ii) linear (2-point) and cubic vertical interpolation (3-point) using top layers; and (iii) integral interpolation: we determined a -order (or ) polynomial whose integral in these layers agree with Noah-simulated values. This Figure shows method (i), whose results are very similar to those reported in Figure 3, while other interpolation methods also generate similar results
Figure S5: Comparisons between LSTM models with Noah (L+Noah) and MOS (L+MOS) solutions as inputs. Both Noah and MOS solutions are obtained from North American Land Data Assimilation System (NLDAS). The distinction between ”train” and ”test” for Noah and MOS only means the different time periods for which the metrics are calculated. Noah and MOS have comparable performance in simulating moisture climatology. It appears MOS generally has smaller root-mean-squared error (RMSE) and smaller bias. However, using which model in the inputs does not seem to have a noticeable impact on the test performance of the LSTM models.

Text S2. Proof-of-concept test for the potential of LSTM for long-term hindcast

Since SMAP has a limited time span, we conducted a proof-of-concept experiment that examines the potential of LSTM for multi-year-scale soil moisture hindcasting and compare it to point-by-point auto-regressive models (AR). These synthetic experiments are not thorough in performance optimization, as true hindcasting will involve auxiliary satellite-based observations and in-situ data. Both LSTM and AR are trained in 2015-2016 with Noah-simulated soil moisture as the target and climate forcing as the inputs. Based on the temporal generalization test described in the main text, we removed all Noah-simulated fields from the inputs, and, since AR

does not require any static attributes like topography and soil texture, we also removed such attributes from LSTM’s inputs. We added two types of synthetic noise to Noah solutions: a Gaussian white noise (with standard deviation = 0.04) and a relative error. Neither types of noise is auto-correlated. The formulae for the relative error is:

θ_s=θ_Noah*(1+ϵ) where is the synthetic observation to be treated as the learning target, is the top 10 cm soil moisture simulated by Noah, and is a Gaussian relative error term.

The results show that with two years of training data, LSTM can well learn the soil moisture dynamics of Noah (Figure S6a-b). The median error for the white-noise case only slightly increases from  0.04 in the training period, which is almost equal to the added noise) to 0.043 in 2005-2006. Importantly, the hindcast noise does not increase as a function of hindcast length, i.e., distance from the first synthetic observation. The AR also works decently, with a median error around 0.049 in 2005-2006. Its error is also not influenced by hindcast length, perhaps because soil moisture dynamics simulated by Noah has only limited memory length. However, LSTM is still noticeably stronger as 85-th percentile of LSTM’s error in 2005-2006 is less than 25-th percentile error of AR. The LSTM boxes are much narrower than those of AR. Also, note that we only created one LSTM model for the continental U.S. (CONUS). In addition, LSTM can make use of static attributes to differentiate between locations with different soil textures and land covers, but AR cannot. Therefore, the performance of LSTM may further improve as these attributes are included. LSTM may compensate for the lack of attributes by summarizing information from climate forcings, as climate features co-vary with physical attributes. Figure S6c compares the hindcast time series at a pixel. We note that AR tends to over-predict major peaks but under-predict the rise limbs. LSTM well captures the troughs but AR may over-predict the troughs.

As Noah has simpler dynamics and less unknown variables than real systems, it is easier to learn so it is not surprising the errors are close to the added Gaussian noise. The larger error of AR during the training period suggest its formulation is not flexible enough to completely reproduce the dynamics of Noah. These results shown here mainly illustrates that LSTM has a great potential for long-term hindcasting. It appears from our results that since soil moisture has short memory, hindcasting to one year is not very different from hindcasting to 10 years. However, the training data should adequately sample plausible soil moisture dynamics.

Figure S6: Proof-of-concept long-term hindcast tests with Noah-simulated soil moisture as the target. (a) boxplot comparing errors (evaluated against Noah) for the 10-year hindcast. Noah solution is contaminated by a Gaussian noise with a standard deviation () of 0.04. The RMSEs are calculated for each 2-year period and are grouped over CONUS to form the boxplot. Note that error does not increase as hindcast length increases, i.e., during 2005-2006, the errors are not greater than those in 2013-2014; (b) same as (a) but for a relative noise; (c) time series for Noah, LSTM and AR at a pixel. We only show 5 years of hindcast for clarity of the plot. The zoomed-in panel on (c) (corresponding to the brown box in the main plot) highlights how AR over-estimates the two soil moisture peaks. Meanwhile, AR seems to have dampened small-scale fluctuations
NLDAS Model Outputs
ALBDO Albedo RCSOL Soil moisture parameter in canopy conductance
ARAIN Liquid precipitation RCT Temperature parameter in canopy conductance
ASNOW Frozen precipitation RSMACR Relative soil moisture availability control factor
AVSFT Average surface skin temperature RSMIN Minimal stomatal resistance
BGRUN Subsurface runoff (baseflow) SBSNO Sublimation (evaporation from snow)
CCOND Canopy conductance SHTFL Sensible heat flux
CNWAT Plant canopy surface water SNOD Snow depth
GFLUX Ground heat flux SNOHF Snow phase-change heat flux
LAI Leaf area index SNOM Snow melt
LHTFL Latent heat flux SNOWC Snow cover
LSOIL Liquid soil moisture content (non-frozen) SOILM Soil moisture content
MSTAV Moisture availability SSRUN Surface runoff (non-infiltrating)
NLWRS Net longwave radiation flux (surface) TRANS Transpiration
NSWRS Net shortwave radiation flux (surface) TSOIL Soil temperature
PEVPR Potential latent heat flux VEG Vegetation
RCQ Humidity parameter in canopy conductance WEASD Water equivalent of accumulated snow depth
RCS Solar parameter in canopy conductance
NLDAS Forcing
ACOND Aerodynamic conductance EVP Evaporation
ACPCP Convective precipitation hourly total HGT Geopotential height
APCP Precipitation hourly total PEVAP Potential evaporation hourly total
CAPE above ground Convective Available Potential Energy PRES Surface pressure
CONVfrac Fraction of total precipitation that is convective SPFH Specific humidity
DLWRF Longwave radiation flux downwards TMP Temperature
DSWRF Shortwave radiation flux downwards UGRD Zonal wind speed
SMAP Flags
albedo Albedo vegewater Vegetation water content
coast Coastal proximity roughness Roughness
waterbody Radar water body fraction staWater Static water
landcover Landcover classes urban Urban area
mount Mountainous terrain vegetation Dense vegetation
Geographic attributes
Bulk Bulk density Irri Irrigation
Capa Soil capacity Sand Sand fraction
Clay Clay fraction Silt Silt fraction
LULC NLCD 2001 land cover and use type
Table S1: Predictors used in the training of LSTM, lasso-regularized linear regression, and one-layer feedforward neural network