Spatiotemporal variance in local water discharge is inevitable because soil properties, climate, and land use vary among regions. Training individual hydrological models to capture regional spatiotemporal features requires extensive training time and computational resources. These regions, although varying in local characteristics, still follow fundamental hydrological dependencies. Transfer learning is a well-known way to reuse a trained model and reduce training duration on a new dataset. Although it is most commonly applied in computer vision [28, 2, 26] and time-series prediction [21, 8, 15], recent works [7, 3] in other application domains show that these techniques are also effective in knowledge-guided neural networks powered by process-based (PB) mechanisms. In hydrology, PB models that rely on domain-specific principles and mathematical formulations, although often criticized as overly complex, are elegant in describing large-scale patterns [14, 30] where knowledge of distributed state variables and physical constraints is essential. However, understanding a system's general organization does not provide insight into how the principal variables interact over space and time.
The overview in [6] discusses the prevalent challenges of distributed hydrological PB models. On the other hand, several works show an increase in deep learning (DL) applications in hydrology. A recent survey of 129 publications [27] finds that a variety of state-of-the-art DL architectures are in use, namely convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs). However, because DL algorithms encode limited knowledge of the underlying PB mechanisms, PB models remain irreplaceable in studying environmental systems. As a result, knowledge-guided deep neural networks are an active research area in hydrology [34, 31, 10] as well as in other domains [12, 1, 22].
In this paper, we present HydroDeep, which couples a PB hydro-ecological model with a combination of a one-dimensional CNN and an LSTM to capture the regional grid-based geo-spatiotemporal features of a watershed that contribute to water discharge and thereby influence flood events. The combination of CNN and LSTM used by our PB-DL network has shown promising results in computer vision, speech recognition, natural language processing, and other time-series analysis. In our experiments, HydroDeep outperformed standalone CNNs and LSTMs in Nash–Sutcliffe efficiency (NSE) by 1.6% and 10.5%, respectively. We further propose a new application area of transfer learning: analyzing similarities and dissimilarities in the geo-spatiotemporal characteristics of watersheds while reducing the extensive training time required for local DL models. Recent research in hydrology has applied transfer learning to predicting water temperature in unmonitored lakes and to a water quality prediction system. In [13], transfer learning was applied to flood prediction, as in our work. However, the authors converted their time-series data to images in order to use transfer learning in a purely CNN architecture for time-series prediction, with a reported reduction in computational cost. In this paper, we explore how transferable geo-spatiotemporal features are when kept in their original spatiotemporal form.
Let the spatial grid vector cover a region with L grids in total, each grid having spatial coordinates and a distance to the nearest river or water source. On a given day t, every grid has a precipitation measurement, which is mapped to its corresponding grid-based PB-simulated runoff. The extent to which each grid's precipitation contributes to river discharge depends strongly on the grid's distance to the nearest water source. The distance vector is therefore transformed into a distance weight vector in which grids closer to a local river receive higher weights, because they contribute more to the regional river discharge (Appendix Section 1). Applying these weights to the precipitation vector gives the weighted precipitation vector. On day t, the weighted precipitation and runoff vectors are mapped to the respective daily river discharge observation. From a multivariate time-series point of view, each day's input is a vector over all input variables for that day, and we want to predict the corresponding target outputs. The aim is to obtain a non-linear mapping between the inputs and the outputs.

The motive for integrating a CNN and an LSTM in HydroDeep is to capture both the spatial and the temporal dependencies of a watershed. The CNN layers extract local geospatial features among the input variables and pass them to the LSTM layers to support temporal sequence prediction. HydroDeep has an initial input layer that can be customized to different input shapes (numbers of grids), which makes the model easy to transfer to other watersheds. For a given prediction day, HydroDeep trains on inputs from a weekly time window leading up to that day, a window length validated by empirical studies, to predict that day's output. Because a drought lasting several days can be followed by a hurricane that causes a flood overnight, it is crucial to include the target day's inputs as a second input to the network. This second input is concatenated with the last LSTM layer's output, and the two are processed together by a fully connected layer. We use data from Jan 1, 2000, to Dec 31, 2011, for training and from Jan 1, 2012, to Dec 31, 2016, for evaluation. We optimized the hyperparameters of HydroDeep by random search (Appendix Section 2).
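The input preparation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the actual distance-weight transform is defined in Appendix Section 1 and is not reproduced here, so `distance_weights` uses a hypothetical normalised inverse-distance rule, and the function names, the 7-day default, and the choice to concatenate weighted precipitation with runoff are all ours.

```python
from typing import List, Sequence, Tuple


def distance_weights(distances: Sequence[float]) -> List[float]:
    """Hypothetical weighting: closer grids get larger weights (sums to 1).

    The paper's real transform is in its appendix; this inverse-distance
    rule is only a stand-in with the same qualitative behaviour.
    """
    raw = [1.0 / (1.0 + d) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


def daily_input(precip: Sequence[float], runoff: Sequence[float],
                weights: Sequence[float]) -> List[float]:
    """One day's input: weighted precipitation per grid followed by runoff per grid."""
    weighted = [w * p for w, p in zip(weights, precip)]
    return weighted + list(runoff)


def build_samples(daily_inputs: Sequence[Sequence[float]],
                  discharge: Sequence[float],
                  window: int = 7) -> List[Tuple[list, list, float]]:
    """Pair each prediction day t with (previous `window` days, day-t inputs, day-t discharge)."""
    samples = []
    for t in range(window, len(daily_inputs)):
        history = [list(x) for x in daily_inputs[t - window:t]]  # first network input
        today = list(daily_inputs[t])                            # second network input
        samples.append((history, today, discharge[t]))
    return samples


if __name__ == "__main__":
    w = distance_weights([0.0, 1.0, 3.0])  # three grids, made-up distances
    days = [daily_input([1.0, 2.0, 3.0], [0.1, 0.2, 0.3], w) for _ in range(10)]
    flows = [float(i) for i in range(10)]  # made-up discharge values
    print(len(build_samples(days, flows)))
```

Each sample holds the two inputs the network expects: the windowed history that would feed the CNN and LSTM layers, and the target day's own inputs that are concatenated before the fully connected layer.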
2.2 Transfer Learning
A flood or drought event depends heavily on environmental drivers, and the inherent local spatiotemporal patterns of each driver are bound to be unique to their geographical location. Consequently, a model that has learned the local spatiotemporal patterns of one region will fail to perform accurately for a geographically distant region with different characteristics, and would have to be retrained for every new region. Alternatively, a global model can be trained on a larger area covering many watersheds, but it will fail to capture the local patterns. Both options are also expensive, because they require more training time and computational power. Therefore, we use transfer learning to reuse HydroDeep's knowledge from one region in another. More formally, transfer learning involves a domain and a task. A domain consists of an input feature space, with a fixed total number of input features, and a marginal probability distribution over that space. Given a domain, a task consists of a label space and a conditional probability distribution over that label space. The conditional probability distribution is usually learned from the input-label pairs in the training samples. Suppose there is a source domain with a source task and a target domain with a target task. Through transfer learning, we try to learn the target conditional probability distribution in the target domain from the knowledge learned from the source domain and the source task.
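For readers who prefer symbols, the definition above can be written compactly as follows. The notation is the one commonly used in the transfer learning literature and is an assumption on our part; the paper's own symbols may differ.

```latex
\begin{align*}
\mathcal{D} &= \{\mathcal{X},\, P(X)\}, & X &= \{x_1, \dots, x_n\},\ x_i \in \mathcal{X},\\
\mathcal{T} &= \{\mathcal{Y},\, P(Y \mid X)\}, & &\text{learned from pairs } (x_i, y_i),\ y_i \in \mathcal{Y},\\
&\text{Given } (\mathcal{D}_S, \mathcal{T}_S) \text{ and } (\mathcal{D}_T, \mathcal{T}_T): & &\text{learn } P(Y_T \mid X_T) \text{ in } \mathcal{D}_T \text{ using knowledge from } \mathcal{D}_S \text{ and } \mathcal{T}_S.
\end{align*}
```

Here the subscripts S and T mark the source and target, n is the number of input features, and the interesting case is the one where the source and target differ in domain, task, or both.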
3 Experimental Design
3.1 Dataset and the Process-based Model - DLEM
For our experiments, we use a PB hydro-ecological model, the Dynamic Land Ecosystem Model (DLEM), which mimics the plant physiological, biogeochemical, and hydrological processes in the plant-soil-water-river continuum [17, 19]. In DLEM, the design of the grid-to-grid connection tracks significant features of a region, including within-grid heterogeneity, grid-to-grid flow, and land-aquatic linkage. DLEM models water movement from land to aquatic systems at a daily time step, with each grid cell comprising multiple land cover types, rivers, and lakes whose area percentages are prescribed by land-use history data. DLEM has been extensively validated against measurements from LTER, NEON, AmeriFlux, the USDA crop yield survey, and USGS gauge monitoring, and is widely used to quantify the spatiotemporal variations in the pools and fluxes of coupled water, carbon, and nitrogen (water-C-N) at site and regional scales [17, 19, 33, 35, 20]. Preliminary results from DLEM at the outlet of the Mississippi and Atchafalaya river basin (MARB) show that the simulated variations of daily river discharge are very close to the USGS-observed river discharge over the years.
We obtained daily discharge measurements for Iowa streams from the U.S. Geological Survey (USGS), covering 23 watersheds across the state of Iowa. In addition, we use daily precipitation at a 5-arc-min resolution generated from high-resolution gridded meteorological data products: station observations from the Climatic Research Unit (CRU) of the University of East Anglia, and the North American Regional Reanalysis (NARR) dataset, which combines modeled and observed data. We also use DLEM-simulated surface and subsurface runoff to guide our DL network. Climate, land management, and environmental drivers steer the DLEM simulation, which represents our “best estimate” of land-to-aquatic surface and subsurface runoff across the watershed. For our experiment, we chose the Thompson Fork Grand River basin at Davis City (watershed 13) as our source domain because of its smaller size, and transferred its knowledge to five target domains: West Nodaway River near Shambaugh (watershed 14), East Nishnabotna River near Shenandoah (watershed 15), Turkey River near Garber (watershed 4), South Skunk River near Oskaloosa (watershed 10), and Rock River near Hawarden (watershed 23). Watershed 13 (w13) has 29 grids, each with two input time-series signals corresponding to precipitation and DLEM runoff. Daily river discharge observations are given as labels to all 29 grids.
3.2 Regional Knowledge Transfer for Spatiotemporal Analysis
We experimented with four transfer learning approaches to reuse HydroDeep's knowledge in predicting river discharge in distant watersheds (Figure 1). The goal was to find the best way to transfer HydroDeep's grid-based spatiotemporal knowledge from one watershed to another, both to reduce the training iterations required on a new region and to interpret the geo-spatiotemporal similarities and dissimilarities between the source and the target. In the first approach, T-HydroDeep-1 (T-HD-1), we preserve the original HydroDeep's spatiotemporal knowledge from the source w13 and use it unchanged to test the new target watersheds. In the second, T-HydroDeep-2 (T-HD-2), we transfer the original HydroDeep's spatiotemporal knowledge and allow all of it to be finetuned on the target. In the third and fourth approaches, we finetune only the temporal features in T-HydroDeep-3 (T-HD-3) and only the spatial features in T-HydroDeep-4 (T-HD-4). The CNN layers and the LSTM layers are responsible for learning spatial and temporal features, respectively. When we finetune one kind of layer, we freeze the other to keep its originally learned features intact. Table 1 shows the observations from our ablation study.
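The four variants differ only in which layer groups stay frozen, which can be summarised as a small lookup. The sketch below is framework-agnostic and rests on assumptions: the layer names (`conv1d_1`, `lstm_1`, and so on) and the helper `trainable_flags` are hypothetical, and the handling of the input and fully connected layers is not specified in the text, so they are left out.

```python
from typing import Dict, List

# Which layer kinds are finetuned on the target watershed in each variant.
# True = finetune on the target, False = keep the weights learned on w13.
FINETUNE_PLAN: Dict[str, Dict[str, bool]] = {
    "T-HD-1": {"conv1d": False, "lstm": False},  # reuse source knowledge as-is
    "T-HD-2": {"conv1d": True, "lstm": True},    # finetune spatial and temporal layers
    "T-HD-3": {"conv1d": False, "lstm": True},   # finetune temporal (LSTM) layers only
    "T-HD-4": {"conv1d": True, "lstm": False},   # finetune spatial (CNN) layers only
}


def trainable_flags(variant: str, layer_names: List[str]) -> Dict[str, bool]:
    """Map each layer name to whether it should be trainable for this variant.

    Layer kind is taken from the (hypothetical) name prefix before the
    first underscore, e.g. "conv1d_2" -> "conv1d".
    """
    plan = FINETUNE_PLAN[variant]
    flags = {}
    for name in layer_names:
        kind = name.split("_")[0]
        flags[name] = plan[kind]
    return flags


if __name__ == "__main__":
    layers = ["conv1d_1", "conv1d_2", "lstm_1", "lstm_2"]
    for variant in FINETUNE_PLAN:
        print(variant, trainable_flags(variant, layers))
```

In a deep learning framework, these flags would typically be applied to each layer's trainable setting before the short finetuning run on the target watershed.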
The watersheds w14 and w15, although adjacent to the source w13, are observed to have distinct spatial features, as T-HD-4 proved to be the best approach for both. The second-best approach for w14 is T-HD-1, which suggests that w13 and w14 have similar temporal features; because they differ in spatial features, only the spatial layers needed to be finetuned (T-HD-4) to improve prediction accuracy. Similarly, the results show that w15 differs from w13 more in its spatial features than in its temporal features, as T-HD-2 shows the second-best performance, just behind T-HD-4. In contrast, w4 and w10, both far from w13, show distinct spatiotemporal features, as T-HD-2 shows the best performance. Note that w13 has only 29 spatial grids, whereas w4 and w10 have 61 and 65 grids, respectively. HydroDeep's knowledge was thus transferred to targets about double the area of the source and still achieved the best performance among all the watersheds in our experiment. This supports our arguments that more data will increase HydroDeep's performance, that transfer learning approaches work on targets larger in area than the source, and that they can be used to analyze watersheds' spatiotemporal characteristics. The original HydroDeep was pretrained for 300 iterations on w13 and achieved an NSE of 0.63, outperforming the baseline neural network architectures commonly used for hydrological modeling: CNN, LSTM, GRU, and bi-directional LSTM (Appendix Section 3). In contrast, the transferred models on the new targets (w14, w15, w4, w10, and w23) achieved significant performance in just 20 training iterations, reducing training time. As Table 1 shows, the NSE of HydroDeep (HD) trained from scratch on each target watershed for only 20 iterations remains lower than that of every transferred variant. Using transfer learning thereby increased prediction performance, in terms of NSE, on all five target watersheds.
| Target watershed | No. of grids | HD | T-HD-1 | T-HD-2 | T-HD-3 | T-HD-4 | Time |
|---|---|---|---|---|---|---|---|
| Watershed 14 (w14) | 34 | 0.27 | 0.33 | 0.30 | 0.32 | 0.39 | 154.35 20.81 |
| Watershed 15 (w15) | 39 | 0.24 | 0.46 | 0.47 | 0.42 | 0.50 | 154.87 20.38 |
| Watershed 4 (w4) | 61 | 0.71 | 0.76 | 0.82 | 0.81 | 0.82 | 156.52 20.44 |
| Watershed 10 (w10) | 65 | 0.80 | 0.82 | 0.87 | 0.86 | 0.86 | 157.15 21.40 |
| Watershed 23 (w23) | 32 | 0.36 | 0.45 | 0.38 | 0.50 | 0.46 | 154.21 20.84 |
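Because the comparisons above are expressed in Nash–Sutcliffe efficiency (NSE), a minimal reference implementation of the standard formula may help when reproducing such scores. The function name is ours and the sample values are made up; this is not the authors' evaluation code.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 means a perfect match; 0.0 means the prediction is no better than
    always predicting the mean of the observations; negative values are worse.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - residual / variance


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]  # made-up discharge observations
    print(nash_sutcliffe_efficiency(obs, obs))
    print(nash_sutcliffe_efficiency(obs, [2.5, 2.5, 2.5, 2.5]))
```

Since NSE is bounded above by 1 but unbounded below, differences between two scores are easiest to interpret when both models are evaluated on the same observation series.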
5 Conclusion and Future Work
This paper illustrates a new application of transfer learning techniques: interpreting the geo-spatiotemporal characteristics of watersheds with limited computational resources and reduced training time. We believe that a finer grid resolution will help HydroDeep capture local features at a finer scale. In the future, we will run our experiments on more watersheds to better quantify the performance of HydroDeep and its variants. We also plan to study how to select the source watershed(s) more effectively.
References
[1] (2018) Augmenting physical simulators with stochastic neural networks: case study of planar pushing and bouncing. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3066–3073.
[2] (2020) Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images. Pattern Recognition Letters 133, pp. 232–239.
[3] (2021) Transfer learning based multi-fidelity physics informed deep neural network. Journal of Computational Physics 426, pp. 109942.
[4] A transfer learning-based LSTM strategy for imputing large-scale consecutive missing data and its application in a water quality prediction system. Journal of Hydrology 602, pp. 126573.
[5] (2015) Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625–2634.
[6] (2016) An overview of current applications, challenges, and future trends in distributed process-based models in hydrology. Journal of Hydrology 537, pp. 45–60.
[7] (2020) Transfer learning enhanced physics informed neural network for phase-field modeling of fracture. Theoretical and Applied Fracture Mechanics 106, pp. 102447.
[8] (2020) Transfer learning for clinical time series analysis using deep neural networks. Journal of Healthcare Informatics Research 4 (2), pp. 112–137.
[9] (2018) A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities. Sensors 18 (7), pp. 2220.
[10] Physics-guided machine learning for scientific discovery: an application in simulating lake temperature profiles. ACM/IMS Transactions on Data Science 2 (3), pp. 1–26.
[11] (2018) Iowa stream nitrate and the Gulf of Mexico. PLoS ONE 13 (4), pp. e0195930.
[12] (2017) Physics-guided neural networks (PGNN): an application in lake temperature modeling. arXiv preprint arXiv:1710.11431.
[13] (2020) Convolutional neural network coupled with a transfer-learning approach for time-series flood predictions. Water 12 (1), pp. 96.
[14] (2009) Thermodynamics, irreversibility, and optimality in land surface hydrology. In Bioclimatology and Natural Hazards, pp. 107–118.
[15] (2021) Tracking COVID-19 using online search. NPJ Digital Medicine 4 (1), pp. 1–11.
[16] (2018) Applied timeseries transfer learning.
[17] (2013) Long-term trends in evapotranspiration and runoff over the drainage basins of the Gulf of Mexico during 1901–2008. Water Resources Research 49 (4), pp. 1988–2012.
[18] (2020) Increased extreme precipitation challenges nitrogen load reduction to the Gulf of Mexico. Communications Earth & Environment, accepted.
[19] (2013) Net greenhouse gas balance in response to nitrogen enrichment: perspectives from a coupled biogeochemical model. Global Change Biology 19 (2), pp. 571–588.
[20] (2019) Are we getting better in using nitrogen?: variations in nitrogen use efficiency of two cereal crops across the United States. Earth’s Future 7 (8), pp. 939–952.
[21] (2019) Improving air quality prediction accuracy at larger temporal resolutions using deep learning and transfer learning techniques. Atmospheric Environment 214, pp. 116885.
[22] (2021) Thermodynamics-based artificial neural networks for constitutive modeling. Journal of the Mechanics and Physics of Solids 147, pp. 104277.
[23] (2006) North American Regional Reanalysis. Bulletin of the American Meteorological Society 87 (3), pp. 343–360.
[24] (2005) An improved method of constructing a database of monthly climate observations and associated high-resolution grids. International Journal of Climatology 25 (6), pp. 693–712.
[25] (2007) Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of the ASABE 50 (3), pp. 885–900.
[26] Convolutional neural network for remote-sensing scene classification: transfer learning analysis. Remote Sensing 12 (1), pp. 86.
[27] (2020) A comprehensive review of deep learning applications in hydrology and water resources. Water Science and Technology.
[28] (2018) A survey on deep transfer learning. In International Conference on Artificial Neural Networks, pp. 270–279.
[29] (2015) Show and tell: a neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164.
[30] (2009) A model of surface heat fluxes based on the theory of maximum entropy production. Water Resources Research 45 (11).
[31] (2020) Deep learning of subsurface flow via theory-guided neural network. Journal of Hydrology 584, pp. 124700.
[32] (2020) Predicting water temperature dynamics of unmonitored lake systems with meta transfer learning. In AGU Fall Meeting Abstracts, Vol. 2020, pp. H166–0030.
[33] (2015) Increased nitrogen export from eastern North America to the Atlantic Ocean due to climatic and anthropogenic changes during 1901–2008. Journal of Geophysical Research: Biogeosciences 120 (6), pp. 1046–1068.
[34] Real-time reservoir operation using recurrent neural networks and inflow forecast from a distributed hydrological model. Journal of Hydrology 579, pp. 124229.
[35] (2019) Largely underestimated carbon emission from land use and land cover change in the conterminous United States. Global Change Biology 25 (11), pp. 3741–3752.