2017 was the costliest year on record for natural disasters in the United States. Hurricanes Harvey [1], Irma [2], and Maria, together with other natural disasters, caused more than $306 billion in damage. Flood forecasting, or stage height prediction, is typically addressed with hydrological models that require years of experience and domain knowledge. Even though years of work have been invested in these advanced models, they still fall short of forecasting floods accurately at arbitrary locations and durations.
In this study, we present a benchmark dataset for flood forecasting that can be used to build and test deep neural networks for accurate flood prediction. Since flood forecasting is a task that involves time-series data, both the dataset and the networks are created with this in mind.
This paper first discusses related work in the next subsection. Section II provides preliminary information regarding the task and the raw data. The dataset creation process is described in detail in the same section, after which the proposed neural network structures are introduced. In the results and discussion section, the performance metrics of the proposed networks are reported and discussed before the conclusion.
I-A Related Work
Most studies extend feed-forward networks and propose neural network models that do not involve a large number of dataset entries because of technical limitations. Also, these studies mostly rely on fuzzy logic rather than the artificial intelligence focused aspects of deep neural networks. There are also studies that utilize artificial neural networks for flood forecasting [8, 9, 10, 11]. These studies present backpropagation network centric approaches to flood forecasting. In addition, [12, 13, 14] explore radial basis function networks and [15, 16] explore Support Vector Machines in flood forecasting. Besides the limited number of studies on flood forecasting and gage height prediction, deep neural networks with the latest algorithmic advances are utilized for similar tasks in hydrology such as reservoir inflow forecasting [17], precipitation estimation [18], and runoff analysis [19]. Even though forecasting is mostly time dependent, the forecasting studies mentioned here largely use network architectures that do not take advantage of the sequential nature of time-series data. Flood forecasting also involves time-series data, and the task should be approached by building the dataset around its time-dependent features.
Studies that use deep neural networks for time-series data show strong results and motivate wider use of neural network architectures on time-series data. Researchers show that neural networks can be used for traffic speed prediction [20, 21, 22], taxi demand prediction [23], and financial time-series prediction [24, 25]. Traffic speed prediction is very similar to flood prediction, since both rely on changes at connected points in a network.
In this paper, we propose a flood prediction benchmark dataset for future applications in machine learning, as well as a scalable approach to forecasting river stage at individual survey points on rivers. The approach takes into account the historical stage data of survey points at selected upstream locations, as well as precipitation data. This approach is decentralized, does not need any historical data from unrelated survey points, and can be used in real time. Recurrent Neural Networks (RNNs), in particular Gated Recurrent Unit (GRU) networks, are utilized throughout this study. We also show that this approach produces satisfactory results when applied to the state of Iowa as a proof of concept. The data are gathered from Iowa Flood Center (IFC) and United States Geological Survey (USGS) sensors on rivers within the state of Iowa. The findings of this project and the deep neural network model will benefit operational cyber platforms [26], intelligent knowledge systems [27], and watershed information and visualization tools [28, 29, 30] with enhanced river stage forecasting.
In this section, the methods that this study relies on are explained. Preliminary information regarding the problem definition and details about the case study dataset are given below. After the dataset information, the deep neural network models are presented.
II-A Preliminary Work
Flood forecasting, or prediction of river stage for a particular point on the river network, depends on the neighboring streams. The problem of flood forecasting should be solved by anticipating the height of the water on streams. Physically based approaches to this problem require a deep understanding of hydrology and the domain.
Since the aim of this study is to predict the water level at selected points on rivers, the framework needs to capture river network structure and connectivity. If the intersection points on a river network are considered as nodes and rivers are considered as edges, river networks form directed acyclic graphs (DAGs): water only flows from upstream to downstream and never moves in a cycle. In this study, gage height sensors are used as nodes instead of intersections. The United States Geological Survey (USGS) has gage height sensors all around the US. The sensors provide gage height measurements with a temporal resolution of 15 minutes. By using these sensors, a sensor network can be formed. The sensor network is also a DAG; more than one sensor can exist on a single river, and some nodes can have only one child node. Each sensor has upstream sensors and downstream sensors. For two connected sensors, or nodes, A and B, if the water in their watershed reaches sensor A before it reaches sensor B, then A is upstream of B and B is downstream of A. The water level at a sensor relies heavily on the water level at its upstream sensors. Therefore, it is important to incorporate upstream water level information into the forecasting effort.
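The upstream relation described above can be illustrated with a small sketch. The sensor names, the `DOWNSTREAM` adjacency table, and the `upstream_of` helper are hypothetical and only show one way to walk a sensor DAG; they are not taken from the study's code.

```python
from collections import deque

# Hypothetical sensor network: each sensor maps to the sensors directly downstream of it.
DOWNSTREAM = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}

def upstream_of(sensor, downstream=DOWNSTREAM):
    """Return every sensor whose water eventually reaches `sensor`, nearest first."""
    # Invert the downstream edges so that each node lists its direct upstream nodes.
    parents = {node: [] for node in downstream}
    for node, children in downstream.items():
        for child in children:
            parents[child].append(node)

    # Breadth-first walk against the flow direction; it terminates because the graph is acyclic.
    found, queue = [], deque(parents[sensor])
    while queue:
        node = queue.popleft()
        if node not in found:
            found.append(node)
            queue.extend(parents[node])
    return found

upstream_d = upstream_of("D")
upstream_a = upstream_of("A")
```

Here "nearest first" refers to the number of edges, which is an assumption; the study sorts upstream sensors by proximity using IFC metadata.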
As a data-driven approach to testing the capabilities of deep learning for mapping rainfall and runoff, this study doesn’t involve other mechanisms and processes that affect gage heights and floods.
The data used in this study comprise three different sets. The first dataset consists of gage height measurements from USGS and IFC sensors. The second dataset is NOAA's Stage IV radar rainfall product, and the last dataset is the metadata and sensor information from IFC. USGS has 201 gage height sensors within the state of Iowa, each providing gage height measurements with a 15-minute temporal resolution. Data obtained from USGS data sources for these sensors contain the sensor id, the date and time of the measurement with its timezone, and the measurement value. We needed to preprocess these data to form a data structure for our models. The preprocessed dataset was created as a hash table that has sensor ids as keys and, as each value, another hash table of (datetime, measurement) pairs. The keys of the inner hash tables, originally in CST or CDT depending on the time of year of the measurement, were converted to UTC to avoid any timezone-related problems.
Another preprocessing step was done to extract usable sensors. The IFC sensor data contain network information for the sensors, from which we extracted the list of upstream sensors for each USGS sensor. The list for each sensor was sorted by proximity to the intended sensor. The number of upstream sensors used in the input data was set to 4, and sensors with fewer upstream sensors than that were discarded. The final list of usable USGS sensors contained 45 sensors. Values other than 4 were tested, but no significant improvement was observed.
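A minimal sketch of the nested hash table with UTC keys is given below. The record layout `(sensor_id, timestamp, timezone_label, value)`, the sensor id, and the `build_table` helper are assumptions made for illustration; fixed offsets are used for the CST and CDT labels.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for the two timezone labels that appear in the raw records.
OFFSETS = {
    "CST": timezone(timedelta(hours=-6)),
    "CDT": timezone(timedelta(hours=-5)),
}

def build_table(records):
    """Build {sensor_id: {utc_datetime: measurement}} from raw records."""
    table = {}
    for sensor_id, stamp, tz_label, value in records:
        # Attach the local offset, then convert the timestamp to UTC.
        local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=OFFSETS[tz_label])
        utc = local.astimezone(timezone.utc)
        table.setdefault(sensor_id, {})[utc] = value
    return table

# Hypothetical records for a single sensor, one in winter (CST) and one in summer (CDT).
records = [
    ("sensor_1", "2018-01-15 12:00", "CST", 4.21),
    ("sensor_1", "2018-06-15 12:00", "CDT", 6.80),
]
table = build_table(records)
```

Keying the inner table by timezone-aware UTC datetimes keeps lookups unambiguous across daylight saving transitions.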
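The sensor filtering step could be sketched as follows. The input layout (a list of `(upstream_id, distance)` pairs per sensor) and the `usable_sensors` helper are hypothetical, and retaining the closest four upstream sensors is an assumption about how the sorted list is used.

```python
N_UPSTREAM = 4  # number of upstream sensors used in the input, as stated in the text

def usable_sensors(upstream_by_sensor, n=N_UPSTREAM):
    """Keep sensors with at least n upstream sensors and retain the n closest ones."""
    usable = {}
    for sensor, candidates in upstream_by_sensor.items():
        if len(candidates) < n:
            continue  # discard sensors with too few upstream sensors
        nearest = sorted(candidates, key=lambda pair: pair[1])[:n]
        usable[sensor] = [name for name, _ in nearest]
    return usable

# Hypothetical network metadata: S1 has five upstream sensors, S2 has only one.
network = {
    "S1": [("u1", 5.0), ("u2", 1.0), ("u3", 3.0), ("u4", 2.0), ("u5", 9.0)],
    "S2": [("u1", 1.0)],
}
selected = usable_sensors(network)
```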
For a dataset entry with a sensor S and time t, the input part mainly comprises gage height data from the upstream sensors of sensor S for measurements before time t, as well as the previous measurements from sensor S itself. Rainfall that falls within the watershed of sensor S is also part of the input. The output part is the future measurements of sensor S after time t. In this study, the approach is designed to predict 24 hours of future measurements, so the output consists of 24 values, one for each forecasted hour.
We formed two similar datasets using the preprocessed data. They differ in the amount of historical information they hold regarding sensors and precipitation. The smaller dataset has only 4 hours of stage measurements for each upstream sensor and 3 hours of measurements for the intended sensor, as well as a precipitation data array of length 20. The larger dataset has 24 hours of data for each upstream sensor, 24 hours of data for the intended sensor, and a precipitation data array of length 40. The output parts of both datasets were the same.
The datasets were formed by creating a dataset entry vector for each 15-minute datetime instance between January 2009 and June 2018. While each vector of the smaller dataset comprised a total of 39 input values and 24 output values, vectors of the larger dataset had a total of 160 input values and 24 output values. Since each input has two parts (height and precipitation data), we formed them separately and then concatenated them. For clarity, we explain the dataset using the same approach.
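The stated input sizes can be checked with simple arithmetic, assuming one value per hour for each sensor and four upstream sensors. The `input_length` helper is only illustrative.

```python
def input_length(upstream_sensors, upstream_hours, target_hours, precipitation_length):
    """Number of input values: upstream heights + target sensor heights + precipitation."""
    return upstream_sensors * upstream_hours + target_hours + precipitation_length

# Smaller dataset: 4 upstream sensors x 4 hours, 3 hours for the intended sensor, 20 rainfall values.
smaller = input_length(4, 4, 3, 20)
# Larger dataset: 4 upstream sensors x 24 hours, 24 hours for the intended sensor, 40 rainfall values.
larger = input_length(4, 24, 24, 40)
```

Under these assumptions the totals match the 39 and 160 input values reported for the two datasets.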
The height vector H of any dataset entry was created using USGS data (Equation 1 gives H for the smaller dataset, in which one element represents the height data for the second upstream sensor of sensor S at time t-td-2). When gathering data for an output of 24 hours starting at time t, the data for each upstream sensor were taken at times shifted back by td, where td refers to the travel time distance of water between the upstream sensor and the sensor whose stage height is to be forecasted. Time distance information was extracted from the sensor data provided by IFC. Each sensor point has a time distance to the outlet point of the corresponding watershed. By taking the difference between the time distances of an upstream sensor and the intended sensor, the time distance between these two sensors can be calculated. Using this information, the incorporated upstream data cover the water level measurements that will most likely affect the intended sensor's water level.
The second vector that makes up a dataset entry is the precipitation vector. Rainfall data from the Stage IV product (Fig. 2) are provided as rasters. The raster files contain precipitation values for the entire US, divided into parcels. After converting the raster files into easily accessible arrays, the next step is to decide how to acquire the data from the rasters for each sensor and datetime pair. The first approach was to use precipitation data only for the parcels that contain the exact locations of the upstream sensors and the intended sensor, without considering their watersheds; however, this approach does not take into account the rain that falls on the area between sensors. Another approach was to use data for the entire watershed of the intended sensor. Even though this approach has the potential to represent the domain better, all of the precipitation in the upstream sensors' watersheds drains past those sensors and is already represented in their gage measurements, so this approach unnecessarily inflates the amount of data.
In order to keep important information in the dataset while skipping information that is already captured, we used an approach in which sections of the watershed area are eliminated based on the upstream sensors. The watershed area whose rainfall will eventually reach the intended sensor is calculated, and the watershed areas of the upstream sensors are excluded from it. The remaining parcels are divided into sections depending on each parcel's water time distance to the intended sensor (Fig. 3).
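The time distance calculation and the shifted lookup can be sketched as below. Hourly integer time indices, the `travel_time` and `upstream_heights` helpers, and the exact index range (t-td-1 back to t-td-hours) are assumptions for illustration; the text only confirms that indices of the form t-td-2 appear in H.

```python
def travel_time(upstream_to_outlet, target_to_outlet):
    """Time distance between two sensors from their time distances to the watershed outlet."""
    return upstream_to_outlet - target_to_outlet

def upstream_heights(series, t, td, hours):
    """Heights of one upstream sensor at t-td-1, t-td-2, ..., t-td-hours."""
    return [series[t - td - k] for k in range(1, hours + 1)]

# Hypothetical upstream series indexed by hour; the height simply equals the hour index here.
series = {hour: float(hour) for hour in range(100)}
td = travel_time(upstream_to_outlet=30, target_to_outlet=24)
window = upstream_heights(series, t=50, td=td, hours=4)
```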
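A sketch of the watershed elimination and sectioning step is shown below, assuming parcels are identified by ids and that each parcel of the intended sensor's watershed has a known time distance in hours. The `precipitation_sections` helper and the parcel ids are hypothetical.

```python
def precipitation_sections(target_parcels, upstream_parcel_sets):
    """Group the intended sensor's parcels by time distance, skipping upstream watersheds.

    target_parcels: dict mapping parcel id -> time distance (hours) to the intended sensor.
    upstream_parcel_sets: list of sets, one per upstream sensor watershed.
    """
    covered = set().union(*upstream_parcel_sets)
    sections = {}
    for parcel, hours in target_parcels.items():
        if parcel not in covered:
            sections.setdefault(hours, []).append(parcel)
    return sections

# Hypothetical watershed: parcel p4 already drains through an upstream sensor.
target = {"p1": 1, "p2": 1, "p3": 2, "p4": 3}
upstream = [{"p4"}, {"p9"}]
sections = precipitation_sections(target, upstream)
```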
After gathering the watershed information that will be used to acquire rainfall data, the rasters were read and precipitation measurements were obtained depending on the time distance. For instance, if the output vector contains measurements for sensor S starting at time t, the precipitation data for each parcel would be obtained for the time shifted back by that parcel's time distance. Then, the measurements of the parcels were averaged for each time distance value. Eventually, all of the precipitation vectors were formed, but they almost always had different lengths. To obtain vectors of the same length, the precipitation vectors were padded to a length of 20 or 40, depending on the aforementioned dataset sizes. If a vector was longer than the predetermined size, the remaining part was cropped, and if it was shorter, the empty places were filled with zeros. The corresponding equation gives the final precipitation vector P for the smaller dataset, in which one element represents the mean of the precipitation data for parcels that are 8 hours away from sensor S.
The last step was to form the output vector. The output vector for a dataset entry of a sensor S at time t contains height measurements from t to t+23. The corresponding equation shows the form of the output vector O, in which one element represents the height measured at sensor S at time t+17 in feet.
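The averaging and the pad-or-crop step can be sketched as follows. Ordering the means by increasing time distance and the `precipitation_vector` helper are assumptions for illustration.

```python
def precipitation_vector(section_values, length):
    """Average rainfall per time distance, then crop or zero-pad to a fixed length.

    section_values: dict mapping time distance (hours) -> list of parcel rainfall values.
    """
    means = [sum(values) / len(values) for _, values in sorted(section_values.items())]
    cropped = means[:length]                             # crop vectors that are too long
    return cropped + [0.0] * (length - len(cropped))     # zero-pad vectors that are too short

# Hypothetical rainfall values for parcels grouped by time distance.
padded = precipitation_vector({2: [1.0, 3.0], 1: [4.0]}, length=4)
cropped = precipitation_vector({1: [1.0], 2: [2.0], 3: [3.0]}, length=2)
```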
After both precipitation and height vectors were formed, the input vector was created by concatenating them. Both smaller and larger datasets were formed by this pipeline only differing in the sizes when applicable. After running through this process for each sensor and datetime instance within the mentioned date range, 298,496 dataset entries were formed for the larger dataset and 354,816 dataset entries were formed for the smaller dataset. Recall that smaller and larger names are used depending on the size of the individual input vectors, not the actual dataset size.
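Putting the pieces together, one dataset entry could be assembled as in the sketch below. The `build_entry` helper, the hourly indexing, and the placement of heights before precipitation in the input are assumptions; only the sizes (39 inputs for the smaller dataset, 24 outputs from t to t+23) come from the text.

```python
def build_entry(height_vector, precipitation_vector, target_series, t, horizon=24):
    """Concatenate the two input parts and collect target heights from t to t+horizon-1."""
    inputs = list(height_vector) + list(precipitation_vector)
    outputs = [target_series[t + k] for k in range(horizon)]
    return inputs, outputs

# Hypothetical values sized like the smaller dataset: 19 heights and 20 rainfall values.
heights = [0.0] * 19
rain = [0.0] * 20
target_series = {hour: 10.0 + 0.1 * hour for hour in range(48)}
inputs, outputs = build_entry(heights, rain, target_series, t=5)
```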
II-C Utilizing Neural Networks
Predicting stream heights depends heavily on the previous states of the streams, which makes flood forecasting a time-series forecasting task. Given this nature of the problem, sequence-oriented network architectures such as RNNs are a good implementation choice. Because of the vanishing gradient problem of vanilla RNNs, Long Short-Term Memory (LSTM) networks and GRU networks are the two major architecture options in the literature for such cases. Since a GRU has fewer gate computations than an LSTM, and is therefore less computationally costly, while still performing comparably to LSTM networks, we chose to implement a GRU-based network.
This study proposes two networks for comparison purposes: the first is a fully-connected network and the second is a GRU-based neural network. The structure of the fully-connected network can be found in Table I.
The GRU-based network consists of five GRU subnetworks (Table II), one for each upstream sensor and one for the previous measurements of the intended sensor, and a fully-connected subnetwork (Table III) for the precipitation data. The outputs of all these subnetworks are then fed into a fully-connected output network with ReLU activations, which computes the output (Figure 4).
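GRU networks were proposed by Cho et al. [31] in 2014. GRU cells are similar to LSTM cells in how they are used and in their ability to handle the vanishing gradient problem. The GRU relies on concepts that are easier to implement while providing performance very similar to that of the LSTM. A GRU cell has two gates: a reset gate, in which previously learned features and new inputs are combined, and an update gate, which determines how much of the memory will be remembered. In the formulation of Cho et al. [31], a GRU cell can be expressed as

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

where $r_t$, $z_t$, $\tilde{h}_t$, and $h_t$ represent the reset gate, update gate, hidden state candidate, and hidden state, respectively, $x_t$ is the input at time t, and the $W$, $U$, and $b$ terms are learned weights and biases. Also, $\sigma$ is the sigmoid function and $\tanh$ is the hyperbolic tangent function. In a GRU cell, the reset gate affects the hidden state indirectly through the hidden state candidate's formula, while the update gate changes the hidden state directly.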
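To make the roles of the two gates concrete, a scalar GRU step can be sketched as below. It follows the convention of Cho et al., in which the update gate weights the previous hidden state; the parameter names in `p` and the `gru_step` helper are illustrative, and real implementations use weight matrices rather than scalars.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One scalar GRU step; p holds weights w_*, u_* and biases b_* for each gate."""
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])                 # reset gate
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])                 # update gate
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])    # hidden state candidate
    return z * h_prev + (1.0 - z) * h_cand                                   # new hidden state

# With all parameters at zero, both gates equal 0.5 and the candidate is 0,
# so the new hidden state is half of the previous one.
zero_params = {key: 0.0 for key in
               ("w_r", "u_r", "b_r", "w_z", "u_z", "b_z", "w_h", "u_h", "b_h")}
h_next = gru_step(x=1.0, h_prev=2.0, p=zero_params)
```

Some libraries swap the roles of the update gate and its complement in the last line; the two conventions are equivalent up to relabeling the gate.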
III Results & Discussion
The proposed neural network architectures are implemented in the Python programming language (v3.6) using the PyTorch numeric computing library [32] (v0.4.0 and the master version at the time of the experiments, 0.5.0a0+290d20b). The source code was written to train the networks using the Adam optimizer [33] as the optimization method and mean squared error (MSE) as the loss function. The proposed datasets were split into training and testing sets, with approximately 80% of the entries used for training. The implemented networks were trained on the training sets using NVIDIA Tesla K80 GPUs.
It should be noted that while the larger dataset was run with both the GRU-based architecture and the fully-connected architecture, the smaller dataset was only run with the fully-connected architecture, to understand the effect of using less time-dependent data in such a learning task.
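The evaluation metric and the split can be illustrated with a small library-free sketch. The `mse` and `split` helpers are hypothetical, and cutting the entries in their stored order is an assumption, since the text does not state whether the split was chronological or random.

```python
def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def split(entries, train_fraction=0.8):
    """Split entries into training and testing parts at the given fraction."""
    cut = int(len(entries) * train_fraction)
    return entries[:cut], entries[cut:]

# Hypothetical values: two forecasts against two measurements, and ten dataset entries.
error = mse([1.0, 2.0], [1.0, 4.0])
train, test = split(list(range(10)))
```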
TABLE IV: MSE of the proposed models on the testing sets of the larger and smaller datasets.
The MSE scores on the testing datasets for the proposed models, which were trained on the training datasets, can be found in Table IV. The scores show that providing more data did not significantly improve the accuracy of the fully-connected model. However, the choice of model has an important effect on overall testing performance. These results suggest that RNN-based models are a better architecture choice for prediction tasks such as flood forecasting.
Actual measurements and the values predicted by the GRU-based network for 4 USGS sensors are given in Figure 5. The results suggest that when stage height does not change dramatically, the model successfully anticipates the next measurements; when there are apparent fluctuations, the model is less successful but still reports broadly similar predictions. It should be noted that even though the forecasts often do not match the actual measurements exactly, the numeric differences are not large.
Considering the reported MSE and the similarity between the measurements and the forecasts made by the GRU-based model, the overall performance of the neural networks with the proposed decentralized approach can be considered acceptable.
While this paper presents a benchmark dataset and a methodology for flood forecasting that employs deep neural networks, it also presents promising results using a data-driven approach. The approach in this study could be improved in the future by incorporating other datasets, such as soil moisture data from point source measurements and satellite data such as Soil Moisture Active Passive (SMAP), as well as evaporation measurements, which can help represent the water budget better.
The models proposed in this study can be used to present enhanced forecasting results on operational information systems, alongside the forecasts of advanced hydrological models. The presented results show that the artificial neural network based decentralized flood forecasting approach for the state of Iowa anticipates stage heights that are very close to the actual height measurements.
The work reported here has been possible with the support and work of many members of the Iowa Flood Center at the IIHR Hydroscience and Engineering, University of Iowa.
[1] G. J. van Oldenborgh, K. van der Wiel, A. Sebastian, R. Singh, J. Arrighi, F. Otto, K. Haustein, S. Li, G. Vecchi, and H. Cullen, “Attribution of extreme rainfall from hurricane harvey, august 2017,” Environmental Research Letters, vol. 12, no. 12, p. 124009, 2017.
[2] M. A. Sit, C. Koylu, and I. Demir, “Identifying disaster-related tweets and their semantic, spatial and temporal context using deep learning, natural language processing and spatial analysis: a case study of hurricane irma,” International Journal of Digital Earth, pp. 1–25, 2019.
[3] C. W. Dawson and R. Wilby, “An artificial neural network approach to rainfall-runoff modelling,” Hydrological Sciences Journal, vol. 43, no. 1, pp. 47–66, 1998.
[4] K. Thirumalaiah and M. C. Deo, “Hydrological forecasting using neural networks,” Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 180–189, 2000.
[5] C. Dawson and R. Wilby, “Hydrological modelling using artificial neural networks,” Progress in physical Geography, vol. 25, no. 1, pp. 80–108, 2001.
[6] P. Nayak, K. Sudheer, D. Rangan, and K. Ramasastri, “Short-term flood forecasting with a neurofuzzy model,” Water Resources Research, vol. 41, no. 4, 2005.
[7] F.-J. Chang, Y.-M. Chiang, and L.-C. Chang, “Multi-step-ahead neural networks for flood forecasting,” Hydrological sciences journal, vol. 52, no. 1, pp. 114–130, 2007.
[8] M. Campolo, P. Andreussi, and A. Soldati, “River flood forecasting with a neural network model,” Water resources research, vol. 35, no. 4, pp. 1191–1197, 1999.
[9] W. Huang, B. Xu, and A. Chan-Hilton, “Forecasting flows in apalachicola river using neural networks,” Hydrological processes, vol. 18, no. 13, pp. 2545–2564, 2004.
[10] G.-F. Lin and G.-R. Chen, “A systematic approach to the input determination for neural network rainfall–runoff models,” Hydrological Processes: An International Journal, vol. 22, no. 14, pp. 2524–2530, 2008.
[11] F. Liu, F. Xu, and S. Yang, “A flood forecasting model based on deep learning algorithm via integrating stacked autoencoders with bp neural network,” in Multimedia Big Data (BigMM), 2017 IEEE Third International Conference on, pp. 58–61, IEEE, 2017.
[12] G.-F. Lin and L.-H. Chen, “A non-linear rainfall-runoff model using radial basis function network,” Journal of Hydrology, vol. 289, no. 1-4, pp. 1–8, 2004.
[13] G.-F. Lin and M.-C. Wu, “An rbf network with a two-step learning algorithm for developing a reservoir inflow forecasting model,” Journal of hydrology, vol. 405, no. 3-4, pp. 439–450, 2011.
[14] K. Chaowanawatee and A. Heednacram, “Implementation of cuckoo search in rbf neural network for flood forecasting,” in Computational Intelligence, Communication Systems and Networks (CICSyN), 2012 Fourth International Conference on, pp. 22–26, IEEE, 2012.
[15] D. Han, L. Chan, and N. Zhu, “Flood forecasting using support vector machines,” Journal of hydroinformatics, vol. 9, no. 4, pp. 267–276, 2007.
[16] P.-S. Yu, S.-T. Chen, and I.-F. Chang, “Support vector regression for real-time flood stage forecasting,” Journal of Hydrology, vol. 328, no. 3-4, pp. 704–716, 2006.
[17] Y. Bai, Z. Chen, J. Xie, and C. Li, “Daily reservoir inflow forecasting using multiscale deep feature learning with hybrid models,” Journal of hydrology, vol. 532, pp. 193–206, 2016.
[18] Y. Tao, X. Gao, A. Ihler, K. Hsu, and S. Sorooshian, “Deep neural networks for precipitation estimation from remotely sensed information,” in Evolutionary Computation (CEC), 2016 IEEE Congress on, pp. 1349–1355, IEEE, 2016.
[19] T. Izumi, M. Miyoshi, and N. Kobayashi, “Runoff analysis using a deep neural network,”
[20] J. Wang, Q. Gu, J. Wu, G. Liu, and Z. Xiong, “Traffic speed prediction and congestion source exploration: A deep learning method,” in Data Mining (ICDM), 2016 IEEE 16th International Conference on, pp. 499–508, IEEE, 2016.
[21] J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual networks for citywide crowd flows prediction.,” in AAAI, pp. 1655–1661, 2017.
[22] Y. Jia, J. Wu, and Y. Du, “Traffic speed prediction using deep learning method,” in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on, pp. 1217–1222, IEEE, 2016.
[23] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, and J. Ye, “Deep multi-view spatial-temporal network for taxi demand prediction,” arXiv preprint arXiv:1802.08714, 2018.
[24] X. Ding, Y. Zhang, T. Liu, and J. Duan, “Deep learning for event-driven stock prediction.,” in Ijcai, pp. 2327–2333, 2015.
[25] W. Bao, J. Yue, and Y. Rao, “A deep learning framework for financial time series using stacked autoencoders and long-short term memory,” PloS one, vol. 12, no. 7, p. e0180944, 2017.
[26] I. Demir, E. Yildirim, Y. Sermet, and M. A. Sit, “Floodss: Iowa flood information system as a generalized flood cyberinfrastructure,” International Journal of River Basin Management, pp. 1–8, 2017.
[27] Y. Sermet and I. Demir, “An intelligent system on knowledge generation and communication about flooding,” Environmental modelling & software, vol. 108, pp. 51–60, 2018.
[28] I. Demir, H. Conover, W. F. Krajewski, B.-C. Seo, R. Goska, Y. He, M. F. McEniry, S. J. Graves, and W. Petersen, “Data-enabled field experiment planning, management, and research using cyberinfrastructure,” Journal of Hydrometeorology, vol. 16, no. 3, pp. 1155–1170, 2015.
[29] I. Demir, F. Jiang, R. V. Walker, A. K. Parker, and M. B. Beck, “Information systems and social legitimacy scientific visualization of water quality,” in 2009 IEEE International Conference on Systems, Man and Cybernetics, pp. 1067–1072, Oct 2009.
[30] I. Demir and M. B. Beck, “Gwis: A prototype information system for georgia watersheds,” in Georgia Water Resources Conference: Regional Water Management Opportunities, p. 6.6.4, April 2009.
[31] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[32] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-W, 2017.
[33] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.