ML-based Flood Forecasting: Advances in Scale, Accuracy and Reach

by Sella Nevo, et al.

Floods are among the most common and deadly natural disasters in the world, and flood warning systems have been shown to be effective in reducing harm. Yet the majority of the world's vulnerable population does not have access to reliable and actionable warning systems, due to core challenges in scalability, computational costs, and data availability. In this paper we present two components of flood forecasting systems which were developed over the past year, providing access to these critical systems to 75 million people who didn't have this access before.




1 Introduction

1.1 Flooding harms and early warning impact

Floods are the most common, and among the most deadly, natural disasters on the planet, affecting hundreds of millions of people every year and causing between thousands and tens of thousands of fatalities annually CRED (2015). Recent research shows that those who receive flood warnings prior to the arrival of the flood are twice as likely to evacuate or take other protective measures as those who do not receive such warnings, with 65% of those warned taking action [15].

As a result, warning systems have the potential to save lives and reduce harm to livestock and other assets. For this reason, the World Bank has identified early warning systems for floods as the most cost-effective tool for climate change adaptation amongst all options evaluated. It estimated that for each dollar spent on such systems, nine dollars of damages are prevented World Bank (2011).

1.2 Challenges

The vast majority of populations vulnerable to flooding do not have access to reliable, actionable flood warnings Hirpa et al. (2016). There are several reasons why this is a difficult challenge, including the following:

  • Local calibration: Existing high-accuracy flood models often require large amounts of per-site data and effort to set up and calibrate Beven (2011). This limits the scale at which such systems can be deployed, and therefore most high-resolution, high-accuracy systems are deployed at city-scale rather than country, continent, or global scales.

  • Computational Complexity: Another challenge for scaling actionable flood forecasting systems is the computational cost required to make alerts targeted and useful. Inundation/hydraulic/hydrodynamic modeling requires computational power that scales linearly with coverage area, but is also inversely proportional to the cube of the resolution (i.e., the grid cell size). Since the number of rivers rises exponentially with the reduction of their width Downing et al. (2012), this effectively means that such systems require exponential growth in computational costs.

  • Data Scarcity: Flood forecasting systems rely on a wide range of inputs that can be critical for accuracy, such as discharge, riverbed bathymetry, elevation maps, etc. While this data is publicly available in a small number of countries, many countries do not collect it at all, and many of those that do refuse to make it publicly available and may even define it as classified. This makes reliable forecasting a significant challenge in many regions.
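The scaling claim in the Computational Complexity item can be made concrete with a small arithmetic sketch. It assumes that "resolution" means grid cell size and that cost follows exactly the stated proportionality; the function name and reference values are hypothetical and only relative costs are meaningful.

```python
def relative_hydraulic_cost(area_km2: float, cell_size_m: float,
                            ref_area_km2: float = 1.0,
                            ref_cell_size_m: float = 1.0) -> float:
    """Cost relative to a reference run: linear in area, inverse-cubic in cell size."""
    return (area_km2 / ref_area_km2) * (ref_cell_size_m / cell_size_m) ** 3


# Same area, cell size halved from 10 m to 5 m: the cost ratio is 2**3 = 8.
coarse = relative_hydraulic_cost(area_km2=100.0, cell_size_m=10.0)
fine = relative_hydraulic_cost(area_km2=100.0, cell_size_m=5.0)
cost_ratio = fine / coarse
```

Under this assumption, refining the grid is far more expensive than extending coverage: doubling the area doubles the cost, while halving the cell size multiplies it by eight.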

1.3 Paper overview

This paper focuses on riverine flooding, which is responsible for a majority of flood fatalities globally Jonkman (2005). Our long term vision is for everyone in the world to have access to reliable riverine flooding alerts. We discuss two steps in that direction that specifically help address some of the issues outlined above:

  • Water level-based hydrologic models: Within the last year, ML has been shown to be able to help address the local calibration problem Kratzert et al. (2019b, a). Here we explore adjusting similar models to work with water level data, which is more common globally than discharge data. This expands the globally-available data set for streamflow modeling.

  • Morphological models: A new method for inundation modeling, which combines physics-based and ML modeling, requires less manual calibration, is more computationally efficient, and achieves better accuracy in real-world data-scarce conditions than classic hydrodynamic modeling using finite-element solutions.

These models were launched in India and Bangladesh covering an area of over 67,000 square kilometers protecting 75 million people (though some areas only have access to one of the models). In the past 5 months, we have sent over 39 million alerts to people affected by the floods and governmental relief agencies.

2 Water level-based hydrologic models

Hydrologic models aim to forecast the amount of water in a river by taking inputs such as precipitation, radiation, and other meteorological variables to determine discharge, which is a volumetric rate, at a timescale of hours, days, or months. Most literature about hydrologic modeling assumes discharge as the variable of interest, with a few exceptions Moshe et al. (2020). Discharge measurements are typically derived by measuring the height of water in a river (called stage) and then using a rating curve to translate this measurement into an estimate of discharge. Rating curves are derived from bathymetry and velocity measurements taken in a particular river, and must be routinely updated as river conditions change. Most rivers around the world do not have rating curves, and calibrating models to water level (stage) is fundamentally difficult because the relationship between precipitation and water level is strongly controlled by local channel geometry. This means that the majority of the world’s streamflow data is unavailable for calibrating hydrologic models, which presents a challenge for designing operational flood warning systems that scale globally.
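As an illustration of the rating-curve step, the sketch below uses the power-law form Q = a (h - h0)^b, a common parametric choice that the text does not itself specify. All coefficient values are hypothetical. It shows why the same stage reading maps to different discharges in rivers with different channel geometry.

```python
import numpy as np


def stage_to_discharge(stage_m, a=20.0, h0=0.5, b=1.8):
    """Power-law rating curve (assumed form): Q = a * (h - h0) ** b.

    a, h0 and b are site-specific coefficients; the defaults are made up.
    Stages at or below the zero-flow stage h0 give zero discharge.
    """
    depth = np.clip(np.asarray(stage_m, dtype=float) - h0, 0.0, None)
    return a * depth ** b


stages = np.array([0.5, 1.0, 1.5, 2.5])
# Two hypothetical rivers: identical stage readings, different geometry.
q_narrow = stage_to_discharge(stages, a=20.0, h0=0.5, b=1.8)
q_wide = stage_to_discharge(stages, a=120.0, h0=0.2, b=1.5)
```

Because the coefficients must be refit as the channel changes, a model trained directly on stage has to absorb this site-specific mapping itself.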

2.1 Deployed model

Machine learning can help address this problem. Currently, we have deployed water level models in operational flood warning systems in India and Bangladesh covering 67,000 square kilometers and 75 million people, where reliable discharge measurements are not available. These models are based on regressions that use recent, near real-time water level measurements from upstream gauges as inputs, going back 72 hours at hourly resolution. Models were trained individually for each of 52 stream gauges using non-public streamflow data from the 5-month monsoon seasons (June through October) in 2014 through 2019, and deployed during the 5-month monsoon season in 2020. We monitor several water level-based performance metrics; the aggregate performance across the 52 watersheds is:

  • Average lead time: 20.7 hours

  • R² over water level: 0.99

  • Mean absolute error: 0.067 meters

  • Mean squared error: 0.011 meters²
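The general shape of such a per-gauge model can be sketched as a lagged regression. The text only says the deployed models are "based on regressions", so the ordinary least squares fit, the single upstream gauge, the 24-hour lead time, and the synthetic data below are all assumptions made for illustration. Only the 72-hour hourly lookback and the three error metrics come from the text.

```python
import numpy as np

LOOKBACK_HOURS = 72  # lookback stated in the text
LEAD_HOURS = 24      # hypothetical forecast lead time


def make_lagged_features(upstream, lookback=LOOKBACK_HOURS):
    """Row i holds the hourly upstream levels for hours i .. i + lookback - 1."""
    return np.lib.stride_tricks.sliding_window_view(upstream, lookback)


def fit_linear(features, target):
    """Ordinary least squares with an intercept column (assumed regressor)."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict(coef, features):
    return np.column_stack([features, np.ones(len(features))]) @ coef


def water_level_metrics(observed, forecast):
    """R^2, mean absolute error and mean squared error over water level."""
    err = forecast - observed
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return {"r2": 1.0 - ss_res / ss_tot,
            "mae": np.mean(np.abs(err)),
            "mse": np.mean(err ** 2)}


# Synthetic hourly series: the downstream gauge echoes the upstream gauge
# LEAD_HOURS later, scaled and shifted, plus small measurement noise.
rng = np.random.default_rng(0)
n_hours = 2000
upstream = 5.0 + np.cumsum(rng.normal(0.0, 0.05, n_hours))
downstream = np.full(n_hours, np.nan)
downstream[LEAD_HOURS:] = (0.8 * upstream[:-LEAD_HOURS] + 1.0
                           + rng.normal(0.0, 0.01, n_hours - LEAD_HOURS))

# Keep only windows whose forecast time still falls inside the record.
features = make_lagged_features(upstream)[: n_hours - LOOKBACK_HOURS + 1 - LEAD_HOURS]
target = downstream[LOOKBACK_HOURS - 1 + LEAD_HOURS:]

split = 1500  # train on the earlier part, evaluate on the later part
coef = fit_linear(features[:split], target[:split])
test_metrics = water_level_metrics(target[split:], predict(coef, features[split:]))
```

Splitting by time rather than at random mirrors the train-on-past-seasons, deploy-on-the-next-season setup described above.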

2.2 Research model

To improve on the deployed models, and to allow for launching at more locations during the upcoming 2021 monsoon season in India and Bangladesh, we tested the LSTM-based rainfall-runoff modeling strategy developed and benchmarked by Kratzert et al. (2019b). LSTMs generally require large training data sets spanning multiple catchments Gauch et al. (2020), which exacerbates the challenge of using water level data with its strong dependence on catchment-specific bathymetry. Using high-quality precipitation and streamflow data from the US, we adapted the LSTM benchmarking experiments by Kratzert et al. (2019b) to train with water level data (see Appendix A for full details on the experimental setup). We achieved a median accuracy over 499 test basins of compared to with discharge. This is a small loss in accuracy when using water level instead of discharge, especially when compared with similar losses that occur when calibrating traditional rainfall-runoff models using water level data (e.g., Jian et al. (2017)).

LSTMs trained on discharge data generalize relatively well to new catchments where streamflow data is not available for training Kratzert et al. (2019a) (most catchments in the world are ungauged). However, because the precipitation-stage relationship is heavily controlled by local bathymetry, models trained on stage data do not generalize in the same way as models trained on discharge. This is shown in Figure 1, which plots cumulative density functions (CDFs) of results from applying LSTMs to simulate time series of daily streamflow (water level and discharge) from time series of daily precipitation at 499 US catchments. These CDFs compare coefficients of determination between 10 years of simulated vs. observed streamflow in each catchment when trained on water level data, discharge data, and both together using a multi-output head and a multi-target loss function. Models were tested in two types of prediction situations: (i) out-of-sample in time, and (ii) out-of-sample in space - the latter by using k-fold cross validation across 499 US catchments. Notice that the models trained on water level in gauged catchments (i.e., out-of-sample-in-time) are generally better than the ungauged (out-of-sample-in-space) discharge models, but that the ungauged (out-of-sample-in-space) discharge models are substantially better than the ungauged water level models.

Figure 1: Cumulative density functions (CDFs) of model skill at predicting water level and discharge over 10-year test periods in 499 basins in the United States using models trained on water level and/or discharge and cross validated in time (i.e., at the same basins where they were trained) or space (i.e., at different basins than where they were trained).
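The curves in Figure 1 can be thought of as built from one skill score per basin. The sketch below computes a coefficient of determination in the 1 - SS_res/SS_tot form and then the empirical CDF over basins. The exact definition used for the figure is not stated in the text, so this form and the function names are assumptions.

```python
import numpy as np


def r_squared(observed, simulated):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    ss_res = np.sum((simulated - observed) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def empirical_cdf(scores):
    """Sorted scores and the fraction of basins at or below each score."""
    x = np.sort(np.asarray(scores, dtype=float))
    p = np.arange(1, len(x) + 1) / len(x)
    return x, p


# Three hypothetical basins with made-up scores.
x, p = empirical_cdf([0.5, 0.1, 0.9])
```

A curve that lies further to the right indicates better skill across basins, which is how the gauged and ungauged settings are compared.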

These results show that to provide the best possible predictions everywhere, it is necessary to train models that predict both water level and discharge. Discharge can be estimated with higher accuracy than river stage in ungauged basins; however, the multi-target head and loss function allow simultaneous training on water level data, which benefits catchments that have only water level data. Ideally, any inundation mapping strategy that relies on this type of rainfall-runoff model would be able to use stage estimates, discharge estimates, or both.

3 Morphological models

Inundation models simulate the movement of water across the floodplain to produce spatially accurate forecasts of flood extent. They accept as an input hydrologic boundary conditions (expressed either as discharge or water level), and output a map of the floodplain inundation, including water depth at each pixel.

The classic approach to implementing such models is using finite-element solutions to the St. Venant (or Shallow Water) equations. Such models are used almost universally in both academia and operational settings. However, there are many challenges in increasing their scale and reach. For a deeper discussion on these challenges, and efforts to address them within the scope of finite-element solutions, see Ben-Haim et al. (2019).

In this section we present an approach to inundation modeling that improves on classic hydrodynamic modeling in many respects, including scalability, computational efficiency, and accuracy (at least in data-scarce settings). These models require an up-to-date, high resolution elevation map of the relevant floodplain (around one meter resolution, depending on the characteristics of the basin). They do not, however, require knowing riverbed bathymetry.

In this approach, which we call the morphological model, we break the inundation modeling task into two separate components:

  • Learn the river profile – i.e., the water level at every point along a one-dimensional line representing a river – as a function of water level (stage) measurement at some point along the river.

  • Given the river profile, estimate inundation depth across the full two-dimensional floodplain.

These steps are described below in reverse order.

3.1 Expanding the river profile to the floodplain

Assuming we know the water level at each point along a one-dimensional river, we would like to extend the water across the floodplain to identify which areas are inundated, and at what depth. We do so with no learned parameters, by simply applying a deterministic algorithm that follows from certain basic principles and simple heuristics. Classically, this is done using a finite-element solution to the two-dimensional St. Venant equations, but that type of solution is computationally expensive, and is sensitive to both numerical instabilities and minor errors in the elevation map, leading to significant manual labor in identifying and correcting such issues.

To avoid these hurdles, we employ a simple heuristic approach. To understand this approach, we begin with an inundation map that has no flow (i.e. the surface of the water is flat). In this degenerate case, given the height of the water at any point within the map, a simple 2D flood fill algorithm would yield the inundation map across the entire flood plain.
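The degenerate flat-water case can be sketched as a breadth-first flood fill over the elevation grid. The 4-neighbor connectivity, the function name, and the toy elevation map are assumptions; the text only says "a simple 2D flood fill algorithm".

```python
from collections import deque

import numpy as np


def flood_fill_depth(elevation, seed, water_level):
    """Water depth per pixel for a flat water surface at `water_level`.

    Starting from `seed` (row, col), water spreads to 4-connected
    neighbors whose ground elevation is below the water level.
    """
    rows, cols = elevation.shape
    depth = np.zeros(elevation.shape, dtype=float)
    if elevation[seed] >= water_level:
        return depth  # the seed itself is dry
    visited = np.zeros(elevation.shape, dtype=bool)
    visited[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        depth[r, c] = water_level - elevation[r, c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not visited[nr, nc]
                    and elevation[nr, nc] < water_level):
                visited[nr, nc] = True
                queue.append((nr, nc))
    return depth


# Toy map: a ridge (elevation 5) separates the river side from a low basin.
dem = np.array([[1.0, 1.0, 5.0, 1.0],
                [0.0, 1.0, 5.0, 0.0],
                [1.0, 1.0, 5.0, 1.0]])
depth = flood_fill_depth(dem, seed=(1, 0), water_level=2.0)
```

The low pixels to the right of the ridge stay dry even though they lie below the water level, because no path of submerged pixels connects them to the seed.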

Next, in the simple case where the river is a straight line but the surface of the water is no longer flat, we assume that most of the dynamics of the downstream flow are already captured in the river profile. We assume that the water flow direction is approximately in the direction of the river, and so no (or little) flow occurs in directions that are perpendicular to the river. Thus, given the river profile, we can calculate the height of the water surface at any point in the map by simply expanding the river profile in the direction perpendicular to the river. Subtracting this height from every point in the elevation map results in a new elevation map, or a new floodplain morphology. In this new morphology, the water’s surface must be flat, and thus we are back in the degenerate case (see Figure 2 in Appendix B for a visualization).

In reality, rivers rarely progress in perfectly straight lines. Yet this algorithm can be generalized to any river course. We find constant elevation lines in an inundation map matching the given elevation map. This is equivalent to mapping each pixel in a 2D floodplain to a pixel in the 1D river profile that will have a similar water level in any inundation map. To do this, we use a simple heuristic which associates each floodplain pixel with the river pixel closest to it, and then apply smoothing to make sure that pixels that are close to each other in the floodplain are not associated with pixels that are very distant from each other in the river (see Figure 3 in Appendix B). We believe this heuristic can be improved, though even in its current form it performs relatively well. Once this mapping is defined, we continue similarly to the original (straight line) case, with the associated subsets of pixels replacing the perpendicular lines. From each subset of pixels associated with the same river profile point, we subtract the water level at that river profile point. We get a new morphology for the whole floodplain, one in which applying a simple two-dimensional flood-fill algorithm with water level zero achieves a good approximation of the true inundation map.
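The association and subtraction steps can be sketched as follows. This is a minimal version under stated assumptions: nearest-river-pixel association by brute-force Euclidean distance, no smoothing step, and a toy grid. The function names are hypothetical. The connectivity check (flood fill at water level zero) would still be applied to the result and is not repeated here.

```python
import numpy as np


def nearest_river_index(shape, river_pixels):
    """For every grid pixel, the index of the closest river pixel.

    Brute force over all river pixels; ties go to the earlier index.
    """
    rows, cols = np.indices(shape)
    grid = np.stack([rows, cols], axis=-1)[:, :, None, :]  # (R, C, 1, 2)
    river = np.asarray(river_pixels)[None, None, :, :]     # (1, 1, K, 2)
    dist2 = np.sum((grid - river) ** 2, axis=-1)           # (R, C, K)
    return np.argmin(dist2, axis=-1)


def flatten_morphology(elevation, river_pixels, river_water_level):
    """Subtract the associated river water level from every pixel.

    In the resulting morphology the water surface sits at level zero,
    so negative values are candidate inundated pixels.
    """
    idx = nearest_river_index(elevation.shape, river_pixels)
    surface = np.asarray(river_water_level, dtype=float)[idx]
    return elevation - surface


# Toy floodplain: the river runs along the middle row, flowing left to right.
dem = np.array([[4.0, 3.0, 2.0],
                [2.5, 1.5, 0.5],
                [4.0, 3.0, 2.0]])
river = [(1, 0), (1, 1), (1, 2)]
profile = [3.0, 2.0, 1.0]  # water level at each river pixel

flattened = flatten_morphology(dem, river, profile)
candidate_depth = np.clip(-flattened, 0.0, None)
```

In this toy case the sloping river row becomes a constant -0.5 after subtraction, which is the "flat water surface" property the method relies on.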

3.2 Estimating the river profile

Now that we know how to expand the river profile into the full floodplain, we return to estimating the river profile itself from a single gauge. Our goal is to learn a function from a single real variable (the gauge measurement) to a 1-dimensional line (the river profile). Alternatively, we can see this as estimating the water level at a point as a function of both the (estimated or measured) streamflow water level and location along the river. We can make several assumptions which greatly reduce the space of possible river profiles - we know the function is continuous in both gauge measurement and location, we know it is monotonically decreasing in location (since rivers, generally speaking, flow downwards rather than upwards), and we know the function is monotonically increasing in water level (since one can generally assume that an increase in water in one point along the river will not indicate a decrease in the amount of water at another point upstream or downstream from it). These assumptions also allow us to transfer information about the river profile at one point to other points.

We learn the function between the gauge measurement and the river profile using historical inundation maps. We can score a function on a given historical flood event by expanding the river profile at the relevant gauge measurement to the floodplain as described in Section 3.1 and comparing the resulting flood extents. We search for the function that optimizes this score over a catalogue of historical inundation maps which were derived from SAR satellite imagery. We use a local search approach to find the optimal function, raising or lowering the river profile at a specific gauge measurement depending on whether it produces overflooding or underflooding, while maintaining the monotonicity and continuity constraints described above. See Figure 4 in Appendix B for an example of learned river profiles.
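One update of such a local search might look like the sketch below. It is an illustration under several assumptions: the profile is stored as a table indexed by (gauge level, location), over- and underflooding are measured as pixel counts, the step size is fixed, and the continuity constraint is not enforced. The predicted extent is taken as an input rather than recomputed, and all names are hypothetical.

```python
import numpy as np


def local_search_step(profile, gauge_idx, predicted, observed, step=0.1):
    """Raise or lower one gauge-level row of the profile table.

    profile[g, x] is the water level at location x for gauge level g.
    Rows must be non-decreasing in g and non-increasing in x; neighboring
    rows are clamped so both orderings survive the update.
    """
    over = np.count_nonzero(predicted & ~observed)   # flooded in model only
    under = np.count_nonzero(~predicted & observed)  # flooded in imagery only
    updated = np.array(profile, dtype=float)
    if over > under:
        updated[gauge_idx] -= step
        # Lower gauge levels may not exceed the lowered row.
        updated[:gauge_idx] = np.minimum(updated[:gauge_idx], updated[gauge_idx])
    elif under > over:
        updated[gauge_idx] += step
        # Higher gauge levels may not fall below the raised row.
        updated[gauge_idx + 1:] = np.maximum(updated[gauge_idx + 1:], updated[gauge_idx])
    return updated


profile = np.array([[3.0, 2.0, 1.0],
                    [4.0, 3.0, 2.0],
                    [5.0, 4.0, 3.0]])
predicted = np.ones((2, 2), dtype=bool)              # model floods everything
observed = np.array([[True, False], [False, False]])  # imagery shows one wet pixel
new_profile = local_search_step(profile, gauge_idx=1,
                                predicted=predicted, observed=observed, step=1.5)
```

Shifting a whole row by a constant keeps it monotone along the river, and the element-wise minimum or maximum of two monotone rows is again monotone, so both constraints hold after each step.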

3.3 Overview and results

Combining these two steps we find that based only on an elevation map, past gauge measurements, and past inundation maps, we can (a) learn a function from the gauge measurement to the full river profile, (b) edit the morphology of the floodplain based on the river profile we have deduced, and (c) calculate the inundation map (including inundation depths) extremely efficiently using a simple flood-fill algorithm that is applied to the new synthetic morphology.

To evaluate the performance of this methodology relative to classic finite-element solution models, we trained and evaluated the two models across 11 different regions in the Ganges and Brahmaputra basins in India. We evaluated the models across the metrics in Table 1. As seen in Table 1, the morphological model outperforms the classic hydrodynamic model across all metrics with the exception of recall, in which the two are statistically indistinguishable.

Model type            Avg. Recall   Avg. Precision   Avg. Manual Work Hours*   CPU Costs
Hydrodynamic model    71.4%         72.7%            30                        4-131 CPU years
Morphological model   71.3%         75.8%            4                         100-1200 CPU hours

* This parameter was roughly estimated by the engineers working on both models, and was not accurately measured.

Table 1: Inundation benchmarking results.

We have launched this morphological model to cover 38,000 square kilometers and 44 million people. In our real-time operational systems, spanning 429 flood events in the 2020 monsoon season, we achieve the following metrics at a 64 meter resolution:

  • Precision: 76.2%

  • Recall: 77.6%
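For reference, precision and recall of an inundation map can be computed per pixel as in the sketch below. The pixel-wise definition against a binary observed extent is an assumption, since the text does not spell out how the metrics are aggregated; the arrays are toy data.

```python
import numpy as np


def precision_recall(predicted, observed):
    """Pixel-wise precision and recall of a predicted flood extent."""
    predicted = np.asarray(predicted, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    tp = np.count_nonzero(predicted & observed)   # wet in both
    fp = np.count_nonzero(predicted & ~observed)  # overflooding
    fn = np.count_nonzero(~predicted & observed)  # underflooding
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


predicted = [[1, 1], [0, 0]]
observed = [[1, 0], [1, 0]]
precision, recall = precision_recall(predicted, observed)
```

Low precision corresponds to overflooding (alerts in areas that stay dry) and low recall to underflooding (missed areas), which is why both are reported.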

Note that the methodology described in this section assumes that there are no confluences or bifurcations in the river. This assumption is not strictly required and the methodology can be generalized to more complex river structures; however, this is outside the scope of the current paper.

4 Conclusions and impact

The methodologies described in this paper reduce the manual labor required to launch operational flood forecasting at new sites over large areas. They reduce reliance on difficult-to-attain and costly-to-measure data (e.g., discharge and riverbed bathymetry), and they reduce computation costs. These approaches do so largely while improving or at least not significantly harming accuracy relative to standard approaches. As such, these techniques offer an important step toward scaling flood warning systems that have the potential to cover billions of people.

In addition to developing and benchmarking these algorithms, we have also deployed them in dozens of real-time operational flood warning systems - proving they work well in a real-world operational setting. Over the past 5 months we have shown this work can achieve real impact by sending more than 36 million warnings and alerts to individuals at risk from floods, as well as the relevant authorities to help support relief and mitigation efforts. We continue to work towards scaling these systems up further.


  • K. J. Beven (2011) Rainfall-runoff modelling: the primer. John Wiley & Sons. Cited by: 1st item.
  • World Bank (2011) Global Assessment Report on Costs and Benefits of Early Warning Systems. Note:[Online; accessed 30-09-2018] Cited by: §1.1.
  • N. Addor, A. J. Newman, N. Mizukami, and M. P. Clark (2017) The CAMELS data set: catchment attributes and meteorology for large-sample studies. Hydrology and Earth System Sciences 21 (10), pp. 5293–5313. External Links: Document, ISSN 1607-7938, Link Cited by: Appendix A, Appendix A.
  • Z. Ben-Haim, V. Anisimov, A. Yonas, V. Gulshan, Y. Shafi, S. Hoyer, and S. Nevo (2019) Inundation modeling in data scarce regions. arXiv preprint arXiv:1910.05006. Cited by: §3.
  • U. CRED (2015) The human cost of weather-related disasters, 1995–2015. United Nations, Geneva. Cited by: §1.1.
  • J. A. Downing, J. J. Cole, C. Duarte, J. J. Middelburg, J. M. Melack, Y. T. Prairie, P. Kortelainen, R. G. Striegl, W. H. McDowell, and L. J. Tranvik (2012) Global abundance and size distribution of streams and rivers. Inland waters 2 (4), pp. 229–236. Cited by: 2nd item.
  • M. Gauch, J. Mai, and J. Lin (2020) The proper care and feeding of camels: how limited training data affects streamflow prediction. Environmental Modelling & Software, pp. 104926. Cited by: §2.2.
  • F. A. Hirpa, K. Fagbemi, E. Afiesimam, H. Shuaib, and P. Salamon (2016) Saving lives: ensemble-based early warnings in developing nations. Handbook of Hydrometeorological Ensemble Forecasting, pp. 1–22. Cited by: §1.2.
  • J. Jian, D. Ryu, J. F. Costelloe, and C. Su (2017) Towards hydrological model calibration using river level measurements. Journal of Hydrology: Regional Studies 10, pp. 95–109. Cited by: §2.2.
  • S. N. Jonkman (2005) Global perspectives on loss of human life caused by floods. Natural hazards 34 (2), pp. 151–175. Cited by: §1.3.
  • F. Kratzert, D. Klotz, M. Herrnegger, A. K. Sampson, S. Hochreiter, and G. S. Nearing (2019a) Toward improved predictions in ungauged basins: exploiting the power of machine learning. Water Resources Research 55 (12), pp. 11344–11354. Cited by: 1st item, §2.2.
  • F. Kratzert, D. Klotz, G. Shalev, G. Klambauer, S. Hochreiter, and G. Nearing (2019b) Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets.. Hydrology & Earth System Sciences 23 (12). Cited by: Appendix A, Appendix A, 1st item, §2.2.
  • Z. Moshe, A. Metzger, G. Elidan, F. Kratzert, S. Nevo, and R. El-Yaniv (2020) HydroNets: leveraging river structure for hydrologic modeling. arXiv preprint arXiv:2007.00595. Cited by: §2.
  • A. J. Newman, M. P. Clark, K. Sampson, A. Wood, L. E. Hay, A. Bock, R. J. Viger, D. Blodgett, L. Brekke, J. R. Arnold, T. Hopson, and Q. Duan (2015) Development of a large-sample watershed-scale hydrometeorological data set for the contiguous usa: data set characteristics and assessment of regional variability in hydrologic model performance. Hydrology and Earth System Sciences 19 (1), pp. 209–223. External Links: Link, Document Cited by: Appendix A.
  • [15] (2020) Using technology to save lives during India’s monsoon season. Note:[Online; accessed 9-10-2020] Cited by: §1.1.

Appendix A Water level rainfall-runoff model experiments

Experiments from Section 2 used the benchmarking protocol developed by Kratzert et al. (2019b). Specifically, we used the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) data set curated by the US National Center for Atmospheric Research Addor et al. (2017); Newman et al. (2015). This data set consists of 671 catchments in CONUS ranging in size from 4 to 25,000 km² that have largely natural flows and long streamflow gauge records (1980-2010). We used only 499 of the CAMELS catchments - specifically, the basins with sub-daily gauge data available from the US Geological Survey.

Inputs into the LSTM models included time series of meteorological variables from NASA’s North American Land Data Assimilation System (precipitation, long- and short-wave radiation, wind speed, potential energy and potential evaporation, specific humidity, air temperature, and near-surface atmospheric pressure). Inputs also included static catchment attributes related to soils, climate, vegetation, topography, and geology Addor et al. (2017). A full list of catchment attributes used as model inputs can be found in Table 1 of Kratzert et al. (2019b).

Daily streamflow records (water level and/or discharge) were used as training targets with a normalized squared-error loss function that does not depend on basin-specific mean:


This loss function means that larger basins, which tend to have more discharge, are not over-represented in the effect on weight updates during training. The multi-target loss function was an equally-weighted average of this loss function over both variables (stage and discharge).
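The loss equation itself is not reproduced in this text. One form consistent with the description, modeled on the basin-normalized squared error used in the benchmarking protocol of Kratzert et al. (2019b), divides each squared error by the squared standard deviation of the target in the sample's basin. The sketch below is an assumption along those lines: the stabilizing constant `eps`, its value, and all function names are hypothetical.

```python
import numpy as np

EPS = 0.1  # assumed stabilizing constant, avoids division by a tiny spread


def basin_normalized_mse(y_true, y_pred, basin_std, eps=EPS):
    """Mean of squared errors, each scaled by its basin's target spread.

    basin_std[i] is the training-period standard deviation of the target
    in the basin that sample i came from.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scale = (np.asarray(basin_std, dtype=float) + eps) ** 2
    return float(np.mean((y_pred - y_true) ** 2 / scale))


def multi_target_loss(stage_true, stage_pred, stage_std,
                      q_true, q_pred, q_std, eps=EPS):
    """Equally weighted average of the per-variable losses (stage, discharge)."""
    stage_loss = basin_normalized_mse(stage_true, stage_pred, stage_std, eps)
    q_loss = basin_normalized_mse(q_true, q_pred, q_std, eps)
    return 0.5 * (stage_loss + q_loss)


# Two samples from basins of different size: the same relative error
# contributes equally once scaled by each basin's spread.
small_and_large = basin_normalized_mse(
    y_true=[1.0, 2.0], y_pred=[2.0, 4.0], basin_std=[1.0, 2.0], eps=0.0)
```

With this scaling, an error of one standard deviation counts the same in a small headwater basin as in a large one, which is the property the paragraph above describes.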

The training period was from October 1, 1999 to September 30, 2008 and the testing period was from October 1, 1989 through September 1, 1999. The LSTM used sequence-to-one prediction with a 365-day lookback, which increases minibatch diversity and the number of weight updates per epoch relative to sequence-to-sequence training.
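Sequence-to-one sampling with a 365-day lookback can be sketched as a sliding window over the daily record. The array layout and function name are assumptions, and the random forcings stand in for real data.

```python
import numpy as np

LOOKBACK_DAYS = 365  # lookback stated in the text


def make_sequence_to_one_samples(forcings, targets, lookback=LOOKBACK_DAYS):
    """Build (input window, single target) pairs from one basin's daily record.

    forcings: array of shape (days, features)
    targets:  array of shape (days,)
    Sample i uses days i .. i + lookback - 1 as input and the target on
    the last day of that window as the label.
    """
    windows = np.lib.stride_tricks.sliding_window_view(forcings, lookback, axis=0)
    x = np.transpose(windows, (0, 2, 1))  # (samples, lookback, features)
    y = targets[lookback - 1:]
    return x, y


rng = np.random.default_rng(0)
forcings = rng.normal(size=(400, 3))  # 400 days, 3 hypothetical input variables
targets = rng.normal(size=400)
x, y = make_sequence_to_one_samples(forcings, targets)
```

Each day after the first 364 yields its own training sample, so samples from many basins and dates can be shuffled into the same minibatch.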

Appendix B Morphological model visualizations

This appendix provides several visuals that help make the morphological model more intuitive.

Figure 2 illustrates a simple example of a river being flattened. The river, represented in three dimensions at the top of the image, is flattened by subtracting the water level at each point from the elevation map at that point. Regardless of the original structure, the resulting river will always be completely flat.

Figure 2: A simple example of a river being flattened so that the river profile is completely level.

Figure 3 shows how we associate pixels of the floodplain with points along the river profile, expressed as a color from white to blue. On the left we see the result if we simply associate each floodplain pixel with the closest river pixel, which leads to significant discontinuities that are not physically sensible. On the right we apply smoothing to the association, leading to a reasonable approximation of the accurate association.

Figure 3: Pixels in the floodplain being associated with points along the river profile, on the left without smoothing and on the right with smoothing.

Figure 4 shows an example of learned river profiles in the Brahmaputra river in India, showing the water level at each point (up to 50 meters) as a function of both the location of the point along the river (up to 10,000 meters) and the gauge measurement at the time (up to 30 meters).

Figure 4: Learned river profiles, represented as water level as a function of both the location along the river and the gauge measurement. The x axis refers to 27 increasing gauge measurements. The y axis represents the distance in meters from a reference point in the river in steps of 16 meters along the river curve. The z axis represents the water level in meters at the specific point in the river and gauge measurement.