Towards Global Remote Discharge Estimation: Using the Few to Estimate The Many

by Yotam Gigi, et al.

Learning hydrologic models for accurate riverine flood prediction at scale is a challenge of great importance. One of the key difficulties is the need to rely on in-situ river discharge measurements, which can be quite scarce and unreliable, particularly in regions where floods cause the most damage every year. Accordingly, in this work we tackle the problem of river discharge estimation at different river locations. A core characteristic of the data at hand (e.g. satellite measurements) is that we have few measurements for many locations, all sharing the same physics that underlie the water discharge. We capture this scenario in a simple but powerful common mechanism regression (CMR) model with a local component as well as a shared one which captures the global discharge mechanism. The resulting learning objective is non-convex, but we show that we can find its global optimum by leveraging the power of joining local measurements across sites. In particular, using a spectral initialization with provable near-optimal accuracy, we can find the optimum using standard descent methods. We demonstrate the efficacy of our approach for the problem of discharge estimation using simulations.




1 Introduction

Floods are the most common and deadly natural disaster in the world. Every year, floods cause between thousands and tens of thousands of fatalities [1, 16, 3, 15, 12], affect hundreds of millions of people [12, 15, 3], and cause tens of billions of dollars in economic damages [1, 3]. Sadly, these numbers have only been increasing in recent decades [17]. Indeed, the UN charter notes floods to be one of the key motivators for the formulation of the sustainable development goals (SDGs), and directly challenges us: "They knew that earthquakes and floods were inevitable, but that the high death tolls were not" [2].

Early warning systems, even with limited lead time and imperfect accuracy, have been shown to reduce both fatalities and economic damages by more than a third, and in some cases almost by half [4, 22, 5]. Unfortunately, the majority of human costs that are due to flooding are concentrated in developing countries [12], which often lack effective and actionable early warning systems due to limited data collection, funding, or professional expertise [25]. The result is that, across multiple countries, thousands die on average every year, and relief and mitigation efforts have very limited information to rely on.

In this work, as part of our broader efforts in flood forecasting [20], we focus on riverine floods, which are responsible for much of the effect on human life. Existing hydrologic methodology for building flood prediction models relies heavily on in-situ infrastructure such as costly extensive gauging systems [5], and on local adaptation of the models that requires highly trained professionals [6]. Providing value where it matters most thus requires overcoming several challenges. First, we would like to reduce reliance on in-situ measurements such as extensive gauging sites constructed along the modeled river. Relevant data is constantly being produced at immense scale across the globe, but the vast majority of this data is not measured in situ but rather comes in the form of, e.g., satellite imagery. Clearly, leveraging even small parts of it has the potential for substantially improving flood prediction models. Second, to cover large areas in developing regions, we must automate and scale up the model building methodology and reduce its reliance on the human factor. Third, it is the sad paradox of life that populations in low-means areas cannot afford to respond to a low-precision system, and thus to make a positive impact in such areas, we require improved predictive power.

The field of machine learning (ML) has transformed many aspects of our lives, and is naturally geared to cope with the above challenges. Improving prediction, leveraging multiple signals that are difficult for a human expert to grasp, and automating human processes are all characteristics of effective ML systems. The first critical step toward building such systems is to provide global-scale estimates of the water discharge (volume per second) through the cross sections of a river, which can then be used to train early warning predictive models. As noted, such in-situ measurements are unavailable more often than not, and thus our first goal is to perform remote discharge estimation, that is, estimating the discharge based on remote measurements (usually satellite data) [24].

Our concrete goal is thus to create a prediction model that, using few measurements from a set of river locations, will be able to generalize to all locations. Intuitively, this should be possible because the multiple prediction problems (one for each location) are related: the underlying physical mechanism that relates satellite measurements to water level is identical, and each local measurement, where it exists, gives us a "clue" as to the nature of this shared mechanism. This general setting of leveraging information about some tasks to assist in the learning of models for other tasks has a long history in machine learning: inductive transfer, transfer learning and multitask learning are all closely related variants of the framework (see, e.g., [26, 11, 10, 8] for some of the early influential works and [21] for a more recent survey).

We consider a simple but powerful regression model where the coefficients are composed of two components: a local one that allows us to adapt to the characteristics of the local site, and a shared one that allows us to capture the global water discharge mechanism. It is this shared component that can benefit from transfer learning. A similar formal setting is explored in the highly cited work of [7], where the task is called structural learning, pointing to the common shared structure learned (the focus of that work, on transferring from unlabeled to labeled tasks, is different from ours, but the formal underpinning is identical). As they show, using the empirical risk minimization (ERM) principle, it is provably beneficial to learn from multiple tasks, from a statistical sample complexity perspective. A recent work [27] also shows empirically that this shared regression approach can be useful for multispectral imagery classification. The computational and optimization question, "Can we efficiently learn such a model?", is left unanswered. In this work, we show that the answer to this question, at least from an optimization viewpoint, is in the affirmative.

The target objective of this common mechanism regression (CMR) is non-convex and may have spurious local minima. Our main contribution is that, given enough independent tasks, we can efficiently find its global optimum. For this purpose, we extend the ideas in [19, 9] to CMR with multiple regressions. We begin with a spectral initialization with provable near-optimal accuracy, and then refine it using standard descent methods.

In the context of remote discharge estimation, our learning goal is to capture the common discharge mechanism that relates satellite measurements from multiple spectral bands to water levels. Naturally, we do not have access to the true mechanism (or we would not need to learn it). However, we can simulate such mechanisms and assess the merit of our approach when the ground truth is known. Using such simulations, we demonstrate the effectiveness of using our approach for transfer learning: sharing measurements from individual sites allows us to jointly improve the average predictive performance across all of them.

2 Common Mechanism Regression (CMR)

Our model consists of independent regressions that share a common mechanism. For simplicity, we assume that each regression has exactly pairs of labels and features


where are scalar labels, and are matrix observations (we use for bands and for pixels in the context of discharge estimation, but the setting is general). Our common mechanism regression (CMR) involves a two-phase approach: a common mechanism parameterized by

followed by decoupled local linear regressions denoted by



Note that the overall structure is linear in the features, but has a bilinear parameterization. Our main goal is to recover the common parameter and, if possible, we would also like to identify the local ’s. In particular, we are interested in the scenario when is large but is small, so that we have many regression problems but few observations for each one. Each regression, if estimated independently, requires at least samples. By introducing a common mechanism where is shared across the different sites, we allow , and also address the case of where exact recovery of is impossible.
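To make the bilinear structure concrete, the following NumPy sketch simulates noiseless data from a model of this form. The symbol names are assumptions introduced here for illustration and are not taken from the text above: `u` is the shared mechanism over `b` bands, `V[k]` is the local regressor of site `k` over `p` pixels, and each label is `u^T X v_k` for a `b x p` feature matrix `X`.

```python
import numpy as np

def simulate_cmr(K, N, b, p, seed=0):
    """Draw noiseless data from a bilinear common-mechanism model.

    All names are illustrative assumptions:
      u : (b,)          shared mechanism, unit norm
      V : (K, p)        one local regressor per site
      X : (K, N, b, p)  Gaussian feature matrices
      y : (K, N)        labels, y[k, i] = u^T X[k, i] V[k]
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(b)
    u /= np.linalg.norm(u)
    V = rng.standard_normal((K, p))
    X = rng.standard_normal((K, N, b, p))
    # Contract the band axis with u and the pixel axis with the site's v_k.
    y = np.einsum("b,knbp,kp->kn", u, X, V)
    return u, V, X, y

u, V, X, y = simulate_cmr(K=200, N=5, b=6, p=8)
print(X.shape, y.shape)  # (200, 5, 6, 8) (200, 5)
```

In this toy configuration each site has fewer labels than pixels, which is the "many sites, few samples per site" regime the section is concerned with.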

The CMR model is natural for river discharge estimation using remote sensing. Specifically, in multispectral imaging, the data matrices are defined by spectral and spatial dimensions. A reasonable approach to discharge estimation is thus to use the spectral information to identify water pixels and then apply spatial regression. The classical technique for water identification is via a common non-linear spectral feature known as the Normalized Difference Water Index (NDWI) [18] (more advanced indices are reviewed in [14]). This index is the motivation for CMR, which automatically learns a data-driven feature defined by the weights of . In what follows, we will show that linear CMR outperforms the non-linear NDWI.
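For reference, the McFeeters NDWI is the per-pixel ratio (green - NIR) / (green + NIR). The sketch below contrasts it with a linear band-weighted feature of the kind CMR would learn. The function names and toy reflectance values are hypothetical, and the small `eps` guard assumes non-negative reflectances.

```python
import numpy as np

def ndwi(green, nir, eps=1e-12):
    """McFeeters NDWI: (green - nir) / (green + nir), elementwise.

    eps avoids division by zero; it assumes non-negative reflectances.
    """
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (green - nir) / (green + nir + eps)

def linear_band_feature(image, weights):
    """Linear alternative: weighted sum over the band axis of a (bands, pixels) image."""
    return np.asarray(weights, dtype=float) @ np.asarray(image, dtype=float)

# Toy reflectances for three pixels (made-up values for illustration).
green = np.array([0.30, 0.10, 0.20])
nir = np.array([0.10, 0.30, 0.20])
print(ndwi(green, nir))  # values lie in [-1, 1]; higher suggests open water
print(linear_band_feature(np.stack([green, nir]), [1.0, -1.0]))
```

NDWI fixes both the bands and the non-linear form in advance, whereas the linear feature leaves the band weights free to be fitted from data.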

We propose to recover the parameters as the solution to the following regularized bilinear least squares optimization:


Due to its bilinear structure, CMR involves a non-convex minimization. Naive descent techniques may therefore converge to spurious local minima. Interestingly, CMR is similar to phase retrieval problems where it was recently shown that these bad critical points can be avoided via clever initialization schemes [19, 9]. Adaptation of these ideas to CMR leads to the following common spectral initialization:


is the eigenvector corresponding to the largest eigenvalue. From here, we continue with standard descent methods, e.g., gradient descent or alternating least squares, until convergence. Together, the computational complexity of this approach is linear in


Under standard assumptions, the proposed spectral initialization can recover the true with high accuracy. Like [13], we consider the realizable case, with normal features and assume an exact CMR model with no noise. We also assume random local regressors, i.e., we model

as i.i.d. realizations of an arbitrary probability distribution. This last assumption is specific to our work and is required in order to model multiple regression problems with common characteristics.

Theorem Under the above assumptions, there exists a constant such that if , then with probability of at least .

The theorem quantifies the improved performance when increasing or via their product. To prove the theorem, we show that


where and are positive constants that depend on the distribution of

. Thus, its principal eigenvector is the true parameter. Using the fact that the variance of

decays with , we show that concentrates around its mean as and increase.
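The argument can be checked numerically. The sketch below assumes one plausible form of the spectral estimator, which is an assumption rather than the paper's displayed formula: for each site it forms the label-weighted average of the feature matrices, averages the resulting outer products over sites, and takes the top eigenvector. With standard normal features and labels `y = u^T X v_k`, a direct calculation gives `E[y X] = u v_k^T`, so the averaged matrix is expected to be a multiple of `u u^T` plus a multiple of the identity. All names (`spectral_init`, `u`, `V`, `K`, `N`, `b`, `p`) are hypothetical.

```python
import numpy as np

def spectral_init(X, y):
    """Top eigenvector of (1/K) sum_k C_k C_k^T, with C_k = (1/N) sum_i y[k, i] X[k, i].

    Hypothetical helper. X has shape (K, N, b, p) and y has shape (K, N).
    """
    K, N = y.shape
    C = np.einsum("kn,knbp->kbp", y, X) / N   # (K, b, p) label-weighted averages
    M = np.einsum("kbp,kcp->bc", C, C) / K    # (b, b) symmetric, positive semidefinite
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    return eigvecs[:, -1]

# Noiseless synthetic data: Gaussian features, random local regressors.
rng = np.random.default_rng(0)
K, N, b, p = 5000, 4, 4, 6
u = rng.standard_normal(b)
u /= np.linalg.norm(u)
V = rng.standard_normal((K, p))
X = rng.standard_normal((K, N, b, p))
y = np.einsum("b,knbp,kp->kn", u, X, V)

u0 = spectral_init(X, y)
print("squared correlation with the true u:", float(u0 @ u) ** 2)
```

Because the estimate is an eigenvector, its sign is arbitrary, which is why the comparison uses the squared inner product.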

3 Numerical experiments

We start by assessing the merit of our CMR approach for discharge estimation using synthetic simulations. Recall that our goal is to leverage measurements from many locations to improve prediction. Thus, we consider the performance of CMR for a range of values of (the number of sites) and (the number of samples per site). For each set of values, we repeat the following 50 times: choose a random and , run the CMR algorithm, and declare success if the squared correlation between the true and its estimate exceeds . We do this with and without the spectral initialization. The results for and are summarized in the figure below.

Figure 1: Recovery of the true shared mechanism using the CMR model as a function of the number of sites (y-axis) and the number of samples per site (x-axis) without (left) and with (right) spectral initialization. The color of each square corresponds to the fraction of successful recoveries.

As expected, the results demonstrate that CMR recovers with few samples for many sites, i.e., when . Interestingly, we also succeed in recovering when , a setting where it is impossible to recover . The left and right panels illustrate the importance of the initialization, which substantially widens the range of settings for which CMR succeeds with high probability.
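A single trial of such a synthetic experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the regularizer is taken to be a small ridge penalty on the local regressors, refinement uses alternating least squares, the spectral estimator is the plausible form described in the comments, "squared correlation" is computed as the squared inner product of unit vectors, and the success threshold of 0.9 is a placeholder. All function and variable names are hypothetical.

```python
import numpy as np

def spectral_init(X, y):
    """Assumed spectral estimator: top eigenvector of (1/K) sum_k C_k C_k^T."""
    K, N = y.shape
    C = np.einsum("kn,knbp->kbp", y, X) / N
    M = np.einsum("kbp,kcp->bc", C, C) / K
    return np.linalg.eigh(M)[1][:, -1]

def als(X, y, u0, lam=1e-6, iters=100):
    """Alternating least squares for sum_{k,i} (y - u^T X v_k)^2 + lam * sum_k |v_k|^2."""
    K, N, b, p = X.shape
    u = u0 / np.linalg.norm(u0)
    for _ in range(iters):
        # Local step: per-site ridge regression of y on z = X^T u.
        Z = np.einsum("b,knbp->knp", u, X)                      # (K, N, p)
        G = np.einsum("knp,knq->kpq", Z, Z) + lam * np.eye(p)   # (K, p, p)
        r = np.einsum("knp,kn->kp", Z, y)                       # (K, p)
        V = np.linalg.solve(G, r[..., None])[..., 0]            # (K, p)
        # Shared step: pooled least squares of y on w = X v_k, then renormalize.
        W = np.einsum("knbp,kp->knb", X, V).reshape(K * N, b)
        u = np.linalg.lstsq(W, y.reshape(K * N), rcond=None)[0]
        u /= np.linalg.norm(u)
    return u, V

def run_trial(K, N, b, p, use_spectral, threshold, seed):
    """Generate noiseless data, fit, and report (success, squared correlation)."""
    rng = np.random.default_rng(seed)
    u_true = rng.standard_normal(b)
    u_true /= np.linalg.norm(u_true)
    V_true = rng.standard_normal((K, p))
    X = rng.standard_normal((K, N, b, p))
    y = np.einsum("b,knbp,kp->kn", u_true, X, V_true)
    u0 = spectral_init(X, y) if use_spectral else rng.standard_normal(b)
    u_hat, _ = als(X, y, u0)
    sq_corr = float(u_hat @ u_true) ** 2  # both vectors have unit norm
    return sq_corr > threshold, sq_corr

# threshold=0.9 is a placeholder value chosen for this sketch.
success, sq_corr = run_trial(K=300, N=10, b=4, p=6,
                             use_spectral=True, threshold=0.9, seed=0)
print(success, sq_corr)
```

Repeating `run_trial` over a grid of `K` and `N` values, with `use_spectral` set to `True` and `False`, and averaging the success flags would give success-rate maps of the kind shown in Figure 1. The example uses more samples per site than pixels; with fewer samples than pixels the local step can fit the labels for any `u`, so the outcome then depends on the regularization strength and the initialization.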

We now evaluate the merit of our CMR approach for the predictive task of discharge estimation in a real-world setting. We use images from the Landsat 8 mission [23], which include 11 spectral bands each, and ground truth labels from the United States Geological Survey (USGS) website. The results were generated using river gauge sites with temporal samples each. For every cross-validation fold, the temporal samples were split into train and test, and the CMR results were compared with the NDWI per-site regression. The average mean squared errors, normalized per site, of randomly shuffled -fold cross-validation repeated 4 times are given in the following table:

Method   Train   Test
NDWI     0.54    0.70
CMR      0.47    0.65

As can be seen, there is a clear advantage to learning the shared component of the CMR model. Appealingly, the advantage also carries over to held-out test data, despite the expressiveness of the CMR model, which also allows for local components.
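The evaluation protocol for a single site can be sketched as repeated, shuffled k-fold cross-validation with a normalized error. Several details below are assumptions made for illustration: the error is normalized by the variance of the site's labels, the per-site model is a lightly regularized linear regression, the number of folds defaults to 5, and the data are synthetic rather than Landsat or USGS data. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(F, y, lam):
    """Ridge regression with an intercept column appended to F.

    The intercept is penalized too, which is negligible for small lam.
    """
    A = np.column_stack([F, np.ones(len(F))])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

def ridge_predict(F, w):
    return np.column_stack([F, np.ones(len(F))]) @ w

def cv_normalized_mse(F, y, n_folds=5, n_repeats=4, lam=1e-3, seed=0):
    """Return (train, test) MSE divided by var(y), averaged over folds and repeats."""
    rng = np.random.default_rng(seed)
    scale = np.var(y)
    train_err, test_err = [], []
    for _ in range(n_repeats):
        # Reshuffle the temporal samples before every repeat.
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        for j, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for i, f in enumerate(folds) if i != j])
            w = ridge_fit(F[train_idx], y[train_idx], lam)
            train_res = ridge_predict(F[train_idx], w) - y[train_idx]
            test_res = ridge_predict(F[test_idx], w) - y[test_idx]
            train_err.append(np.mean(train_res ** 2) / scale)
            test_err.append(np.mean(test_res ** 2) / scale)
    return float(np.mean(train_err)), float(np.mean(test_err))

# Synthetic single-site example (made-up data).
rng = np.random.default_rng(1)
F = rng.standard_normal((40, 3))
y = F @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(40)
train_nmse, test_nmse = cv_normalized_mse(F, y)
print(train_nmse, test_nmse)
```

With this normalization, a value of 1 corresponds to predicting the site's mean label, so values below 1 indicate that the model explains part of the variation at that site.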

4 Summary and Future Directions

In this work, we proved that, despite the non-convex nature of the learning objective, the common mechanism regression (CMR) model can be globally optimized using a spectral initialization combined with standard descent. We also demonstrated the efficacy of the approach for the challenge of discharge estimation where we have few measurements for many river sites.

On the modeling front, it would be useful to generalize CMR so as to allow for robust and task-normalized loss functions. Another interesting direction is to inject non-linearity into CMR to make it even more competitive with the non-linear physically motivated NDWI approach. On the practical discharge estimation front, we plan to aggregate multiple data sources (e.g. additional types of satellites, weather data) within the CMR framework.


References

  • (1) The Centre for Research on the Epidemiology of Disasters (CRED) - Natural Disasters 2017, 2017. [Online; accessed 30-09-2018].
  • (2) United Nations Development Programme (UNDP) - Sustainable Development Goals, 2015. [Online; accessed 30-09-2018].
  • (3) United Nations Office for Disaster Risk Reduction (UNISDR) - The Human Cost of Weather Related Disasters, 2015. [Online; accessed 30-09-2018].
  • (4) World Health Organization (WHO) - Global Report on Drowning, 2014. [Online; accessed 30-09-2018].
  • (5) World Bank - Global Assessment Report on Costs and Benefits of Early Warning Systems, 2011. [Online; accessed 30-09-2018].
  • (6) Eric A Anderson. Calibration of conceptual hydrologic models for use in river forecasting. Office of Hydrologic Development, US National Weather Service, Silver Spring, MD, 2002.
  • (7) Rie Kubota Ando and Tong Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6(Nov):1817–1853, 2005.
  • (8) Jonathan Baxter. A model of inductive bias learning. J. Artif. Int. Res., 12(1):149–198, 2000.
  • (9) Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
  • (10) Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
  • (11) Thomas G. Dietterich, Lorien Pratt, and Sebastian Thrun. Special issue on inductive transfer. Machine Learning, 28(1), 1997.
  • (12) Shannon Doocy, Amy Daniels, Catherine Packer, Anna Dick, and Thomas D Kirsch. The human impact of earthquakes: a historical review of events 1980-2009 and systematic literature review. PLoS currents, 5, 2013.
  • (13) Moritz Hardt and Tengyu Ma. Identity matters in deep learning. arXiv preprint arXiv:1611.04231, 2016.
  • (14) Furkan Isikdogan, Alan C Bovik, and Paola Passalacqua. Surface water mapping by deep learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(11):4909–4918, 2017.
  • (15) Sebastiaan N Jonkman. Global perspectives on loss of human life caused by floods. Natural hazards, 34(2):151–175, 2005.
  • (16) SN Jonkman. Loss of life caused by floods: an overview of mortality statistics for worldwide floods. DC1-233-6, 2003.
  • (17) Thomas Loster. Flood trends and global change. In Proceedings IIASA Conf on Global Change and Catastrophe Management: Flood Risks in Europe, 1999.
  • (18) Stuart K McFeeters. The use of the normalized difference water index (NDWI) in the delineation of open water features. International Journal of Remote Sensing, 17(7):1425–1432, 1996.
  • (19) Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems, pages 2796–2804, 2013.
  • (20) Sella Nevo, Vova Anisimov, Gal Elidan, Ran El-Yaniv, Pete Giencke, Yotam Gigi, Avinatan Hassidim, Zach Moshe, More Schlesinger, Guy Shalev, Ajai Tirumali, Ami Weisel, Oleg Zlydenko, and Yossi Matias. ML for flood forecasting at scale. In Proceedings of the NeurIPS AI for Social Good Workshop, 2018.
  • (21) Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 2010.
  • (22) Paul J Pilon et al. Guidelines for reducing flood losses. In Guidelines for reducing flood losses. Naciones Unidas, 1998.
  • (23) David P Roy, MA Wulder, Thomas R Loveland, CE Woodcock, RG Allen, MC Anderson, D Helder, JR Irons, DM Johnson, R Kennedy, et al. Landsat-8: Science and product vision for terrestrial global change research. Remote sensing of Environment, 145:154–172, 2014.
  • (24) Laurence C Smith. Satellite remote sensing of river inundation area, stage, and discharge: A review. Hydrological processes, 11(10):1427–1439, 1997.
  • (25) David Strömberg. Natural disasters, economic development, and humanitarian aid. Journal of Economic perspectives, 21(3):199–222, 2007.
  • (26) Sebastian Thrun. Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems, pages 640–646, 1996.
  • (27) Haoliang Yuan and Yuan Yan Tang. Spectral–spatial shared linear regression for hyperspectral image classification. IEEE transactions on cybernetics, 47(4):934–945, 2017.