Reconstructing nodal pressures in water distribution systems with graph neural networks

04/28/2021 ∙ by Gergely Hajgató, et al. ∙ Budapest University of Technology and Economics 0

Knowing the pressure at all times in each node of a water distribution system (WDS) facilitates safe and efficient operation. Yet, complete measurement data cannot be collected due to the limited number of instruments in a real-life WDS. The data-driven methodology of reconstructing all the nodal pressures by observing only a limited number of nodes is presented in the paper. The reconstruction method is based on K-localized spectral graph filters, wherewith graph convolution on water networks is possible. The effect of the number of layers, layer depth and the degree of the Chebyshev-polynomial applied in the kernel is discussed taking into account the peculiarities of the application. In addition, a weighting method is shown, wherewith information on friction loss can be embed into the spectral graph filters through the adjacency matrix. The performance of the proposed model is presented on 3 WDSs at different number of nodes observed compared to the total number of nodes. The weighted connections prove no benefit over the binary connections, but the proposed model reconstructs the nodal pressure with at most 5 at an observation ratio of 5 graph neural networks by following the considerations discussed in the paper.



There are no comments yet.


page 1

page 2

page 3

page 4

Code Repositories


GitHub repository for "Reconstructing partially observed nodal pressure with graph neural networks in waterdistribution systems" paper

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Gathering real-time data of hydrodynamic properties is an essential task for water distribution system (WDS) operators as data acquisition is indispensable for digital twins to ensure safe and efficient operation. Beside its importance, collecting and processing data from a WDS is difficult in two respects: the data lie on an irregular grid and they are partially observable. A method is introduced from the field of graph signal processing and machine learning to overcome these difficulties. The pressure can be obtained in every node and in real-time from a moderately instrumented WDS with the proposed approach.

Digital twins and problem-tailored numerical models for WDS management is an actively researched area with a vivid research community. Measurement-dependent models are successful in optimal pump scheduling (Candelieri et al., 2020; Bagloee et al., 2018; Odan et al., 2014; Bhattacharya et al., 2003), in water demand forecast (Farias et al., 2018; Herrera et al., 2010) and in attack and leakage detection (Quiñones-Grueiro et al., 2019; Taormina and Galelli, 2018; Taormina et al., 2018; Arsene et al., 2012; Deng et al., 2011), among others. Ongoing efforts on developing digital twins for water networks are summarized by Fuertes et al. (2020). Beside their benefits, the efficacy of these models depends on the quality and quantity of live measurement data, mostly nodal pressures in the first place.

Extensive online sensing got attention from both the WDS operators (Whittle et al., 2011) and the sensor manufacturers, side as well (Nardo et al., 2018), in recent years. These efforts in conjunction with the available numerical models led to the birth of the smart water grid (smart water network, smart urban water network) paradigm (Lee et al., 2014). Despite the ongoing trends, the high number of pipes and junctions in a real-life WDS (Sitzenfrei et al., 2013; Yazdani and Jeffrey, 2011) makes full observation of a hydraulic variable over the whole pipe network infeasible (Klise et al., 2013). This induces the need for methods that can complete partially observed graph signals.

Tanaka (2018) presents efficient techniques for graph signal upsampling to ensure the inheritance of the original signal characteristics both in the vertex and spectral domain. Despite its benefits, this approach cannot be used in the case of nodal pressure completion as sensor deployment is infeasible in such an organized manner (e.g. into every node).

Xie et al. (2017) developed an algorithm to optimally distribute the sensors over WDSs by minimizing the mutual coherence and focusing on leak detection. The method achieved a minimal average mutual coherence of around on the models of real world water networks in steady-state scenarios and with only a handful of observed nodes. Wei et al. (2020)

introduced a novel graph Fourier-transform methodology to define sensor placement for observing transient phenomena, namely pollutant spread. The time-dependent signal is reconstructed perfectly by observing

of the nodes. These studies focus on how to deploy sensors (or how to use them economically, if there are many) from which the method of reconstruction follows. However, the success of optimal sensor-placement in case of old water networks with uncertainties in their numerical models is uncertain on its own. Thus, in the case of many existing WDSs, attention should be given to a robust reconstruction method that performs well with suboptimal sensor placement.

All the nodal pressures can be gathered from hydraulic simulations as well. Hydraulic solvers for WDS modeling depend on the nodal demands as a boundary condition for the simulation, therefore the nodal demands must be measured or predicted in order to get nodal pressures. Measuring the volumetric flow rate and feed it into a hydraulic solver is disadvantageous in two ways: the instrumentation is more complicated than for pressure measurement and the simulation cannot be performed in real-time. The latter problem can be solved by utilizing demand forecasting models and calculate nodal pressures beforehand. Novel methods of water consumption forecasting (Du et al., 2021; Bata et al., 2020; Farias et al., 2018) make it possible, but nodal pressures become false if an unpredicted event occurs in the water network. The main difficulty of the approach is that the predictions are made on the full demand of the water network, not on the consumption of the individuals. Hence, the prediction should be distributed among the nodes by some kind of rule or rules to make it an appropriate boundary condition for the hydraulic simulation.

A data-driven approach for nodal pressure reconstruction in real-time is presented by Lima et al. (2017)

, who developed a multilayer perceptron (MLP) to complete partial nodal pressure observations, showing good agreement with the ground truth.

Lima et al. (2017)

used simulated data for the training and testing in the study, involving the numerical model of both of the benchmarked WDSs. The nodal demands were the boundary conditions of the hydraulic models and their variation over a day was handled by changing all of the demands from scene-to-scene with a common multiplier. Nodal demands (water outflow at the consumers) do not necessarily change at once and with the same ratio in a WDS, meaning that this has been a significant simplification of the problem. The proposed MLP with one hidden layer of 10 neurons achieves an average reconstruction error well below

on two WDSs with having approx. nodes in each.

Although the example of a shallow MLP succeeded in the above simplified nodal pressure completion problem, using densely connected layers for larger water networks with more realistic nodal demand variation could be inefficient. Recent advances of graph signal processing (Shuman et al., 2012) led to various formulations of the convolution operator on graphs, making the learning efficient on irregular domains. An in-depth survey on graph neural networks (GNNs) is given by Wu et al. (2021) and a benchmark of the popular methods is researched by Dwivedi et al. (2020). Details are not given here due to the high number of graph neural network variants but the selected method with the motivation for the choice will be discussed in Sec. 2.2.

The methodology of using graph convolutional layers with K-localized spectral filtering (Defferrard et al., 2016) are introduced in this paper to reconstruct partially observed nodal pressures in water distribution systems. The overview of the method is depicted in Fig. 1. The novel contributions of the paper are the following:

  • a specific type of graph neural network is proposed according to the peculiarities of the graphs of water networks;

  • graph neural networks are trained and evaluated on three benchmark WDSs to confirm the approach with a dataset of realistic nodal pressure variations;

  • a study on the hyperparameters’ influence is elaborated from the viewpoint of nodal pressure reconstruction.

The term partial observation is used in this paper for situations, wherein a signal can be observed only in a subset of nodes of a graph. In the present study, the signals will be the nodal pressures of a WDS and the graph will be the underlying topology of pipes and other hydraulic components.

Figure 1: Overview of the proposed method for nodal pressure reconstruction.

2 Nodal pressure reconstruction with graph neural networks

2.1 Modeling the hydraulics

Nodal pressure reconstruction is related to fluid flow in pipe networks. In order to find a suitable method for data-driven nodal pressure reconstruction, taking a look on the underlying physics and mathematical representation beforehand can reveal some valuable observations.

The pipe network of a water distribution system can be represented as a connected, undirected graph. The nodes of the graph are node-like hydraulic elements as pipe joints, consuming points, tanks, etc. The edges of the graph are edge-like hydraulic elements as e.g. pipes, pumps and valves. Such hydraulic topologies are depicted in Fig. 3.

The governing equations of fluid flow in such a pipe network are described by the continuity equation and the energy equation. The law of continuity applied on the -th node is


where is the number of edges ending in the -th node, is the density of the fluid, is the volumetric flow rate and is the nodal consumption as mass flow rate. The energy equation with a loss-term applied on the -th edge is


where and are the nodal pressures in the ends of the edge, is the volumetric flow rate and , , and are edge-dependent factors and constants.

Nodal pressure is considered to be a valuable information, hence Eq. 2 describes the way, how information propagates in the water network. Considering a pipe from a real world WDS, the following assumptions can be taken.

  • . Assuming high Reynolds number flow, this term is negligible compared to quadratic loss term.

  • assuming constant-diameter pipes, as the quadratic term is the difference between the dynamic pressures at the ends of the th pipe.

comes from the geodetic height difference, hence, it is a constant term. describes the wall friction and separation losses in the th pipe or other hydraulic component that is proportional to the square of the volumetric flow rate. This term has a sign as it reduces the total pressure in the direction of the fluid flow.

This way, the energy equation with a loss-term can be simplified to


for a pipe of constant diameter which is the most common edge-like part in a water distribution system. The direction of the edges can be arbitrary as the direction together with the sign of the volumetric flow rate tells whether the fluid flows from node to node or vice-versa. Losses arising from other components and in the nodes are considered as part of the pipe-losses.

In order to get the hydraulic state of the water distribution system, both Eq. 1 and Eq. 2 have to be solved for all of the node-like and edge-like elements of the WDS, respectively. These equations form a system of nonlinear, algebraic equations, that can be linearized by taking some further assumptions. Knowing the boundary conditions (the nodal demands in practice), the system of the linearized equations can be solved by iterative numerical methods. The hydraulic model was presented here in order to get a rough view on the underlying physics, only the loss term will be examined further in Sec. 2.3. The exact equations that were solved by the hydraulic simulator can be found in the EPANET 2 users manual (Rossman, 2000).

2.2 Choosing the appropriate graph neural network

Regarding the high number of existing graph neural network types (Wu et al., 2021), several aspects were considered taken before selecting the appropriate one. The main points were the following.

  1. The topology of the WDS is static. It is true for most of the time when the WDS operates under normal conditions. It does not hold, however, in emergency cases, when a so-called segment of the WDS has to be shut off. Handling of such emergency scenarios are not considered in this paper.

  2. The input and output variables are nodal properties. This stands as the task is to predict the all nodal pressure from a subset of the nodal pressures. The only other hydraulic property (a value proportional to hydraulic loss) is embedded in the adjacency matrix in some cases and neglected in others. No quantity is inferred on the edges nor added to the input data.

  3. The nodal pressure depends on the pressures in the neighbouring nodes and the volumetric flow rates between the neighbours, regardless of the neighbours’ orientation. Thus it makes no sense to use graph convolutional filters with anisotropic properties, spectral filters should cope with the task.

  4. The connection strength between nodes can be derived from the hydraulic properties, because a factor that is proportional to hydraulic loss can be computed before the training if the friction loss coefficient in Eq. 3 is available.

Considering the above assumptions and properties, the K-localized spectral graph filtering method (referred later as ChebNet) was selected for the present study. The technique is introduced in Defferrard et al. (2016); only the core concept is described here.

As convolutional filters are efficient in learning data that are represented on a regular grid, Defferrard et al. (2016) formulated a similar filtering method for data represented on irregular grids. ChebNet builds upon the fact that the convolutional operator is a single multiplication in the spectral domain. Graph signals can be transformed to the spectral domain by the Laplace-transformation. The transformation matrix can be calculated purely from topological data: the degree matrix and the adjacency matrix . The multiplication of this transformation matrix with the graph signal leads to the spectrum of the signal in one step. This transformation is non-parametric and non-localized, which leaves no room for constructing convolutional filters. Defferrard et al. (2016) propose the use of the truncated Chebyshev-polynomial of the graph Laplacian instead. In this way, the range of information propagation can be controlled by the truncation. Besides, different ranges can be parametrized differently, which allows to learn localized filters on the graph.

In practice, the scaled and normalized form of the Laplacian is used instead of to ensure numerical stability:



is the maximum of the eigenvalues of the graph Laplacian,

the identity matrix,

the degree matrix and the adjacency matrix. The convolution of a weighted filter with a graph signal becomes


where the Chebyshev-polynomials are of the first kind, thus these are in the form


The learnable parameters are denoted by .

Knowing the underlying physics and the learning model, the effects of the hyperparameters can be examined.

2.3 Effects of ChebNet hyperparameters to nodal pressure reconstruction

2.3.1 Polynomial degree of the kernel

The Chebyshev-polynomial of the graph Laplacian is taken into account up to the -th order in Eq. 5. The parameter sets how far the information propagates in one layer. Several applications depend on information from the direct neighbourhood only, and this property was exploited by Kipf and Welling (2017) by truncating the Chebyshev-polynomial to the first degree and setting the localization by the number of consecutive layers. Despite that, such an intensive truncation would not be beneficial for the current task if the number of the observed nodes are to be kept low. From this perspective, the order of the polynomial determined by the demand that the observed information reaches all the unobserved nodes. This implies that – as long as the observation points are selected uniformly randomly in the graph – the polynomial order should be at least equal to the graph diameter.

2.3.2 Number of layers and channels

Adding multiple channels to a layer allows the network to learn different filters for the same reception field. It will be necessary in this study as the underlying physics cannot be described from the partial observations with a single formula.

The number of layers plays a similar role in the information propagation as the order of the polynomial degree and the number of channels together. The output of a ChebNet layer is constructed with a receptive field, characterized by the polynomial degree of the kernel as seen in Eq. 5. Stacking ChebNet layers consecutively, the information spreads not only due to powering the graph Laplacian but due to feeding the spread information to the input of the consecutive layers.

Sparse observation in large water networks would require a vast amount of hidden layers owing to the conclusion of the preceding section. This is disadvantageous because of the vanishing gradient problem

(Hochreiter et al., 2001) that can lead to the failure of the model training. Thus, the the expressive power required by the problem is ensured by the number of channels, while the number of layers is tried to be kept low in the present study.

2.3.3 Weighting of the graph Laplacian

(a) Binary connections
(b) Weighted connections
(c) Logarithmically weighted connections
Figure 2: Scaled and normalized graph Laplacian of Anytown calculated from the adjacency matrix with binary (a), weighted (b) and logarithmically weighted (c) connections.

Most of the original work on graph convolution networks use the binary adjacency matrix to compute the graph Laplacian. This is appropriate in many use cases where the connection strength between nodes is binary by its nature, e.g. in citation networks and knowledge graphs.

On the other hand, the connection between nodes in a water network can be considered weighted wherewith the learning could be preconditioned. The question arises whether the prediction accuracy can be improved with this information embedded into the spectral filtering.

Considering Eq. 3, the first term on the right-hand side is


by neglecting the direction of the flow. As we are not interested in the flows on the edges, this is a reasonable simplification in order to get a quantity that is proportional to the resistance of information propagation between nodes. For cylindrical pipes Eq. 7 becomes the Darcy–Weisbach equation:


where is the flow coefficient, the length of the pipe, the fluid density and the diameter of the pipe. Eq. 8 is untractable in many real-life situations as the flow coefficient is unknown. Hence, common practice use empirical formulas instead. Among these equations, the Hazen-Williams (H–W) formula is still favored despite its fairly inaccurate results. The limitations of the formula are discussed thoroughly by Liou (1998) and Christensen et al. (2000). Despite its drawbacks, the H–W formula is used in the present study due to the available data.

The energy loss of the fluid flowing through a pipe according to the H–W formula is


as per Rossman (2000). Head loss , pipe diameter and pipe length are in feet and the volumetric flow rate is in cubic feet per second. is the Hazen–Williams roughness coefficient that is the attribute of the pipe and is defined empirically. In practice, this coefficient is known for specific pipe materials while they are new. The value in case of an aged pipe is only a crude guess as the roughness coefficient is influenced by a high number of external factors that cannot be measured. Still, if the coefficient is known, the weight referring to how well the information (pressure) propagates in the water network can be derived from Eq. 9 as


This quantity is normalized by its maximum value calculated over the entire topology to avoid numerical problems if an extremely short or long pipe is presented in the WDS. A more precise weighting metric can be defined from Eq. 8 in cases, where the flow coefficient is tractable for all the pipes.

A key property of the scaled and normalized graph Laplacian calculated from the weighted adjacency matrix is that it spans a significantly wider range of values than the one calculated from the binary adjacency matrix. The scaled and normalized graph Laplacians of the smallest water network involved in this study are depicted in Fig. 2. These matrices were calculated from the binary, the weighted and the logarithmically weighted adjacency matrix, respectively. Regarding the figures, the self-connections lose their importance when the connections are weighted. This is impractical for completing partial observations as self-connections are important in the nodes wherein a sensor is installed. Moreover, self-connections help achieve better prediction accuracy in general, as shown by Wu et al. (2019).

To overcome this problem and maintain the weighted nature of the adjacency matrix, the logarithm base 10 of the original weights are also used to weight the interconnections. The connection represented by pumps and valves are weighted with connection strength in this case because all the valves in the numerical models were fully opened. If partially closed valves are present in a WDS, a weighting can be defined specifically for valves from the energy equation similarly to the weights of the pipes. In the case of pumps, a weight could be defined in a more elaborate way which is not discussed here as the number of pumps in a WDS is negligible compared to the number of pipes.

The scaled and normalized graph Laplacian according to the logarithmic weighting is depicted in Fig. 2. The result is similar to that calculated from the binary adjacency matrix but with the connection strength encoded.

The effect of weighting and logarithmically weighting the connections with the plain binary connections is examined in this study.

3 Experiments

3.1 Problem definition and dataset

The capability of graph neural networks to reconstruct partially observed graph signals were examined with numerical experiments. The application is the reconstruction of all the nodal pressures in water networks from a limited number of measurement data. The placement of the sensors is considered to be fixed, hence, a GNN is trained for a specific layout of the measurement devices. Moreover, the nodal pressures are considered to be varying slowly in time (pressure waves may not developed) but temporality is omitted from the reconstruction model. Instead, the model handles a scene at once, where a scene represents a steady state of the WDS according to the boundary conditions. Pressure data collected from specific nodes per scene is given to the input of the model and all the nodal pressures are yielded as the output for the same scene.

In lack of measurement data of real-world water networks, three benchmark WDSs were used for the numerical experiments. The dependence of the nodal pressures on the hydraulic state (nodal demands, pump speeds, etc.) of the WDS makes the hydraulic simulation necessary for every scene. The numerical model was available for each of the water networks wherewith a sufficient amount of different scenes were generated to train, validate and test the proposed model.

The water networks were namely the Anytown, C-Town and the Richmond networks. These WDSs are commonly used benchmark networks among the water resources management community and can be reached online from the web page of University of Exeter (1986). Major properties of the water networks can be found in Table 1 while the topologies of the networks are depicted in Fig. 3. Anytown and C-Town are fictitious villages, while Richmond is a town with around 8000 inhabitants in the United Kingdom. The size of the water networks is reflected in the computation demand of the hydraulic simulations, therefore Anytown is involved in the study to examine the assumptions taken in Sec. 2.3, while C-Town and Richmond represents the performance of the proposed model for real size WDSs. Data on larger water networks were not available for the study.

Water network # of junctions # of pipes Diameter
Table 1: Major properties of the water distribution systems
(a) Anytown
(b) C-Town
(c) Richmond
Figure 3: Topologies of the water distribution systems containing pipes, junctions, pumps and tanks.

Scene generation with varying consumer demands and pump speeds was conducted with the algorithm described by Hajgató et al. (2020). This method handles the day-night shift and the variation in the demands of the consumers in different phases thus a wide range of scenes can be generated that is simulating the variations in a real-life WDS. A brief overview of the method is as follows.

First, the nodal consumption is generated for each node based on the initial data from the numerical model. This initial value is an average for the consumption during the working hours. It is assumed that the ratio of the water consumption between small (e.g. residential homes) and large (e.g. factories) consumers vary moderately in time, thus the initial demand is multiplied with random factors from a truncated normal distribution with the lower and upper limit aligned to the specific WDS. The sum of the initial demands is rescaled thereafter to reflect the day-night shift in the water consumptions. This is done by a multiplication with a factor from a uniform distribution and the lower and upper limits are WDS-specific again. The nodal demands are then rescaled once more to satisfy the total consumption of the WDS. Finally, pump speeds are generated between the feasibility limits with a uniform distribution.

, and different scenarios were generated with the above algorithm for the Anytown, C-Town and Richmond WDS, respectively. The whole parameter space – incl. total consumption, nodal consumptions and pump speeds – were sampled with a design of experiments (DOE) method. Although benefits are reported on the use of DOE methods in neural network training (Mehta et al., 1999), in the present study, latin hypercube sampling was used only to ensure that no repetitions occur in the database.

The generated scenes were split into training, validation and test data sets as per a ratio of , and , respectively.

3.2 Model training and evaluation

Separate graph neural networks were built to signal reconstruction as per the considerations emphasized in Sec. 2.3. First, the GNN topology for the smallest WDS was defined by hyperparameter optimization discussed in Sec. 3.3, then the resulting topologies were defined arbitrarily. This compromise was taken due to limited resource availability.

The input and output was built from the previously generated dataset as follows. Metrics were calculated from the training dataset for the standardization and normalization of the nodal pressures for feeding to the input and to the output of the GNNs, respectively. Installing the sensors in the WDS was simulated by generating a binary mask with the same length as the number of nodes in the WDS. Binary masks were generated randomly in the beginning of each training, according to a probability called

observation ratio. The number of installed pressure sensors are defined by this ratio compared to the number of all nodes in the WDS.

The GNNs received an input with two channels: one channel for the partially observed signal (the full signal multiplied with the mask) and one channel for the mask itself. The full signal was represented on the output of the GNN.

The training was carried out times for every water network. Five different observation ratios were examined, namely: . trainings were carried out with each observation ratio with different binary masks each to simulate the effect of different sensor placements.

The reconstruction capability of the model is measured by the mean-squared error between the model output and the ground truth during the training. This loss is calculated for each node regardless whether it is an observed or an unobserved one, because preliminary studies showed better results with incorporating the observed nodes in the backpropagation of error. The loss metric was measured on the training data and on the validation data at the end of every epoch, while it was measured only once on the test dataset at the end of every training.

3.3 Hyperparameter optimization and network topologies

The considerations presented in Sec. 2.3 served as an initial guess on the adequate neural network topology which was then modified via hyperparameter optimization in the case of the Anytown WDS. A fairly shallow artificial neural network was initialized exclusively with ChebNet layers and weight decay as regularization. Hidden layers utilized a sigmoid linear unit (SiLU) as activation (Hendrycks and Gimpel, 2016)

, while the output layer was activated by the sigmoid function to guarantee a finite-range feedback to the input signal. Layer weights were initialized by the Xavier initialization

(Glorot and Bengio, 2010), while biases were initialized to zero. As no experience has been gathered on the optimal learning rate of the problem, the Adam optimizer (Kingma and Ba, 2015) was used for training both of the neural networks as it has an adaptive learning rate with momentum.

The number of the hidden layers, the number of the polynomial degree of the kernel, the number of filters in a layer, the amount of regularization applied on each layer and the type of the adjacency matrix were subject to hyperparameter optimization as per Table 2.

The tree-structured parzen estimator (TPE)

(Bergstra et al., 2011) algorithm was applied to find the optimal hyperparameters of the GNN. TPE was allowed to do a search in the parameter space for evaluations and initial random trials were performed beforehand. As multiple stochastic processes are included in the training, TPE was set to evaluate a specific hyperparameter setting as the average result of trainings. The quality of a hyperparameter setting was measured by the validation error. The training of a GNN was shut down after epochs or earlier, if the validation error did not improve by in consecutive epochs. In most of the cases, the training was ended before it reached the upper limit of epochs, thus raising the limit was not considered.

Hyperparameter Possibilities
# of layers 2..4
weight decay 1e-6..1e-4
Laplace weighting {binary, weighted, logarithmic}
Table 2: Subjects of the hyperparameter optimization. : degree of the Chebyshev-polynomial at the th layer, : number of filters in the th layer.

The results of the hyperparameter optimization are shown in Section 4; only the final topologies of the GNNs are presented here. The topology for the Anytown WDS was the direct outcome of the hyperparameter optimization. The other two topologies were constructed arbitrarily but in concordance with the consequences of hyperparameter optimization. The resulting GNN topologies are summarized in Table 3.

Hyperparams. Anytown C-Town Richmond
Table 3: Graph neural network topologies per layer. : degree of the Chebyshev-polynomial at the th layer, : number of filters in the th layer.

4 Evaluation and results

Regarding the lessons learned from the results yielded by the various adjacency matrices, both the results of the hyperparameter optimization and the regular training are discussed in this section.

(a) Observation ratio: .
(b) Observation ratio: .
Figure 4: Violin plots of the model performances during the hyperparamater optimization at different observation ratios in the Anytown water network.

The hyperparameter optimization was carried out for the Anytown WDS with the same observation ratios as the regular trainings. As the results are similar in all cases, only the results of the lowest and the highest observation ratio are presented in Fig. 4. Validation error for all evaluations are depicted in these figures as swarms. These swarms are separated according to the type of the adjacency matrix used for the calculation of the graph Laplacian. Finally, violin plots are drawn in the background in order to give a better view on the distribution of the validation error according to a specific swarm. Each individual inside the swarms is colored by the number of layers used in the ANN. Other hyperparameters are not visualized in the figures.

The results prove the feasibility of the considerations of Sec. 2.3. The models with the weighted adjacency matrix perform worse than the models with the binary adjacency matrix. Besides, the models with the logarithmically weighted adjacency matrix perform similarly to those with the binary adjacency.

The number of the hidden layers are emphasized in the swarms to show that good performance can be achieved with shallow models. While at a small observation ratio – technically, observing only two nodes in the Anytown WDS – a deeper artificial neural network performs generally slightly better than a shallow one, the shallow one is superior at a high observation ratio. The performance of the 3-layered and the 4-layered models were similar over the different observation ratios that are not represented here.

For the above reasons, a 3-layered topology was selected as the base of the GNN for the other two WDSs. The adjacency matrix was set to binary as it performs best close to the logarithmically weighted type. More details for the reasons of this decision are given in Sec. 5.

After settling the topologies for every signal-reconstruction model, 20-20 trainings were carried out with sensors installed in different nodes, as described in Sec. 3.1

. The empirical cumulative distribution function of the relative error with the – arbitrarily chosen –

th trained model is depicted in Fig. 5 for the Richmond WDS. The data were calculated on the test set of scenes. The relative error is lower than in the of the nodes even with the smallest observation ratio and the performance gets better by raising the number of the observed nodes.

Figure 5: Empirical cumulative distribution function of the relative errors for the nodal pressure in all nodes. Data were calculated on the test scenes of the Richmond water network (containing nodes) with the th sensor placement (out of ) with different observation ratios.

The rest of the cumulative distribution plots are not presented here due to the high number () of trained models. An aggregated evaluation is depicted instead in Taylor diagrams (Taylor, 2001) for the Anytown, C-Town and Richmond WDSs in Fig. 6

(a), (b) and (c), respectively. The test scenes were handled as snapshots of a time-varying field variable (pressure) and their standard deviation and correlation coefficient compared to the ground truth were calculated. Three type of metrics are shown in Fig. 


  1. the normalized standard deviation as the radial coordinate,

  2. the correlation coefficient as the angular coordinate,

  3. the centered root mean square error as contours.

The standard deviation is normalized by the standard deviation of the ground truth pressure field. The performance of the models with different sensor placements are averaged per observation ratio to avoid cluttering.

If a model could perfectly reconstruct the nodal pressures in the WDS, it would achieve unit normalized standard deviation and unit correlation coefficient. As a consequence, its centered root mean square error would be zero. This place is denoted with a red cross in each figure. The metrics are computed on the full set of points in the graph, even on the observed ones, thus the points are lying on a half-circular arc with smaller observation ratios tending towards the center of the coordinate system.

To give a hint on the usefulness of the proposed model, a naïve model is used as a baseline. The naïve model predicts the same pressure for each of the unobserved nodes in the WDS: the average of the observed pressures. This is used as a sanity check; the proposed model should outperform this model.

The naïve model is denoted with rectangles while the proposed graph convolution on water networks (GraphConvWat) is denoted with circles. Different observation ratios are colored differently, but same observation ratios for the two different models are colored the same.

The GraphConvWat clearly outperforms the naïve model. While both models’ performance gets better by raising the observation ratio, the proposed technique achieves a decent performance even with the least observations. The worst performance is seen in the Anytown water network (Fig. 6(a)) which has the least nodes among the WDSs. An observation ratio of means the observation of nodes only, which explains the sub-average performance. GraphConvWat performs better on the other two WDSs, where the centered root mean square error is below for each observation ratio. The main reason of the deviation from the reference (denoted by a red cross in Fig. 6) is revealed by the empirical cumulative distribution plot in Fig. 5. While the relative error for the pressure reconstruction is low for most of the nodes (below ), there are a few nodes with a reconstruction error as high as .

(a) Anytown
(b) C-Town
(c) Richmond
Figure 6: Taylor diagram of the reconstruction performance achieved by the naïve and by the proposed graph convolution on water network (GraphConvWat) at different observation ratios (OR). The test scenes were handled as snapshots of a time-varying field variable. The perfect reconstruction is denoted by a red cross as a reference. The markers belonging to GraphConvWat represents the average metric of the model over different sensor placements. Radial coordinate: standard deviation of the reconstructed nodal pressures normalized with the standard deviation of the ground truth nodal pressures. Angular coordinate: correlation coefficient between the reconstructed and the ground truth nodal pressures. Contours: centered root mean square error of the reconstruction, the lower the better.

5 Conclusions

A way to utilize graph convolution on water networks has been proposed in the present paper to reconstruct all the nodal pressures from partial observations in water distribution systems. Several considerations were set up in Sec. 2.3 on how to apply GNNs to this specific task. Numerical experiments were conducted to examine these considerations and due to the results, the key findings are the following.

Graph neural networks based on the K-localized spectral filtering introduced by Defferrard et al. (2016) are able to reconstruct the nodal pressures on three benchmark WDSs Anytown, C-Town and Richmond with a relative error at most on average with an observation ratio of at least . The reconstruction performance gets better by raising the observation ratio.

Besides, the performance is adequate even with shallow artificial neural network topologies if the number of filters per layer is kept high enough to model the underlying physics. The width of the receptive field can be set with the number of hidden layers and the degree of the Chebyshev-polynomials together, the results are satisfying by keeping their combination high enough to span at least the diameter of the graph.

In spite of the inviting possibility to weight the adjacency matrix with hydraulic properties of the WDS, the proposed model gave a similar (or better) reconstruction performance with the binary adjacency matrix utilized. This induces the use of the binary adjacency matrix over the weighted or the logarithmically weighted one for two reasons:

  1. the pipe roughness data are often uncertain and as the weighting from the perspective of the information-propagation between nodes is learned during the training, knowledge over the hydraulic weights gives no benefit to the process;

  2. in a well-instrumented WDS where the training data comes from measurement instead of hydraulic simulations, the corresponding information (pipe roughness) is encoded in the training data.

In this setting, the proposed model is able to reconstruct partially observed nodal pressure on WDSs, making it ideal for the management of WDSs, where the number of applicable instruments is limited. Beside the application in management, the proposed model can open the way for such controller techniques that depend on the full instrumentation of the WDS for optimal operation (Hajgató et al., 2020).


The research presented in this paper, carried out by BME was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program, by the NRDI Fund based on the charter of bolster issued by the NRDI Office under the auspices of the Ministry for Innovation and Technology. Bálint Gyires-Tóth is grateful for the financial support of Doctoral Research Scholarship of Ministry of Human Resources (ÚNKP-20-5-BME-210) in the scope of New National Excellence Program, and the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.

Appendix A Software packages and program code

The graph neural networks were assembled in PyTorch Geometric

(Fey and Lenssen, 2019) based on the PyTorch (Paszke et al., 2019)deep learning package. The hyperparameter optimization was carried out with the Optuna optimization framework (Akiba et al., 2019). The ample number of hydraulic simulations were carried out with the EPANET solver (Rossman, 2000) coupled with EPYNET (Heinsbroek, 2016) to Python.

The program code for the study is available online (Hajgató et al., 2021).


  • T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama (2019) Optuna: a next-generation hyperparameter optimization framework. In Proceedings of the 25rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2623–2631. Cited by: Appendix A.
  • C. T.C. Arsene, B. Gabrys, and D. Al-Dabass (2012) Decision support system for water distribution systems based on neural networks and graphs theory for leakage detection. Expert Systems with Applications 39 (18), pp. 13214–13224. External Links: ISSN 0957-4174, Document Cited by: §1.
  • S. A. Bagloee, M. Asadi, and M. Patriksson (2018) Minimization of water pumps’ electricity usage: a hybrid approach of regression models with optimization. Expert Systems with Applications 107, pp. 222–242. External Links: ISSN 0957-4174, Document Cited by: §1.
  • M. Bata, R. Carriveau, and D. S.-K. Ting (2020) Short-term water demand forecasting using hybrid supervised and unsupervised machine learning model. Smart Water 5 (1). External Links: Document Cited by: §1.
  • J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl (2011) Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger (Eds.), Vol. 24, pp. 2546–2554. Cited by: §3.3.
  • B. Bhattacharya, A. H. Lobbrecht, and D. P. Solomatine (2003)

    Neural networks and reinforcement learning in control of water systems

    Journal of Water Resources Planning and Management 129 (6), pp. 458–465. External Links: Document Cited by: §1.
  • A. Candelieri, B. Galuzzi, I. Giordani, and F. Archetti (2020) Learning optimal control of water distribution networks through sequential model-based optimization. In Learning and Intelligent Optimization, I. S. Kotsireas and P. M. Pardalos (Eds.), Cham, pp. 303–315. External Links: ISBN 978-3-030-53552-0 Cited by: §1.
  • B. A. Christensen, F. A. Locher, and P. K. Swamee (2000) Limitations and proper use of the hazen-williams equation. Journal of Hydraulic Engineering 126 (2), pp. 167–170. External Links: Document Cited by: §2.3.3.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems 29 (2016). External Links: Cited by: §1, §2.2, §2.2, §5.
  • Y. Deng, W. Jiang, and R. Sadiq (2011) Modeling contaminant intrusion in water distribution networks: a new similarity-based DST method. Expert Systems with Applications 38 (1), pp. 571–578. External Links: Document Cited by: §1.
  • B. Du, Q. Zhou, J. Guo, S. Guo, and L. Wang (2021)

    Deep learning with long short-term memory neural networks combining wavelet transform and principal component analysis for daily urban water demand forecasting

    Expert Systems with Applications 171, pp. 114571. External Links: Document Cited by: §1.
  • V. P. Dwivedi, C. K. Joshi, T. Laurent, Y. Bengio, and X. Bresson (2020) Benchmarking graph neural networks. External Links: 2003.00982v1 Cited by: §1.
  • R. L. Farias, V. Puig, H. R. Rangel, and J. Flores (2018) Multi-model prediction for demand forecast in water distribution networks. Energies 11 (3), pp. 660. External Links: Document Cited by: §1, §1.
  • M. Fey and J. E. Lenssen (2019) Fast graph representation learning with pytorch geometric. External Links: 1903.02428 Cited by: Appendix A.
  • P. C. Fuertes, F. M. Alzamora, M. H. Carot, and J.C. A. Campos (2020) Building and exploiting a digital twin for the management of drinking water distribution networks. Urban Water Journal 17 (8), pp. 704–713. External Links: Document Cited by: §1.
  • X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W. Teh and M. Titterington (Eds.), Proceedings of Machine Learning Research, Vol. 9, Chia Laguna Resort, Sardinia, Italy, pp. 249–256. External Links: Link Cited by: §3.3.
  • G. Hajgató, B. Gyires-Tóth, and G. Paál (2021) GraphConvWat. SmartLab, Budapest University of Technology and Economics. External Links: Link Cited by: Appendix A.
  • G. Hajgató, G. Paál, and B. Gyires-Tóth (2020) Deep reinforcement learning for real-time optimization of pumps in water distribution systems. Journal of Water Resources Planning and Management 146 (11), pp. 04020079. External Links: Document Cited by: §3.1, §5.
  • A. Heinsbroek (2016) EPYNET. Vitens. External Links: Link Cited by: Appendix A.
  • D. Hendrycks and K. Gimpel (2016) Gaussian error linear units (gelus). External Links: 1606.08415 Cited by: §3.3.
  • M. Herrera, L. Torgo, J. Izquierdo, and R. Pérez-García (2010) Predictive models for forecasting hourly urban water demand. Journal of Hydrology 387 (1-2), pp. 141–150. External Links: Document Cited by: §1.
  • S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber (2001) Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In

    A Field Guide to Dynamical Recurrent Neural Networks

    , S. C. Kremer and J. F. Kolen (Eds.),
    pp. 237–243. Cited by: §2.3.2.
  • D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), pp. 1–15. External Links: Link Cited by: §3.3.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, pp. 14. External Links: Link Cited by: §2.3.1.
  • K. A. Klise, C. A. Phillips, and R. J. Janke (2013) Two-tiered sensor placement for large water distribution network models. Journal of Infrastructure Systems 19 (4), pp. 465–473. External Links: Document Cited by: §1.
  • S. W. Lee, S. Sarp, D. J. Jeon, and J. H. Kim (2014) Smart water grid: the future water management platform. Desalination and Water Treatment 55 (2), pp. 339–346. External Links: Document Cited by: §1.
  • G. M. Lima, B. M. Brentan, D. Manzi, and E. Luvizotto (2017) Metamodel for nodal pressure estimation at near real-time in water distribution systems using artificial neural networks. Journal of Hydroinformatics 20 (2), pp. 486–496. External Links: Document Cited by: §1.
  • C. P. Liou (1998) Limitations and proper use of the hazen-williams equation. Journal of Hydraulic Engineering 124 (9), pp. 951–954. External Links: Document Cited by: §2.3.3.
  • B.V. Mehta, H. Ghulman, and R. Gerth (1999) A new methodology of using design of experiments as a precursor to neural networks for material processing: extrusion die design. In Proceedings of the Second International Conference on Intelligent Processing and Manufacturing of Materials. IPMM99 (Cat. No.99EX296), pp. 1151–1156. External Links: Document Cited by: §3.1.
  • A. D. Nardo, D. B. Gonzalez, T. Baur, R. Bernini, S. Bodini, S. Capasso, F. Cascetta, F. Castaldo, M. Cocco, P. Cousin, M. DAcunto, R. D. Leo, B. D. Ventura, A. D. Mauro, M. D. Natale, G. D. Virgilio, M. Doveri, B. E. Mansouri, R. Germano, C. Giudicianni, N. Giunta, R. Greco, P. Iovino, E. Katsou, R. Koenig, C. Laspidou, V. Lisbino, L. Lupi, E. M. Díaz, D. Musmarra, M. M. Olivella, O. Paleari, J. Raich, F. Regan, M. J. Rodriguez-Pinzon, J. M. Rodriguez-Varela, L. Sanfilippo, J. S. Seelam, G. F. Santonastaso, D. Savic, A. Scozzari, F. Soldovieri, F. P. Tuccinardi, V. Tzatchkov, L. Vamvakeridou-Lyroudia, M. van Rijn, R. Velotta, S. Venticinque, and H. Wouters (2018) On-line measuring sensors for smart water network monitoring. In 13th International Conference on Hydroinformatics, pp. 572–581. External Links: Document Cited by: §1.
  • F. K. Odan, L. F. R. Reis, and Z. Kapelan (2014) Use of metamodels in real-time operation of water distribution systems. Procedia Engineering 89, pp. 449–456. External Links: Document Cited by: §1.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32, pp. 1–12. Cited by: Appendix A.
  • M. Quiñones-Grueiro, A. Prieto-Moreno, C. Verde, and O. Llanes-Santiago (2019) Decision support system for cyber attack diagnosis in smart water networks. IFAC-PapersOnLine 51 (34), pp. 329–334. External Links: Document Cited by: §1.
  • L. A. Rossman (2000) EPANET 2 users manual. EPA-Environmental Protection Agency. Cited by: Appendix A, §2.1, §2.3.3.
  • D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst (2012)

    The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains

    IEEE Signal Processing Magazine 30 (3), pp. 83–98. External Links: Document,, Link Cited by: §1.
  • R. Sitzenfrei, M. Möderl, and W. Rauch (2013) Automatic generation of water distribution systems based on GIS data. Environmental Modelling & Software 47, pp. 138–147. External Links: Document Cited by: §1.
  • Y. Tanaka (2018) Spectral domain sampling of graph signals. IEEE Transactions on Signal Processing 66 (14), pp. 3752–3767. External Links: Document Cited by: §1.
  • R. Taormina, S. Galelli, N. O. Tippenhauer, E. Salomons, A. Ostfeld, D. G. Eliades, M. Aghashahi, R. Sundararajan, M. Pourahmadi, M. K. Banks, B. M. Brentan, E. Campbell, G. Lima, D. Manzi, D. Ayala-Cabrera, M. Herrera, I. Montalvo, J. Izquierdo, E. Luvizotto, S. E. Chandy, A. Rasekh, Z. A. Barker, B. Campbell, M. E. Shafiee, M. Giacomoni, N. Gatsis, A. Taha, A. A. Abokifa, K. Haddad, C. S. Lo, P. Biswas, M. F. K. Pasha, B. Kc, S. L. Somasundaram, M. Housh, and Z. Ohar (2018) Battle of the attack detection algorithms: disclosing cyber attacks on water distribution networks. Journal of Water Resources Planning and Management 144 (8), pp. 04018048. External Links: Document Cited by: §1.
  • R. Taormina and S. Galelli (2018) Deep-learning approach to the detection and localization of cyber-physical attacks on water distribution systems. Journal of Water Resources Planning and Management 144 (10), pp. 04018065. External Links: Document Cited by: §1.
  • K. E. Taylor (2001) Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research: Atmospheres 106 (D7), pp. 7183–7192. External Links: Document Cited by: §4.
  • University of Exeter (1986) Centre for Water Systems. External Links: Link Cited by: §3.1.
  • Z. Wei, A. Pagani, G. Fu, I. Guymer, W. Chen, J. McCann, and W. Guo (2020) Optimal sampling of water distribution network dynamics using graph fourier transform. IEEE Transactions on Network Science and Engineering 7 (3), pp. 1570–1582. External Links: Document Cited by: §1.
  • A. J. Whittle, L. Girod, A. Preis, M. Allen, H. B. Lim, M. Iqbal, S. Srirangarajan, C. Fu, K. J. Wong, and D. Goldsmith (2011) WaterWiSe@SG: a testbed for continuous monitoring of the water distribution system in singapore. In Water Distribution Systems Analysis 2010, pp. 1362–1378. External Links: Document Cited by: §1.
  • F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger (2019) Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, pp. 6861–6871. External Links: Link Cited by: §2.3.3.
  • Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu (2021) A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32 (1), pp. 4–24. External Links: Document, 1901.00596, Link Cited by: §1, §2.2.
  • X. Xie, Q. Zhou, D. Hou, and H. Zhang (2017) Compressed sensing based optimal sensor placement for leak localization in water distribution networks. Journal of Hydroinformatics 20 (6), pp. 1286–1295. External Links: Document Cited by: §1.
  • A. Yazdani and P. Jeffrey (2011) Complex network analysis of water distribution systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 21 (1), pp. 016111. External Links: Document Cited by: §1.