Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data

by N. Benjamin Erichson, et al.
berkeley college

In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data. In this work, we propose a shallow neural network-based learning methodology for such fluid flow reconstruction. Our approach learns an end-to-end mapping between the sensor measurements and the high-dimensional fluid flow field, without any heavy preprocessing on the raw data. No prior knowledge is assumed to be available, and the estimation method is purely data-driven. We demonstrate the performance on three examples in fluid mechanics and oceanography, showing that this modern data-driven approach outperforms traditional modal approximation techniques which are commonly used for flow reconstruction. Not only does the proposed method show superior performance characteristics, it can also produce a comparable level of performance with traditional methods in the area, using significantly fewer sensors. Thus, the mathematical architecture is ideal for emerging global monitoring technologies where measurement data are often limited.




1. Introduction

The ability to reconstruct coherent flow features from limited observations can be critically enabling for applications across the physical and engineering sciences [12, 63, 14, 50, 74]. For example, efficient and accurate fluid flow estimation is critical for active flow control, and it may help to craft more fuel-efficient automobiles as well as high-efficiency turbines. The ability to reconstruct important fluid flow features from limited observations is also central in applications as diverse as cardiac blood flow modeling and climate science [11]. All of these applications rely on estimating the structure of fluid flows based on limited sensor measurements.

More concretely, the objective is to estimate the flow field $x \in \mathbb{R}^m$ from sensor measurements $s \in \mathbb{R}^p$, that is, to learn the relationship $s \mapsto x$. The restriction of limited sensors gives $p \ll m$. The sensor measurements $s$ are collected via a sampling process from the high-dimensional field $x$. We can describe this process as

$$ s = H(x), $$

where $H: \mathbb{R}^m \to \mathbb{R}^p$ denotes a measurement operator. Now, the task of flow reconstruction requires the construction of an inverse model that produces the field $x$ in response to the observations $s$, which we may describe as

$$ x = G(s), $$

where $G: \mathbb{R}^p \to \mathbb{R}^m$ denotes a non-linear forward operator. However, the measurement operator $H$ may be unknown or highly nonlinear in practice. Hence, the problem is often ill-posed, and we cannot directly invert the measurement operator $H$ to obtain the forward operator $G$.

Fortunately, given a set of training examples $\{x_i, s_i\}_i$, we may learn a function $\mathcal{F}$ to approximate the forward operator $G$. Specifically, we aim to learn a function $\mathcal{F}: s \mapsto \hat{x}$ which maps a limited number of measurements to the estimated state $\hat{x}$:

$$ \hat{x} = \mathcal{F}(s), $$

so that the misfit is small, e.g., in a Euclidean sense over all sensor measurements

$$ \| \mathcal{F}(s) - G(s) \|_2^2 < \epsilon, $$

where $\epsilon$ is a small positive number. Neural network based inversion is common practice in machine learning [52], dating back to the late 80’s [75]. This powerful learning paradigm is also increasingly used for flow reconstruction, prediction, and simulations [46, 69, 42, 17, 32, 70]. In particular, deep inverse transform learning is an emerging concept [56, 41, 1, 73], which has been shown to outperform traditional methods in applications such as denoising, deconvolution, and super-resolution.

Here, we explore shallow neural networks (SNNs) to learn the input-to-output mapping between the sensor measurements and the flow field. Figure 1 shows a design sketch for the proposed framework for fluid flow reconstruction. We can express the network architecture, which we denote as shallow decoder (SD), more concisely as follows:

$$ s \mapsto \text{first hidden layer} \mapsto \text{second hidden layer} \mapsto \text{output layer} \equiv \hat{x}. $$

SNNs are considered to be networks with very few hidden layers. We favor shallow over deep architectures, because the simplicity of SNNs allows faster training, less tuning, and easier interpretation (and also since it works, and thus there is no need to consider deeper architectures).

There are several advantages of this mathematical approach over traditional scientific computing methods for fluid flow reconstruction [18, 24, 13, 71, 50]. First, the SD provides a supervised joint learning framework for the low-dimensional approximation space of the flow field and the map from the measurements to this low-dimensional space. This allows the approximation basis to be tailored not only to the state space but also to the associated measurements, preventing observability issues. In contrast, these two steps are disconnected in standard methods (discussed in more detail in Section 2). Second, the method allows for flexibility in the measurements, which do not necessarily have to be linearly related to the state, as in many standard methods. Finally, the shallow decoder network produces interpretable features of the dynamics, potentially improving on classical proper orthogonal decomposition (POD), also known as principal component analysis (PCA), low-rank features. For instance, Figure 2 shows that the basis learned via an SNN exhibits elements resembling physically consistent quantities, in contrast with alternative POD (PCA-based) modal approximation methods that enforce orthogonality.

Limitations of our approach are standard to data-driven methods, in that the training data should be as representative as possible of the system, in the sense that it should comprise samples drawn from the same statistical distribution as the testing data.

Figure 1. Illustration of the shallow decoder which maps a few sensor measurements $s$ to the estimated field $\hat{x}$. In other words, this neural network based learning methodology provides an end-to-end mapping between the sensor measurements and the fluid flow field.

(a) Modes of proper orthogonal decomposition (POD).
(b) Modes of the output layer learned using the SD.

Figure 2. Dominant modes learned by the shallow decoder in contrast to the POD modes. These dominant features show that the SD constructs a reasonable characterization of the flow behind a cylinder. Indeed, by not constraining the modes to be linear and orthogonal, as is enforced with POD, a potentially more interpretable feature space can be extracted from data. Such modes can be exploited for reconstruction of the state space from limited measurements and limited data.

The paper is organized as follows. Sec. 2 discusses traditional modal approximation techniques. In Sec. 3, we briefly discuss shallow learning techniques for flow reconstruction. Then, in Sec. 4, the specific implementation and architecture of our shallow decoder is described. Results are presented in Sec. 5 for various applications of interest. We apply the shallow decoder to several prototypical flow field examples, considering both point-wise and sub-gridscale measurements. We aim to reconstruct (a) the vorticity field of a flow behind a cylinder from a handful of sensors on the cylinder surface, (b) the mean sea surface temperature from weekly sea surface temperatures for the last 26 years, and (c) the velocity field of a turbulent isotropic flow. We show that a very small number of sensor measurements is indeed sufficient for flow reconstruction in these applications. Further, we show that the shallow decoder can handle non-linear measurements and is robust to measurement noise. The results show significantly improved performance compared to more traditional modal approximation techniques. The paper concludes in Sec. 6 with a discussion and outlook on the use of SNNs for more general flow field reconstructions.

2. Background on high-dimensional state estimation

The task of interpolating from a limited number of measurements to the high-dimensional state-space is made possible by the fact that the dynamics for many complex systems, or datasets, exhibit some sort of low-dimensional structure. This fact has been exploited for state estimation using (i) a tailored basis, such as POD, or (ii) a general basis in which the signal is sparse, e.g., typically a Fourier or wavelet basis will suffice. In the former, gappy POD methods [28] have been developed for principled interpolation strategies [18, 24, 13, 71, 50]. In the latter, compressive sensing methods [15, 22, 3] serve as a principled technique for reconstruction. Both techniques exploit the fact that there exists a basis in which the high-dimensional state vector has a sparse, or compressible, representation. In [51], a basis is learned such that it leads to a sparse approximation of the high-dimensional state while enforcing observability from the sensors.

Next, we describe standard techniques for the estimation of a state $x$ from observations $s$, and we discuss observability issues. Established techniques for state reconstruction are based on the idea that a field $x$ can be expressed in terms of a rank-$k$ approximation

$$ x \approx \hat{x} = \sum_{j=1}^{k} \phi_j \nu_j = \Phi \nu, \tag{4} $$

where $\{\phi_j\}_j$ are the modes of the approximation and $\{\nu_j\}_j$ are the associated coefficients. The approximation space is derived from a given training set using unsupervised learning techniques. A typical approach to determine the approximation modes is POD [4, 18, 24, 50]. Randomized methods for linear algebra enable the fast computation of such approximation modes [49, 23, 37, 26, 27, 25]. Given the approximation modes $\Phi$, estimating the state $x$ reduces to determining the coefficients $\nu$ from the sensor measurements $s$ using supervised techniques. These typically aim to find the minimum-energy or minimum-norm solution that is consistent in a least-squares sense with the measured data.

2.1. Standard approach: Estimation via POD based methods

Two POD-based methods are discussed, which we will refer to as POD and POD Plus in the following. Both approaches reconstruct the state $x$ with POD modes $\Phi$, by estimating the coefficients $\nu$ from sensor information. The POD modes $\Phi$ are obtained as the $k$ most dominant left singular vectors of a training set $X$:

$$ X \approx \Phi \Sigma V^{\top}, \tag{5} $$

where $\Phi$ denotes the left singular vectors and $V$ the right singular vectors. The corresponding singular values are the diagonal elements of $\Sigma$.

2.1.1. Standard POD-based method

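Let a linear measurement operator $H: \mathbb{R}^m \to \mathbb{R}^p$ describe the relationship between the field and the associated observations, $s = Hx$. The approximation of the field $x$ with the approximation modes $\{\phi_j\}_j$ is obtained by solving the following equation for $\nu$:

$$ s = H \Phi \nu. \tag{6} $$

The standard approach is to simply solve the following least-squares problem

$$ \nu \in \arg\min_{\tilde{\nu}} \| s - H \Phi \tilde{\nu} \|_2^2. \tag{7} $$

The solution with the minimum $L_2$-norm is given by:

$$ \nu = (H \Phi)^{+} s, \tag{8} $$

with the superscript $+$ denoting the Moore-Penrose pseudo-inverse. In this situation, the high-dimensional state is then estimated as

$$ \hat{x} = \Phi \nu = \Phi (H \Phi)^{+} s. \tag{9} $$

This approach is hereafter simply referred to as POD.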
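
As a rough illustration of the standard POD estimate, the following NumPy sketch computes the modes from a training set by SVD and reconstructs a field as the modes times the pseudo-inverse solution for the coefficients. The synthetic data, the sizes `m`, `n`, `k`, `p`, and the helper name `pod_reconstruct` are assumptions made for this example, not part of the original study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: field dimension m, n snapshots, k modes, p sensors.
m, n, k, p = 200, 50, 3, 8

# Synthetic training set of exact rank k (an assumption for this sketch).
basis = rng.standard_normal((m, k))
X = basis @ rng.standard_normal((k, n))

# POD modes: the k dominant left singular vectors of the training set.
U, _, _ = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :k]

# Linear point-wise measurement operator: p rows of the identity matrix.
sensor_idx = rng.choice(m, size=p, replace=False)
H = np.eye(m)[sensor_idx]

def pod_reconstruct(s, H, Phi):
    """Minimum-norm least-squares coefficients, then the field estimate."""
    nu = np.linalg.pinv(H @ Phi) @ s
    return Phi @ nu

# A new field lying in the span of the modes is recovered from p sensors.
x = basis @ rng.standard_normal(k)
s = H @ x
x_hat = pod_reconstruct(s, H, Phi)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

In this idealized setting the field lies exactly in the span of the modes and the sensors observe all of them, so the relative error is at the level of round-off. Neither condition is guaranteed in practice.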

2.1.2. Improved POD-based method

The approach described above requires explicit knowledge of the observation operator $H$ and is subject to ill-conditioning of the least-squares problem. These limitations often render this “vanilla flavored” approach impractical, and they motivate an alternative formulation.

The idea is to learn the map between coefficients and observations without explicitly referring to $H$. It can be implicitly described by a, possibly nonlinear, operator $P: \mathbb{R}^k \to \mathbb{R}^p$, typically determined by minimizing the Bayes risk, defined as the misfit in the $L_2$-sense:

$$ P \in \arg\min_{\tilde{P}} \mathbb{E}_{\mu_{s,\nu}} \left[ \| s - \tilde{P}(\nu) \|_2^2 \right], \tag{10} $$

where $\mu_{s,\nu}$ is the joint probability measure of the coefficients and the observations.

We assume the training set is representative of the underlying system, in the sense that it should contain independent samples drawn from the stationary distribution of the physical system at hand. The Bayes risk is then approximated by an empirical estimate, and the operator $P$ is determined as

$$ P \in \arg\min_{\tilde{P}} \sum_{i=1}^{n} \| s_i - \tilde{P}(\Phi^{+} x_i) \|_2^2. \tag{11} $$

When the measurement operator $H$ is linear, $P$ is then an empirical estimate of $H \Phi$, the contribution of the basis modes $\Phi$ to the measurements $s$. Compared to the closed-form solution in Eq. (8), this formulation brings flexibility in the properties of the map $P$. For instance, regularization by sparsity can be enforced in $P$, via $L_0$- or $L_1$-penalization. Expressing Eq. (11) in matrix form yields:

$$ P \in \arg\min_{\tilde{P}} \| S - \tilde{P} \, \Phi^{+} X \|_F^2, $$

where $S$ and $\Phi^{+} X$ respectively refer to the training data measurements $\{s_i\}_i$ and coefficients $\{\Phi^{+} x_i\}_i$. It immediately follows

$$ P = S \, (\Phi^{+} X)^{+}, $$

and the approximation obtained by POD Plus is finally given by the solution to the following least-squares problem

$$ \nu \in \arg\min_{\tilde{\nu}} \| s - S \, (\Phi^{+} X)^{+} \tilde{\nu} \|_2^2. $$

However, $\nu$ is typically higher-dimensional than $s$, and thus the problem is ill-posed. We then make use of the popular Tikhonov regularization, selecting the solution with the minimum $L_2$-norm. This results in a ridge regression problem formulated as:

$$ \nu \in \arg\min_{\tilde{\nu}} \| s - S \, (\Phi^{+} X)^{+} \tilde{\nu} \|_2^2 + \alpha \| \tilde{\nu} \|_2^2, $$

with the penalization parameter $\alpha$ typically estimated through $k$-fold cross-validation. As will be seen in the examples below, penalization of the magnitude of the coefficients can significantly improve the performance of the POD approach.
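
The improved POD-based method can be sketched in NumPy as follows, assuming a linear map from coefficients to measurements that is fitted from training data and then inverted with ridge regression. The helper names `fit_measurement_map` and `pod_plus_coefficients`, the synthetic data, and the value of `alpha` are assumptions chosen for this example. In practice the penalty would be selected by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, p = 200, 50, 10, 5   # hypothetical sizes, with more modes than sensors

# Synthetic rank-k training fields and their point-wise measurements.
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
sensor_idx = rng.choice(m, size=p, replace=False)
S = X[sensor_idx]             # training measurements, shape (p, n)

U, _, _ = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :k]                # POD modes, shape (m, k)

def fit_measurement_map(S, X, Phi):
    """Least-squares fit of a linear map from coefficients to measurements."""
    coeffs = np.linalg.pinv(Phi) @ X      # training coefficients, shape (k, n)
    return S @ np.linalg.pinv(coeffs)     # shape (p, k)

def pod_plus_coefficients(s, P, alpha):
    """Ridge (Tikhonov) estimate of the coefficients from measurements s."""
    k = P.shape[1]
    return np.linalg.solve(P.T @ P + alpha * np.eye(k), P.T @ s)

P = fit_measurement_map(S, X, Phi)
x = X[:, 0]
s = x[sensor_idx]
nu_ridge = pod_plus_coefficients(s, P, alpha=1e-3)
x_hat = Phi @ nu_ridge
nu_min_norm = np.linalg.pinv(P) @ s       # unpenalized minimum-norm solution
```

Because the measurements here are linear in the field, the fitted map coincides with the rows of the mode matrix at the sensor locations, and the ridge penalty can only shrink the coefficient norm relative to the unpenalized minimum-norm solution.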

2.2. Observability issue

The above techniques are standard in the scientific computing literature for flow reconstruction, but they bear a severe limitation. Indeed, since it is derived in an unsupervised fashion from the set of instances $\{x_i\}_i$, the approximation basis $\Phi$ is agnostic to the measurements $s$. In other words, the approximation basis is determined with no supervision by the measurements. To illustrate the impact of this situation, let $\nu^{\star} = \Phi^{+} x$ be the least-squares estimate of the approximation coefficients for a given field $x$. The difference between the least-squares coefficients and the coefficients obtained from the linear sensor measurements reads

$$ \nu^{\star} - \nu = \left( \Phi^{+} - (H \Phi)^{+} H \right) x, $$

and the error in the reconstructed field is obtained immediately:

$$ x - \hat{x} = \left( I - \Phi (H \Phi)^{+} H \right) x, $$

where $I$ is the identity matrix of suitable dimension.

The error in the reconstructed field is seen to depend on both the approximation basis $\Phi$ and the measurement operator $H$. The measurement operator is entirely defined by the sensor location, and it does not depend on the basis considered to approximate the field. It is thus clear that, to reduce (the expectation of) the reconstruction error, the approximation basis must be informed both by the dataset $\{x_i\}_i$ and the sensors available, through $H$. For example, poorly located sensors will cause a large set of $\nu^{\star}$ to lie in the nullspace of $H \Phi$, preventing their estimation, while the coefficients of certain approximation modes may be affected by the observation of certain realizations $x$ being severely amplified by $(H \Phi)^{+}$ if the approximation basis is not carefully chosen.

This remark can be interpreted in terms of the control theory concept of observability of the basis modes by the sensors. Most papers in the literature focus their attention on deriving an approximation basis leading to a good representation [13, 71, 50], i.e., such that the training set $\{x_i\}_i$ is well approximated in the $k$-dimensional basis $\{\phi_j\}_j$, $x_i \approx \Phi \Phi^{+} x_i$. But how well the associated coefficients are informed by the measurements is usually overlooked when deriving the basis. In practice, the decoupling between learning an approximation basis and learning a map to the associated coefficients often leads to a performance bottleneck in the estimation procedure. Enforcing observability of the approximation basis by the sensors is key to a good recovery performance and can dramatically improve upon unsupervised methods, as shown in [51].
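
A toy example may make the observability issue concrete. The sketch below assumes the POD estimate described above (modes times the pseudo-inverse solution for the coefficients, with linear measurements) and uses a deliberately tiny, hypothetical setup: two orthonormal modes in a four-dimensional space and a single sensor that sees only the first mode.

```python
import numpy as np

Phi = np.eye(4)[:, :2]                  # two orthonormal modes in R^4
H = np.array([[1.0, 0.0, 0.0, 0.0]])    # one sensor reading entry 0 only

def reconstruction_error_operator(H, Phi):
    """Matrix E with x - x_hat = E @ x for x_hat = Phi @ pinv(H @ Phi) @ H @ x."""
    m = Phi.shape[0]
    return np.eye(m) - Phi @ np.linalg.pinv(H @ Phi) @ H

E = reconstruction_error_operator(H, Phi)

x_seen = Phi[:, 0]      # mode observed by the sensor
x_unseen = Phi[:, 1]    # mode lying in the nullspace of H @ Phi

err_seen = np.linalg.norm(E @ x_seen)       # recovered exactly
err_unseen = np.linalg.norm(E @ x_unseen)   # not recovered at all
```

Both fields are perfectly represented by the basis, yet only the first can be estimated from the sensor. This is the sense in which a basis chosen without regard to the sensors can limit recovery.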

3. Shallow learning for flow reconstruction

Shallow learning techniques are widely used for flow reconstruction. For instance, the approximation based approach for flow reconstruction, outlined in Section 2, can be considered to have two levels. The first level is concerned with computing an approximation basis, while the second level performs a linear weighted combination of the basis elements to estimate the high-dimensional flow field. Such shallow learning techniques are easy to train and tune. In addition, the levels are often physically meaningful, and they may provide some interesting insights into the underlying mechanics of the system under consideration. However, the recent success of deep learning has put shallow learning somewhat out of focus [8, 44, 65]. Indeed, the expressive power of deep architectures has pushed forward many tasks in computer vision and language processing. The high expressive power is due to a deep architecture design which has a large number of hidden layers. Particularly, in computer vision, the success is centered around convolution layers, which exhibit sparse connectivity of the neurons. These layers are augmented with non-linear activation functions and additional pooling layers. Several theoretical results support some of the advantages of deeper architectures [20, 10, 54, 53]. Thus, the reader may wonder why we advocate shallow architectures for flow reconstruction. Among other things, for the applications in which we are interested, deep learning has the following downsides:

  • Computation: Deep architectures require tremendous amounts of computational power. In the era of high-end graphics processing units (GPUs), this is somewhat less of an issue, yet training can still be costly if the input data are high-dimensional.

  • Tuning: Regularization in its various forms can be used to ease the issue of overfitting by limiting the complexity of the network. However, in practice, this requires “fiddling” with a large number of knobs (i.e., hyper-parameters) [7]. Generally, deeper architectures have been shown to be more sensitive to the choice of hyper-parameters. This remains a challenge even in light of recent progress in understanding generalization and overfitting in deep networks [59, 5].

  • Data: The more critical issue is that deep architectures are greedy, i.e., a large number of training examples is required in order to learn a function which generalizes to new data points. This is because deep nets are typically over-parametrized and tend to interpolate the data too closely [7, 48, 16]. Hence, the more examples used for training, the better the generalization error [59].

Further, there is a critical difference between standard machine learning benchmark datasets and scientific datasets (such as those we consider here). The former datasets, including MNIST, CIFAR10 and ImageNet, provide a large number of low-resolution training examples. In sharp contrast, scientific applications often generate a small amount of high-dimensional (high-volume) data, yet we face the situation that labeled examples are in short supply. For instance, recordings for climate data date back only so many years. Fortunately, scientific data often feature more structure, which can render an easier learning task. Hence, the hope is that shallow learning techniques perform better in the scientific data setting. Indeed, several results show that shallow learning is better suited for limited data than deeper networks [64, 21, 47, 39]. Interestingly, SNNs have also successfully been used for applications arising in the area of fluid mechanics, both before the recent hype over deep learning [30, 55] and also more recently [33, 6].

In the following, we show that the performance for flow reconstruction problems can be greatly improved by adding just one additional layer of complexity. This means that (instead of using a very shallow learning approach, as in traditional scientific methods) we explore architectures with one additional stage.

4. A shallow decoder for flow reconstruction

We can define a neural network (NN) with $K$ layers as a nested set of functions

$$ \mathcal{F}(s; W) := R\left( W^{K} R\left( W^{K-1} \cdots R\left( W^{1} s \right) \right) \right), $$

where $R(\cdot)$ denotes a coordinate-wise scalar (non-linear) activation function and $W$ denotes a set of weight matrices $\{W^{k}\}$, $k = 1, \dots, K$, with matching dimensions. NN-based learning provides a flexible framework for estimating the relationship between quantities from a collection of samples. Here, we consider SNNs, which are considered to be networks with very few, often only one, or even no, hidden layers, i.e., $K$ is very small.

In the following, an estimate of a vector $y$ is denoted as $\hat{y}$, while $\tilde{y}$ denotes dummy vectors upon which one optimizes. Relying on a training set $\{x_i, s_i\}_{i=1}^{n}$, with $n$ examples $x_i$ and corresponding sensor measurements $s_i$, we aim to learn a function $\mathcal{F}$ belonging to a class of neural networks which minimizes the misfit in a Euclidean sense, over all sensor measurements

$$ \mathcal{F} \in \arg\min_{\tilde{\mathcal{F}}} \sum_{i=1}^{n} \| x_i - \tilde{\mathcal{F}}(s_i) \|_2^2. $$

We assume that only a small number of training examples is available. Further, no prior information is assumed to be available, and the estimation method is purely data-driven. Importantly, we assume no knowledge about the underlying measurement operator which is used to collect the sensor measurements. Further, unlike most classical methods for flow reconstruction, such as those discussed in Sec. 2, this NN-based learning methodology allows the joint learning of both the modes and the coefficients.

4.1. Architecture

We now discuss some general principles guiding the design of a good network architecture for flow reconstruction. These considerations lead to the following nested nonlinear function

$$ \mathcal{F}(s) = \Omega(\nu(\psi(s))). $$

The architecture design is guided by the paradigm of simplicity. Indeed, the architecture should enable fast training, little tuning, and offer an intuitive interpretation.

Recall that the interpretability of the flow field estimate is favored by representing it in a basis of moderate size, whose modes can be identified with spatial structures of the field. This means that the estimate $\hat{x}$ can be represented as a linear combination of $k$ modes $\{\phi_j\}_j$, weighted by coefficients $\{\nu_j\}_j$, see Eq. (4). These modes are a function of the inputs. This naturally leads us to consider a network in which the output $\hat{x}$ is given by a linear, fully connected, last layer of $k$ inputs, interpreted as $\nu$. These coefficients are informed by the sensor measurements $s$ in a nonlinear way.

The nonlinear map $s \mapsto \nu$ can be described by a hidden layer, whose outputs $\psi$ are hereafter termed measurement features, in analogy with kernel-based methods, where raw measurements are nonlinearly lifted as extended measurements to a higher-dimensional space. In this architecture, the measurement features $\psi$ essentially describe nonlinear combinations of the input measurement $s$. The nonlinear combinations $\psi$ are then mapped to the coefficients $\nu$ associated with the modes $\phi$. While the size of the output layer is that of the discrete field $x$, the size of the last hidden layer ($\nu$) is chosen and defines the size $k$ of the dictionary $\Phi$. This size can be estimated from the data by dimensionality estimation techniques [36, 29]. Restricting the description of the training data to a low-dimensional space is of potential interest to practitioners who may interpret the elements of the resulting basis in a physically meaningful way. The additional structure allows one to express the field of interest in terms of modes that practitioners may interpret, i.e., relate to some physics phenomena such as traveling waves, instability patterns (e.g., Kelvin-Helmholtz), etc.

In contrast, the size of the first hidden layer describing $\psi$ is essentially driven by the size of the input layer ($s$) and the number of nonlinear combinations used to nonlinearly inform the coefficients $\nu$. The general shape of the network then bears flexibility in the hidden layers. A popular architecture for decoders consists of non-decreasing layer sizes, so as to increase continuously the size of the representation from the low-dimensional observations to the high-dimensional field. We can model $\mathcal{F}$ as a shallow neural network with two hidden layers $\psi$ and $\nu$, followed by a linear output layer $\Omega$.

Two types of hidden layers, namely fully-connected (FC) and convolution layers, can be considered. The power of convolution layers is key to the success of recent deep learning architectures in computer vision. However, in our problem, we favor fully-connected layers. The reason is two-fold: (i) our sensor measurements have no spatial ordering; (ii) depending on the number of filters, convolution layers require a large number of examples for training, while we assume that only a small number of examples are available for training. Thus, the first and second hidden layers take the form

$$ \psi(s) := R(W^{\psi} s + b^{\psi}), \qquad \nu(\psi) := R(W^{\nu} \psi + b^{\nu}), $$

where $W$ denotes a dense weight matrix and $b$ is a bias term. The function $R(\cdot)$ denotes an activation function used to introduce nonlinearity into the model as discussed below. The final linear output layer simply takes the form of

$$ \Omega(\nu) := \Phi \nu + b^{\Phi}, $$

where we interpret the columns of the weight matrix $\Phi$ as modes. In summary, the architecture of our shallow decoder can be outlined as

$$ s \mapsto \psi(s) \mapsto \nu(\psi) \mapsto \Omega(\nu) \equiv \hat{x}. $$

Depending on the dataset, we need to adjust the size of each layer. Here, we use narrow rather than wide layers. Prescribing the size of the output layer restricts the dimension of the space in which the estimation lies, and it effectively regularizes the problem, e.g., filtering out most of the noise which is not living in a low-dimensional space. Beyond robustness with respect to noise, reducing the dimension brings several additional benefits, including faster learning and fewer suboptimal local minima.
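
The forward pass of such a decoder (two fully-connected hidden layers followed by a linear output layer whose weight columns play the role of modes) can be sketched in plain NumPy as follows. The layer sizes, the random initialization, and the names `init_shallow_decoder` and `shallow_decoder` are assumptions for illustration only. A trained model would learn the weights from data.

```python
import numpy as np

def relu(z):
    """Coordinate-wise positive part."""
    return np.maximum(z, 0.0)

def init_shallow_decoder(p, n_features, k, m, rng):
    """Randomly initialized weights for sensors -> features -> coefficients -> field."""
    def dense(n_in, n_out):
        weight = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
        return weight, np.zeros(n_out)
    W_psi, b_psi = dense(p, n_features)   # first hidden layer
    W_nu, b_nu = dense(n_features, k)     # second hidden layer
    Phi, b_phi = dense(k, m)              # linear output layer (modes)
    return {"W_psi": W_psi, "b_psi": b_psi, "W_nu": W_nu,
            "b_nu": b_nu, "Phi": Phi, "b_phi": b_phi}

def shallow_decoder(s, params):
    """Map sensor measurements s to an estimate of the full field."""
    psi = relu(params["W_psi"] @ s + params["b_psi"])   # measurement features
    nu = relu(params["W_nu"] @ psi + params["b_nu"])    # mode coefficients
    return params["Phi"] @ nu + params["b_phi"]         # linear combination of modes

rng = np.random.default_rng(0)
p, n_features, k, m = 5, 35, 40, 500    # hypothetical, non-decreasing layer sizes
params = init_shallow_decoder(p, n_features, k, m, rng)
x_hat = shallow_decoder(rng.standard_normal(p), params)
```

Keeping the last layer linear is what allows the columns of its weight matrix to be read as modes, since the estimate is then exactly a weighted sum of those columns plus a bias.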


(a) ReLU


(b) Swish


(c) SoftShrinkage
Figure 3. Illustration of several different activation functions.

The rectified linear unit (ReLU) activation function is among the most popular choices in computer vision applications, owing to its favorable properties [35]. The ReLU activation, illustrated in Figure 3(a), is defined as the positive part of a signal $z$:

$$ z^{+} := R(z) = \max(z, 0). $$

The transformed input signal is also called activation. While the ReLU activation function performs best on average in our experiments, there are other choices. For instance, we have considered the Swish [2] and SoftShrinkage activation functions, also illustrated in Figure 3. These two activation functions can be fine-tuned via an additional hyper-parameter, and there are potential situations in which these activation functions outperform ReLU. Interestingly, different activation functions considerably affect the modes (i.e., columns of the weight matrix $\Phi$), as shown in Figure 4.
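
For reference, the three activation functions can be written in a few lines of NumPy. The definitions below follow the commonly used forms of Swish and SoftShrinkage. The parameter names `beta` and `lam` and their default values are assumptions for this sketch and are not taken from the experiments.

```python
import numpy as np

def relu(z):
    """Positive part of the signal."""
    return np.maximum(z, 0.0)

def swish(z, beta=1.0):
    """Signal scaled by a sigmoid gate; beta is the extra hyper-parameter."""
    return z / (1.0 + np.exp(-beta * z))

def soft_shrinkage(z, lam=0.5):
    """Shrink the magnitude by lam and set values within [-lam, lam] to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([-2.0, -0.3, 0.0, 0.3, 2.0])
relu_out = relu(z)
swish_out = swish(z)
shrink_out = soft_shrinkage(z)
```

Unlike ReLU, both alternatives pass some negative signal through, which is one plausible reason the learned modes differ between them.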

(a) ReLU
(b) Swish
(c) SoftShrinkage
Figure 4. Dominant three modes obtained by using different activation functions.

4.2. Regularization

Overfitting is a common problem in machine learning and occurs if a function interpolates a limited set of data points too closely. In particular, this is a problem for deep neural networks which often have more neurons (trainable parameters) than can be justified by the limited amount of training examples which are available. There is increasing interest in characterizing and understanding generalization and overfitting in NNs [59, 5]. Hence, additional constraints are required to learn a function which generalizes to new observations that have not been used for training. Standard strategies to avoid overfitting include early stopping rules, and weight penalties ($L_2$ regularization) to regularize the complexity of the function (network). In addition to these two strategies, we also use batch normalization (BN) [40] and dropout layers (DL) [66] to improve the convergence and robustness of the shallow decoder. This yields the following architecture:

$$ s \mapsto \text{first hidden layer} \mapsto \text{BN} \mapsto \text{DL} \mapsto \text{second hidden layer} \mapsto \text{BN} \mapsto \text{output layer} \equiv \hat{x}. $$

Regularization, in its various forms, requires one to “fiddle” with a large number of knobs (i.e., hyper-parameters). However, we have found that SNNs are less sensitive to the particular choice of parameters; hence, SNNs are easier to tune.

Batch normalization.

BN is a technique to normalize (mean zero and unit standard deviation) the activation. From a statistical perspective, BN eases the effect of internal covariate shifts [40]. In other words, BN accounts for the change of distribution of the output signals (activation) across different mini batches during training. Each BN layer has two parameters which are learned during the training stage. This simple, yet effective, preprocessing step allows one to use higher learning rates for training the network. In addition, it also reduces overfitting owing to its regularization effect.

Dropout layer.

DL helps to improve the robustness of an NN. The idea is to switch off (drop) a small fraction of randomly chosen hidden units (neurons) during the training stage. This strategy can be seen as a form of regularization which also helps to reduce interdependent learning between the units of a fully connected layer. In our experiments, the drop ratio is kept small and fixed.
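
The two regularization layers can be illustrated with a short NumPy sketch of their training-time behavior. This is a simplified rendering under stated assumptions: running statistics for inference are omitted, dropout uses the common "inverted" rescaling, and the names `gamma`, `beta`, and the drop ratio used below are placeholders rather than the settings of the experiments.

```python
import numpy as np

def batch_norm_train(a, gamma, beta, eps=1e-5):
    """Normalize each unit over the mini-batch, then apply learned scale and shift."""
    mean = a.mean(axis=0)
    var = a.var(axis=0)
    return gamma * (a - mean) / np.sqrt(var + eps) + beta

def dropout_train(a, drop_ratio, rng):
    """Zero a random fraction of units and rescale the survivors."""
    keep = rng.random(a.shape) >= drop_ratio
    return a * keep / (1.0 - drop_ratio)

rng = np.random.default_rng(0)
activations = 3.0 * rng.standard_normal((64, 8)) + 2.0   # mini-batch of 64, 8 units
normalized = batch_norm_train(activations, gamma=np.ones(8), beta=np.zeros(8))
dropped = dropout_train(normalized, drop_ratio=0.1, rng=rng)
```

The scale and shift arrays correspond to the two learned parameters per BN layer mentioned above.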

4.3. Optimization

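Given a training set with $n$ targets $\{x_i\}_i$ and corresponding sensor measurements $\{s_i\}_i$, we minimize the misfit between the reconstructed quantity $\hat{x} = \mathcal{F}(s)$ and the observed quantity $x$, in terms of the squared $L_2$-norm

$$ \mathcal{F} \in \arg\min_{\tilde{\mathcal{F}}} \sum_{i=1}^{n} \| x_i - \tilde{\mathcal{F}}(s_i) \|_2^2 + \lambda \sum_{k} \| W^{k} \|_F^2. $$

The second term on the right-hand side introduces regularization to the weight matrices, which is controlled via the parameter $\lambda$. It is well known that the $L_2$-norm is sensitive to outliers, and the $L_1$-norm can be used as a more robust loss function. Alternatively, a popular option is the Huber norm (smooth $L_1$-loss), leading to the following optimization problem

$$ \mathcal{F} \in \arg\min_{\tilde{\mathcal{F}}} \sum_{i=1}^{n} \rho_{\delta}\left( x_i - \tilde{\mathcal{F}}(s_i) \right) + \lambda \sum_{k} \| W^{k} \|_F^2, $$

where $\rho_{\delta}$ denotes the Huber function applied to the residual. The tuning parameter $\delta$ controls the threshold. The Huber loss function grows at a linear rate for residuals outside the thresholding parameter $\delta$, rather than quadratically. This can reduce the influence of large deviations when learning the decoder. Further, it has been reported that this loss function prevents exploding gradients in some cases [34]. Thus, the Huber loss may be an interesting alternative.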
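
To make the difference between the two losses tangible, here is a minimal NumPy sketch of the standard Huber function next to the squared loss. The function names and the threshold value are assumptions for this example. The half-squared convention is used for both so that they agree inside the threshold.

```python
import numpy as np

def half_squared(r):
    """Quadratic loss on the residual r."""
    return 0.5 * r ** 2

def huber(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond, continuous at the threshold."""
    abs_r = np.abs(r)
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, half_squared(r), linear)

residuals = np.array([0.5, 3.0, -10.0])
squared_values = half_squared(residuals)   # grows quadratically with the residual
huber_values = huber(residuals)            # grows linearly beyond the threshold
```

For the largest residual the quadratic loss is about five times the Huber loss, which is why a few large deviations influence the fit much less under the latter.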

We use the ADAM optimization algorithm [43] to train the shallow decoder, with a prescribed initial learning rate and weight decay (also known as $L_2$ regularization). The learning rate, also known as step size, controls how much we adjust the weights in each update step. The weight decay parameter is important since it allows one to regularize the complexity of the network. In practice, we can improve the performance by changing the learning rate during training. We decay the learning rate by a fixed factor after a fixed number of epochs. Indeed, the reconstruction performance in our experiments is considerably improved by this dynamic scheme, compared to a fixed parameter setting. In addition, we decrease the weight decay by a fixed factor. Further, we use a relatively large batch size, since we have only a limited amount of data available for training. Overall, in our experiments, ADAM shows a better performance than stochastic gradient descent (SGD) with momentum [67] and averaged SGD [60]. The hyper-parameters can be fine-tuned in practice, but our choice of parameters works reasonably well for several different examples. Note that we use the method described by [38] in order to initialize the weights. This initialization scheme is favorable, in particular because the output layer is high-dimensional.
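
A step-wise decay of this kind can be sketched as a small helper that multiplies a hyper-parameter by a constant factor once every fixed number of epochs. The function name `step_decay` and all numerical values below are hypothetical and chosen only to show the mechanics.

```python
def step_decay(initial_value, factor, step_size, epoch):
    """Value after multiplying by `factor` once every `step_size` epochs."""
    return initial_value * factor ** (epoch // step_size)

# Hypothetical settings: start at 1e-2 and halve every 100 epochs.
epochs = (0, 99, 100, 250)
learning_rates = [step_decay(1e-2, 0.5, 100, epoch) for epoch in epochs]

# The same helper can be applied to the weight decay parameter.
weight_decays = [step_decay(1e-4, 0.8, 100, epoch) for epoch in epochs]
```

The schedule is piecewise constant, so all updates within a block of epochs share the same step size.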

5. Empirical evaluation

(a) Interpolation
(b) Extrapolation
Figure 5. Two different training and test set configurations, showing (a) an interpolation task and (b) an extrapolation task. Here, the gray columns indicate snapshots used for training, while the red columns indicate snapshots used for testing.

We evaluate our methods on three classes of data. First, we consider a periodic flow behind a circular cylinder, as a canonical example of fluid flow. Then, we consider the weekly mean sea surface temperature (SST), as a second and more challenging example. Finally, the third and most challenging example we consider is a forced isotropic turbulence flow.

As discussed in Section 1, the shallow decoder requires that the training data represent the system, in the sense that they should comprise samples drawn from the same statistical distribution as the testing data. Indeed, this limitation is common to data-driven methods, both for flow reconstruction and more generally. Hence, we are mainly concerned with exploring reconstruction performance and generalizability for interpolation tasks rather than for extrapolation tasks. In our third example, however, we demonstrate the limitations of the shallow decoder, illustrating difficulties that arise when one tries to extrapolate, rather than interpolate, the flow field. Figure 5 illustrates the difference between the two types of tasks.

In the first two example classes of data, the sensor information is a subset of the high-dimensional flow field, i.e., the measurement operator has only one non-zero entry in each row, at the index of a sensor location. Given the set of indices of the spatial locations of the sensors, the measurement operator restricts the state to the rows indexed by this set; that is, the observations are simply point-wise measurements of the field of interest. In this paper, no attempt is made to optimize the location of the sensors. In practical situations, they are often given or constrained by other considerations (wiring, intrusiveness, manufacturing, etc.). We simply use uniformly random locations in our examples. The third example class of data demonstrates the SD using sub-gridscale measurements.
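A point-wise measurement operator of this kind can be sketched in a few lines of NumPy. The function names, the toy field, and the seed are illustrative assumptions. The explicit matrix `H` is built only to show the "one non-zero entry per row" structure; in practice, indexing is sufficient.

```python
import numpy as np

def sample_sensor_indices(n_state, n_sensors, seed=0):
    """Draw distinct sensor locations uniformly at random."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_state, size=n_sensors, replace=False))

def measure(field, sensor_idx):
    """Point-wise measurement: restrict the flattened field to the sensor rows."""
    return np.asarray(field).reshape(-1)[sensor_idx]

field = np.arange(12.0).reshape(3, 4)        # toy 3 x 4 "flow field"
idx = sample_sensor_indices(field.size, 5)
s = measure(field, idx)                      # five point-wise observations
H = np.eye(field.size)[idx]                  # explicit operator: a single 1 per row
```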

The quality of the reconstruction is quantified in terms of the normalized root-mean-square residual error

$$\mathrm{NME} = \frac{\|x - \hat{x}\|_2}{\|x\|_2},$$

denoted in the following as “NME,” where $x$ is the high-dimensional field and $\hat{x}$ its reconstruction. However, this measure can be misleading if the empirical mean is dominating. Hence, we consider also a more sensitive measure which quantifies the reconstruction accuracy of the deviations around the empirical mean. We define this measure, denoted as “NFE,” as

$$\mathrm{NFE} = \frac{\|x' - \hat{x}'\|_2}{\|x'\|_2},$$

where $x'$ and $\hat{x}'$ are the fluctuating parts around the empirical mean. In our experiments, we average the errors over multiple runs with different sensor distributions.
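The two error measures can be sketched as follows, assuming that both are ratios of Euclidean (Frobenius) norms computed over all snapshots, and that the empirical mean field is supplied by the caller. These conventions, like the synthetic data, are assumptions made for illustration.

```python
import numpy as np

def nme(x_true, x_hat):
    """Normalized error: ||x - x_hat||_2 / ||x||_2 over all snapshots."""
    return np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)

def nfe(x_true, x_hat, mean_field):
    """Same ratio after removing the empirical mean field."""
    return nme(x_true - mean_field, x_hat - mean_field)

rng = np.random.default_rng(0)
mean_field = 100.0 * np.ones((50, 1))        # dominant mean
fluct = rng.standard_normal((50, 20))        # columns are snapshots
x_true = mean_field + fluct
x_hat = mean_field + 0.5 * fluct             # half of each fluctuation is missed

err_mean = nme(x_true, x_hat)                # tiny, because the mean dominates
err_fluct = nfe(x_true, x_hat, mean_field)   # reveals the 50% fluctuation error
```

The toy example shows why the fluctuation-based measure is the more sensitive of the two when the mean dominates.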

5.1. Setup for our empirical evaluation

Here, we provide details about the concrete network architectures of the shallow decoder, which are used for the different examples. The networks are implemented in Python using PyTorch, and research code for the flow behind the cylinder is available online. Tables 1–3 show the details. For each example we use a similar architecture design. The difference is that we use a slightly wider design (more neurons per layer) for the SST dataset and the isotropic flow, because we are using a larger number of sensors for these two problems and thus need to increase the capacity of the network. The learning rate, the weight decay, and their respective decay rates are the same in each situation, and we stop training after a fixed number of epochs.

| Layer | Weight size | Input shape | Output shape | Activation | Batch norm. | Dropout |
|---|---|---|---|---|---|---|
| FC | sensors × 35 | sensors | 35 | ReLU | True | - |
| FC | 35 × 40 | 35 | 40 | ReLU | True | - |
| FC | 40 × 76,416 | 40 | 76,416 | Linear | - | - |

Table 1. Architecture of the SD for the flow behind the cylinder. The batch size is set to . Here, we set the dropout rate to for the noisy situation.

| Layer | Weight size | Input shape | Output shape | Activation | Batch norm. | Dropout |
|---|---|---|---|---|---|---|
| FC | sensors × 350 | sensors | 350 | ReLU | True | - |
| FC | 350 × 400 | 350 | 400 | ReLU | True | - |
| FC | 400 × 44,219 | 400 | 44,219 | Linear | - | - |

Table 2. Architecture of the SD for the SST dataset. Here, the batch size is set to .

| Layer | Weight size | Input shape | Output shape | Activation | Batch norm. | Dropout |
|---|---|---|---|---|---|---|
| FC | sensors × 350 | sensors | 350 | ReLU | True | - |
| FC | 350 × 400 | 350 | 400 | ReLU | True | - |
| FC | 400 × 122,500 | 400 | 122,500 | Linear | - | - |

Table 3. Architecture of the SD for isotropic flow. Here, the batch size is set to .
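The architectures in Tables 1–3 can be sketched in PyTorch as follows. This is an illustrative reading of the tables and not the authors' research code: the ordering of activation and batch normalization within a layer, the position of the optional dropout layer (after the first fully-connected layer), and all optimizer settings shown here are assumptions.

```python
import torch
from torch import nn

def shallow_decoder(n_sensors, n_output, hidden=(35, 40), dropout=0.0):
    """Two narrow hidden layers followed by a wide linear output layer."""
    h1, h2 = hidden
    layers = [nn.Linear(n_sensors, h1), nn.ReLU(), nn.BatchNorm1d(h1)]
    if dropout > 0.0:
        layers.append(nn.Dropout(dropout))   # intended for noisy sensors only
    layers += [nn.Linear(h1, h2), nn.ReLU(), nn.BatchNorm1d(h2),
               nn.Linear(h2, n_output)]      # linear output, no activation
    return nn.Sequential(*layers)

model = shallow_decoder(n_sensors=5, n_output=76_416)   # sizes from Table 1
sensors = torch.randn(8, 5)                  # batch of 8 measurement vectors
fields = model(sensors)                      # reconstructed fields, shape (8, 76416)

# Hypothetical optimizer settings, not the values used in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)
```

The wider designs of Tables 2 and 3 would correspond to `hidden=(350, 400)` with the respective output dimensions.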

5.2. Fluid flow behind cylinder

The first example we consider is the fluid flow behind a circular cylinder, at Reynolds number , based on cylinder diameter, a canonical example in fluid dynamics [57]. The flow is characterized by a periodically shedding wake structure and exhibits smooth, large scale, patterns. A direct numerical simulation of the two-dimensional Navier-Stokes equations is achieved via the immersed boundary projection method [68, 19]. In particular, we use the fast multidomain method [19], which simulates the flow on five nested grids of increasing size, with each grid consisting of grid points, covering a domain of cylinder diameters on the finest domain. We collect snapshots in time, sampled uniformly in time and covering several periods of vortex shedding. For the following experiment, we use cropped snapshots of dimension on the finest domain, as we omit the spatial domain upstream to the cylinder. Further, we split the dataset into a training and test set so that the training set comprises the first snapshots, while the remaining snapshots are used for validation. Note that different splittings (interpolation and extrapolation) yield nearly the same results since the flow is periodic.

5.2.1. Varying numbers of random structured point-wise sensor measurements

We investigate the performance of the shallow decoder using varying numbers of sensors. A realistic setting is considered in that the sensors can only be located on a solid surface. The retained configuration aims at reconstructing the entire vorticity flow field from information at the cylinder surface only. The results are averaged over different sensor distributions on the cylinder downstream-facing surface and are summarized in Table 4. Further, to contextualize the precision of the algorithms, we also state the standard deviation in parentheses.





Figure 6. Visual results for the canonical flow for two different sensor distributions. In (a) the target snapshots and the specific sensor configurations (here using sensors) are shown. Depending on the sensor distribution, the POD-based method is not able to accurately reconstruct the high-dimensional flow field, as shown in (b). The regularized pod plus method performs slightly better, as shown in (c). The shallow decoder yields an accurate flow reconstruction, as shown in (d).

| Method | Sensors | Training NME | Training NFE | Test NME | Test NFE |
|---|---|---|---|---|---|
| pod | 2 | 0.310 (0.01) | 0.449 (0.01) | 0.316 (0.01) | 0.452 (0.01) |
| pod plus | 2 | 0.309 (0.01) | 0.449 (0.01) | 0.315 (0.30) | 0.451 (0.01) |
| shallow decoder | 2 | 0.004 (0.00) | 0.006 (0.00) | 0.007 (0.00) | 0.011 (0.00) |
| pod | 5 | 0.465 (0.39) | 0.675 (0.57) | 0.488 (0.41) | 0.698 (0.59) |
| pod plus | 5 | 0.204 (0.04) | 0.297 (0.05) | 0.212 (0.04) | 0.303 (0.06) |
| shallow decoder | 5 | 0.003 (0.00) | 0.004 (0.00) | 0.006 (0.00) | 0.008 (0.00) |
| pod | 10 | 0.346 (1.54) | 0.502 (2.23) | 0.379 (1.70) | 0.542 (2.43) |
| pod plus | 10 | 0.041 (0.02) | 0.059 (0.02) | 0.040 (0.01) | 0.057 (0.02) |
| shallow decoder | 10 | 0.002 (0.00) | 0.003 (0.00) | 0.005 (0.00) | 0.007 (0.00) |
| pod | 15 | 0.441 (1.81) | 0.639 (2.63) | 0.574 (2.44) | 0.821 (3.49) |
| pod plus | 15 | 0.021 (0.01) | 0.031 (0.01) | 0.021 (0.01) | 0.029 (0.01) |
| shallow decoder | 15 | 0.002 (0.00) | 0.003 (0.00) | 0.005 (0.00) | 0.007 (0.00) |

Table 4. Performance for the flow past cylinder for a varying number of sensors. Results are averaged over runs with different sensor distributions, with standard deviations in parentheses.
(a) Training data
(b) Validation data
Figure 7. Singular value spectrum of the original data (i.e., the flattened fluid flow snapshots are concatenated to form a matrix) and the reconstructed snapshots for the training and test set. Here, the specific sensor configuration shown in the left column of Figure 6 is used. The data reconstructed via the traditional POD-based method show a very poor approximation, while the data reconstructed via the shallow decoder capture the dominant singular values.

The shallow decoder shows an excellent flow reconstruction performance compared to traditional methods. Indeed, the results show that very few sensors are already sufficient to get an accurate approximation. Further, we can see that the shallow decoder is insensitive to the sensor location, i.e., the variability of the performance is low when different sensor distributions on the cylinder surface are used. In stark contrast, this simple setup poses a challenge for the pod method, which is seen to be highly sensitive to the sensor configuration. This is expected since poorly located sensors lead to a large probability that the vorticity field lies in the nullspace of , preventing its estimation, as discussed in Section 2. While regularization can improve the robustness slightly, the pod plus approach still requires about at least sensors to provide accurate estimations for the high-dimensional state-space of the flow where the shallow decoder exhibits a good performance with as few as 5 sensors. Note that the traditional methods could benefit from optimal sensor placement [50]; however, this is beyond the scope of this paper.

Figure 6 provides visual results for two specific sensor configurations using sensors. The second configuration is challenging for pod, which fails to provide an accurate reconstruction. pod plus provides a more accurate reconstruction of the flow field. The shallow decoder outperforms the traditional methods in both situations. Further insights can be gained by examining the singular value spectrum of the original and reconstructed data, each arranged as a matrix collecting snapshots at different time instants; see Figure 7. The spectrum of the flow data reconstructed using the shallow decoder closely approximates the true spectrum, while the spectrum of the data reconstructed using pod provides a very poor approximation.
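The spectrum comparison can be sketched with NumPy's SVD. The synthetic rank-2 data below merely stand in for flow snapshots, and the perturbed copy stands in for a good reconstruction; both are assumptions for illustration and not data from the paper.

```python
import numpy as np

def singular_values(snapshots):
    """Singular values of a matrix whose columns are flattened snapshots."""
    return np.linalg.svd(snapshots, compute_uv=False)

rng = np.random.default_rng(0)
# Toy rank-2 data: 200 grid points, 30 time instants.
modes = rng.standard_normal((200, 2))
coeffs = rng.standard_normal((2, 30))
X = modes @ coeffs
X_rec = X + 1e-3 * rng.standard_normal(X.shape)   # a near-perfect reconstruction

sv_true = singular_values(X)
sv_rec = singular_values(X_rec)
rel_gap = np.abs(sv_rec[:2] - sv_true[:2]) / sv_true[:2]
```

A reconstruction that captures the dominant structures yields small relative gaps in the leading singular values, which is the comparison made in Figure 7.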

5.2.2. Non-linear sensor measurements

So far, the sensor information consisted of point-wise measurements of the local flow field, so that each measurement is obtained by applying a Dirac distribution centered at the location of the corresponding sensor to the field, i.e., it is the value of the field at that location. We now consider nonlinear measurements to demonstrate the flexibility of the shallow decoder. Here, we consider the simple setting of squared sensor measurements, obtained by taking the Hadamard (element-wise) product of the point-wise measurement vector with itself. Table 5 provides a summary of the results, using 10 sensors. The shallow decoder is agnostic to the functional form of the sensor measurements, and it achieves nearly the same performance as in the linear case above: the average reconstruction error for the test set increases only slightly. The POD-based methods fail for this task since they are linear techniques.

| Method | Sensors | Training NME | Training NFE | Test NME | Test NFE |
|---|---|---|---|---|---|
| pod | 10 | - | - | - | - |
| pod plus | 10 | 0.781 (0.06) | 1.134 (0.09) | 0.609 (0.02) | 0.871 (0.03) |
| shallow decoder | 10 | 0.002 (0.00) | 0.003 (0.00) | 0.006 (0.00) | 0.009 (0.01) |

Table 5. Performance for estimating the flow behind a cylinder using nonlinear sensor measurements. The standard POD-based method fails for this task. pod plus is able to reconstruct the flow field, yet the estimation quality is poor. In contrast, the SD method performs well.

5.2.3. Noisy sensor measurements

(a) Truth
(b) pod
(c) pod plus
(d) Shallow Decoder
Figure 8. Visual results for the flow past the cylinder in the presence of white noise. Here the signal-to-noise ratio is . In (a) the target snapshot and the corresponding sensor configuration (using sensors) are shown. Neither pod nor pod plus is able to reconstruct the flow field, as shown in (b) and (c). The SD is able to reconstruct the coherent structure of the flow field, as shown in (d).

| Method | SNR | Training NME | Training NFE | Test NME | Test NFE |
|---|---|---|---|---|---|
| pod | 10 | 9.171 (14.7) | 12.69 (20.4) | 8.746 (12.9) | 11.93 (17.6) |
| pod plus | 10 | 0.511 (0.03) | 0.742 (0.05) | 0.551 (0.04) | 0.679 (0.06) |
| shallow decoder | 10 | 0.138 (0.02) | 0.201 (0.02) | 0.278 (0.04) | 0.397 (0.05) |
| pod | 50 | 4.837 (3.08) | 6.946 (4.42) | 4.520 (2.75) | 6.390 (3.89) |
| pod plus | 50 | 0.370 (0.04) | 0.531 (0.05) | 0.364 (0.02) | 0.514 (0.02) |
| shallow decoder | 50 | 0.134 (0.02) | 0.198 (0.02) | 0.173 (0.02) | 0.247 (0.03) |

Table 6. Performance for estimating the flow behind a cylinder in the presence of white noise, using sensors. pod fails for this task, while pod plus shows a better performance. The SD proves robust to noisy sensor measurements and outperforms the traditional techniques.

To investigate further the robustness and flexibility of the shallow decoder, we consider flow reconstruction in the presence of additive white noise. While this is not of concern when dealing with flow simulations, it is a realistic setting when dealing with flows obtained in experimental studies. Table 6 lists the results for both a high and low noise situation with linear measurements. By inspection, the shallow decoder outperforms the classical techniques. In the high noise case, with a signal-to-noise ratio (SNR) of 10, the two test-set errors reported in Table 6 for the shallow decoder are 0.278 and 0.397. For an SNR of 50, they drop to 0.173 and 0.247. Note that we here use an additional dropout layer (placed after the first fully-connected layer) to improve the robustness of the shallow decoder. In contrast, standard pod fails in both situations. Again, the pod plus method shows improved results over the standard pod. However, the visual results in Figure 8 show that the reconstruction quality of the shallow decoder is favorable. The shallow decoder shows a clear advantage and a denoising effect. Indeed, the reconstructed snapshots allow for a meaningful interpretation of the underlying structure. The shallow decoder can thus be seen as a valuable tool for the reconstruction of fluid flows in the presence of noise.
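Noisy sensor measurements of this kind can be generated as sketched below. The SNR convention is not stated in this section, so the snippet assumes, purely for illustration, that the SNR is the linear ratio of signal variance to noise variance and that the noise is zero-mean Gaussian.

```python
import numpy as np

def add_white_noise(signal, snr, seed=0):
    """Add zero-mean Gaussian noise so that var(signal) / var(noise) = snr."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    noise_std = np.sqrt(signal.var() / snr)
    return signal + noise_std * rng.standard_normal(signal.shape)

rng = np.random.default_rng(1)
clean = rng.standard_normal(100_000)         # stand-in for sensor measurements
noisy = add_white_noise(clean, snr=10)
empirical_snr = clean.var() / (noisy - clean).var()
```

With a different convention (for example, an SNR expressed in decibels), only the computation of `noise_std` would change.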

5.2.4. Summary of empirical results for the flow behind cylinder

Figure 9 summarizes the performance of the shallow decoder for varying measurement configurations (number of sensors, linear or nonlinear, noise). The advantage of the shallow decoder compared to the traditional POD-based techniques is pronounced. It can be seen that the performance of the traditional techniques is patchy, i.e., the reconstruction quality is highly sensitive to the sensor location. While regularization can mitigate a poor sensor placement design, a relatively large number of sensors is required in order to achieve an accurate reconstruction. More challenging situations such as nonlinear measurements and sensor noise pose a challenge for the traditional techniques, while the shallow decoder is able to reconstruct dominant flow features in such situations. The computational demands required to train the shallow decoder are minimal compared to training deep architectures: the time for training remains below two minutes for this example, using a modern GPU.

Figure 9. Performance overview for the fluid flow behind the cylinder. Panels (a)–(c) show results for different numbers of sensors, (d) for nonlinear measurements, and (e)–(f) for noisy measurements at two signal-to-noise ratios.

5.3. Sea surface temperature using random point-wise measurements

(a) Truth
(b) Shallow Decoder
Figure 10. Visual results for the SST dataset. In (a), the high-dimensional target snapshot and the corresponding sensor configurations (using sensors) are shown; and in (b), the results of the Shallow Decoder are shown. Note that we show here the mean centered snapshot. The shallow decoder shows an excellent reconstruction quality for the deviations around the mean with an error as low as .

The second example we consider is the more challenging sea surface temperature (SST) dataset. Complex ocean dynamics lead to rich flow phenomena, featuring interesting seasonal fluctuations. While the mean SST flow field is characterized by a periodic structure, the flow is non-stationary. The dataset consists of the weekly sea surface temperatures for the last 26 years, publicly available from the National Oceanic & Atmospheric Administration (NOAA). The data comprise snapshots in time with spatial resolution of . For the following experiments, we only consider 44,219 measurements, by excluding measurements corresponding to the land masses. Further, we create a training set by selecting snapshots at random, while the remaining snapshots are used for validation.

We consider the performance of the shallow decoder using varying numbers of random sensors scattered across the spatial domain. The results are summarized in Table 7. We observe a large discrepancy between the NME and the NFE. This is because the long-term annual mean field accounts for the majority of the spatial structure of the field. Hence, the NME is uninformative with respect to the performance of reconstruction methods. In terms of the NFE, the POD-based reconstruction techniques fail to reconstruct the high-dimensional flow field using limited sensor measurements. In contrast, the shallow decoder demonstrates an excellent reconstruction performance using both 32 and 64 measurements. Figure 10 shows visual results to support these quantitative findings.

| Method | Sensors | Training NME | Training NFE | Test NME | Test NFE |
|---|---|---|---|---|---|
| pod | 32 | 0.637 (0.59) | 5.915 (5.56) | 0.649 (0.62) | 6.04 (5.77) |
| pod plus | 32 | 0.293 (0.11) | 2.728 (1.05) | 0.299 (0.12) | 2.783 (1.14) |
| shallow decoder | 32 | 0.009 (0.00) | 0.088 (0.00) | 0.014 (0.00) | 0.128 (0.00) |
| pod | 64 | 0.986 (1.34) | 9.183 (12.5) | 1.007 (1.36) | 9.344 (12.7) |
| pod plus | 64 | 0.229 (0.07) | 2.132 (0.66) | 0.257 (0.87) | 2.389 (0.81) |
| shallow decoder | 64 | 0.009 (0.00) | 0.085 (0.00) | 0.012 (0.00) | 0.118 (0.00) |

Table 7. Performance for estimating the SST dataset for varying numbers of sensors. The SD outperforms the traditional techniques and is largely insensitive to the sensor location.

5.4. Turbulent flow using sub-gridscale measurements

The final example we consider is the velocity field of a turbulent isotropic flow.

If the sensor measurements are acquired on a coarse but regular grid, then the reconstruction task may be considered as a super-resolution problem [72, 31, 14]. There are a number of direct applications of super-resolution in fluid mechanics centered around sub-gridscale modeling. Because many fluid flows are inherently multiscale, it may be prohibitively expensive to collect data that captures all spatial scales, especially for iterative optimization and real-time control [12]. Inferring small-scale flow structures below the spatial resolution available is an important task in large eddy simulation (LES), climate modeling, and particle image velocimetry (PIV), to name a few applications. Deep learning has recently been employed for super-resolution in fluid mechanics applications with promising results [32].

Here, we consider data from a forced isotropic turbulence flow generated with a direct numerical simulation using points in a triply periodic domain. For the following experiments, we are using snapshots for training and snapshots for validation. The data span about one large-eddy turnover time. The full dataset is provided as part of the Johns Hopkins Turbulence Database [45, 58].

(a) Snapshot
(b) Low resolution
(c) Shallow Decoder
Figure 11. Visual results for the turbulent isotropic flow using 121 subgrid-cell measurements. The interpolation error of the shallow decoder is about .

| Method | Grids | Training NME | Training NFE | Test NME | Test NFE |
|---|---|---|---|---|---|
| shallow decoder | 36 | 0.029 (0.00) | 0.041 (0.00) | 0.071 (0.00) | 0.101 (0.01) |
| shallow decoder | 64 | 0.027 (0.00) | 0.039 (0.00) | 0.067 (0.00) | 0.096 (0.00) |
| shallow decoder | 121 | 0.026 (0.00) | 0.038 (0.00) | 0.066 (0.00) | 0.093 (0.00) |

Table 8. Flow reconstruction performance for estimating the isotropic flow using varying numbers of sub-gridscale measurements.

5.4.1. Interpolation

Unlike the examples considered in Sections 5.2 and 5.3, the isotropic turbulent flow is non-periodic in time and highly non-stationary. Thus, this dataset poses a challenging task, even for interpolation. Figure 11 shows a visual example for this problem. Note that in our setting the mean grid values are used as inputs, while in the classical super-resolution problem the low-resolution image, shown in Figure 11(b), is used as an input. The quality of the estimated high-dimensional flow field is excellent, despite the challenging problem. Table 8 quantifies the performance for varying numbers of sub-gridscale measurements.
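The mean grid values used as inputs can be sketched as block averages over a coarse grid of cells. The snippet assumes square, non-overlapping cells and a 350 × 350 snapshot (inferred from the output dimension 122,500 in Table 3); both are assumptions for illustration, as is the random toy field.

```python
import numpy as np

def cell_means(field, n_cells):
    """Average a 2-D field over an n_cells x n_cells grid of coarse cells."""
    rows = np.linspace(0, field.shape[0], n_cells + 1).astype(int)
    cols = np.linspace(0, field.shape[1], n_cells + 1).astype(int)
    means = np.empty((n_cells, n_cells))
    for i in range(n_cells):
        for j in range(n_cells):
            block = field[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
            means[i, j] = block.mean()
    return means.reshape(-1)                 # flattened input vector for the decoder

field = np.random.default_rng(0).standard_normal((350, 350))
inputs = cell_means(field, 11)               # 11 x 11 = 121 sub-gridscale measurements
```

Using 6 or 8 cells per side gives the 36 and 64 measurements listed in Table 8.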

5.4.2. Extrapolation

(a) Test snapshot
(b) Reconstruction
(c) Test snapshot
(d) Reconstruction
(e) Test snapshot
(f) Reconstruction
Figure 12. Visual results illustrating the limitation of the shallow decoder for extrapolation tasks. Flow fields sampled from or close to the statistical distribution describing the training examples can be reconstructed with high accuracy, as shown in (a) and (b). Extrapolation fails for fields which belong to a different statistical distribution, as shown in (e) and (f).

Next, we illustrate the limitation of the shallow decoder. Indeed, it is important to stress that the shallow decoder cannot be used for extrapolating highly non-stationary fluid flows. To illustrate this issue, Figure 12 shows three flow fields at different temporal locations. First, Figure 12(b) shows a test example which is close in time to the training set. In this case, the shallow decoder is able to extrapolate the flow field with high accuracy. The reconstruction quality drops for snapshots which are further away in time, as shown in Figure 12(d). Finally, Figure 12(f) shows that extrapolation fails if the test example is far away from the training set in time, i.e., the flow field is not drawn from the same statistical distribution as the training examples.

6. Discussion

The emergence of sensor networks for global monitoring (e.g., ocean and atmospheric monitoring) requires new mathematical techniques that are capable of maximally exploiting sensors for state estimation and forecasting. Emerging algorithms from the machine learning community can be integrated with many traditional scientific computing approaches to enhance sensor network capabilities. For many global monitoring applications, the placement of sensors can be prohibitively expensive, thus requiring mathematical techniques such as the one proposed here, which can exploit a hyper reduction in the number of sensors while maintaining required performance characteristics.

This work demonstrates the enhanced robustness and accuracy of fluid flow field reconstruction by using a shallow learning based methodology. We have explored this approach on a range of example flow fields of increasing complexity. The mathematical formulation presented is significantly different from what is commonly used in flow reconstruction problems, e.g., gappy interpolation with dominant POD modes.

We proposed a shallow decoder with two hidden layers for the problem of flow reconstruction in order to achieve an improved reconstruction performance. During our study, we also compared the shallow decoder to deep architectures, which yield no significant improvement (results are omitted). More concretely, we considered deep convolution networks (DCN) with three to five hidden deconvolutional layers, as well as residual network (ResNet) architectures. The reconstruction performance of the deep architectures was marginally better for the simple flow behind the cylinder. However, we observed that the shallow decoder shows a favorable performance in all other situations. A further advantage we observed in our experiments is that shallow architectures are more robust to sensor noise. Moreover, the features extracted for reconstruction are highly interpretable, potentially allowing for enhanced scientific understanding of the system measured.

Table 9 shows a qualitative comparison of our initial experiments for flow reconstruction. Our results show that shallow architectures are more favorable for limited sensor and limited data settings. In conclusion, we advocate a regression towards more shallow networks for flow reconstruction and more generally for scientific applications with limited data.

Future work aims to leverage the underlying laws of physics in flow problems to further improve the efficiency. In the context of flow reconstruction or, more generally, observation of a high-dimensional physical system, insights from the physics at play can be exploited [62, 61]. In particular, the dynamics of many systems do indeed remain low-dimensional, and the trajectory of their state vector lies close to a manifold whose dimension is significantly lower than the ambient dimension. Moreover, the features extracted by the shallow decoder network can also be integrated in reduced order models (ROMs) for forecasting predictions [9]. In many high-dimensional systems where ROMs are used, the ability to generate low-fidelity models that can be rapidly simulated has revolutionized our ability to model such complex systems, especially in applications involving complex flow fields. The ability to rapidly generate alternative low-rank feature spaces to POD generates new possibilities for ROMs using limited sampling and limited data. This aspect of the shallow decoder will be explored further in future work.

| | Very shallow | Shallow (ours) | Deeper |
|---|---|---|---|
| Computational demands | low | medium | high |
| Time for hyper-parameter tuning | low | medium | high |
| Complexity of architecture design | low | medium | high |
| Ability to learn with limited data | high | high | low |
| Inference time | low | low | high |

Table 9. Qualitative comparison of network architectures for flow reconstruction. Shallow architectures (such as those we introduce and analyze) show a favorable performance in our experiments, compared to traditional (very shallow) methods and much deeper architectures.


LM gratefully acknowledges the support of the French Agence Nationale pour la Recherche (ANR) and Direction Générale de l’Armement (DGA) via the FlowCon project (ANR-17-ASTR-0022). SLB acknowledges support from the Army Research Office (ARO W911NF-17-1-0422). JNK acknowledges support from the Air Force Office of Scientific Research (FA9550-19-1-0011). LM and JNK also acknowledge support from the Air Force Office of Scientific Research (FA9550-17-1-0329). MWM would like to acknowledge ARO, DARPA, NSF, and ONR for providing partial support for this work. We would also like to thank Kevin Carlberg for valuable discussions about flow reconstruction techniques.