1. Introduction
The ability to reconstruct coherent flow features from limited observations can be critically enabling for applications across the physical and engineering sciences [12, 63, 14, 50, 74]. For example, efficient and accurate fluid flow estimation is critical for active flow control, and it may help to craft more fuel-efficient automobiles as well as high-efficiency turbines. The ability to reconstruct important fluid flow features from limited observations is also central in applications as diverse as cardiac blood-flow modeling and climate science [11]. All of these applications rely on estimating the structure of fluid flows based on limited sensor measurements.
More concretely, the objective is to estimate the flow field x ∈ R^m from sensor measurements y ∈ R^p, that is, to learn the relationship y ↦ x. The restriction to limited sensors gives p ≪ m. The sensor measurements are collected via a sampling process from the high-dimensional field x. We can describe this process as

(1) y = H(x),

where H : R^m → R^p denotes a measurement operator. Now, the task of flow reconstruction requires the construction of an inverse model that produces the field x in response to the observations y, which we may describe as

(2) x = F(y),

where F : R^p → R^m denotes a nonlinear forward operator. However, the measurement operator H may be unknown or highly nonlinear in practice. Hence, the problem is often ill-posed, and we cannot directly invert the measurement operator H to obtain the forward operator F.

Fortunately, given a set of training examples {(x_i, y_i)}_{i=1}^N, we may learn a function F̂ to approximate the forward operator F. Specifically, we aim to learn a function F̂ which maps a limited number of measurements y to the estimated state x̂:

(3) x̂ = F̂(y),

so that the misfit is small, e.g., in a Euclidean sense over all sensor measurements,

(1/N) Σ_{i=1}^N ‖x_i − F̂(y_i)‖₂² ≤ ε,

where ε is a small positive number. Neural-network-based inversion is common practice in machine learning [52], dating back to the late 80's [75]. This powerful learning paradigm is also increasingly used for flow reconstruction, prediction, and simulation [46, 69, 42, 17, 32, 70]. In particular, deep inverse transform learning is an emerging concept [56, 41, 1, 73], which has been shown to outperform traditional methods in applications such as denoising, deconvolution, and super-resolution.
Here, we explore shallow neural networks (SNNs) to learn the input-to-output mapping between the sensor measurements and the flow field. Figure 1 shows a design sketch of the proposed framework for fluid flow reconstruction. We refer to the proposed network architecture as the shallow decoder (SD).

SNNs are networks with very few hidden layers. We favor shallow over deep architectures because the simplicity of SNNs allows faster training, less tuning, and easier interpretation (and, since shallow networks suffice for this task, there is no need to consider deeper architectures).
There are several advantages of this approach over traditional scientific computing methods for fluid flow reconstruction [18, 24, 13, 71, 50]. First, the SD provides a supervised joint learning framework for the low-dimensional approximation space of the flow field and the map from the measurements to this low-dimensional space. This allows the approximation basis to be tailored not only to the state space but also to the associated measurements, preventing observability issues. In contrast, these two steps are disconnected in standard methods (discussed in more detail in Section 2). Second, the method allows for flexibility in the measurements, which do not necessarily have to be linearly related to the state, as in many standard methods. Finally, the shallow decoder network produces interpretable features of the dynamics, potentially improving on classical proper orthogonal decomposition (POD), also known as principal component analysis (PCA), low-rank features. For instance, Figure 2 shows that the basis learned via an SNN exhibits elements resembling physically consistent quantities, in contrast with alternative POD (PCA-based) modal approximation methods that enforce orthogonality. Limitations of our approach are standard to data-driven methods: the training data should be as representative as possible of the system, in the sense that they should comprise samples drawn from the same statistical distribution as the testing data.
The paper is organized as follows. Sec. 2 discusses traditional modal approximation techniques. In Sec. 3, we briefly discuss shallow learning techniques for flow reconstruction. Then, in Sec. 4, the specific implementation and architecture of our shallow decoder is described. Results are presented in Sec. 5 for various applications of interest. We apply the shallow decoder to several prototypical flow field examples, considering both point-wise and sub-grid-scale measurements. We aim to reconstruct (a) the vorticity field of a flow behind a cylinder from a handful of sensors on the cylinder surface, (b) the mean sea surface temperature from weekly sea surface temperatures over the last 26 years, and (c) the velocity field of a turbulent isotropic flow. We show that a very small number of sensor measurements is indeed sufficient for flow reconstruction in these applications. Further, we show that the shallow decoder can handle nonlinear measurements and is robust to measurement noise. The results show significantly improved performance compared to more traditional modal approximation techniques. The paper concludes in Sec. 6 with a discussion and outlook on the use of SNNs for more general flow field reconstructions.
2. Background on high-dimensional state estimation
The task of interpolating from a limited number of measurements to the high-dimensional state space is made possible by the fact that the dynamics of many complex systems, or datasets, exhibit some sort of low-dimensional structure. This fact has been exploited for state estimation using (i) a tailored basis, such as POD, or (ii) a general basis in which the signal is sparse, e.g., typically a Fourier or wavelet basis. In the former case, gappy POD methods [28] have been developed for principled interpolation strategies [18, 24, 13, 71, 50]. In the latter, compressive sensing methods [15, 22, 3] serve as a principled technique for reconstruction. Both techniques exploit the fact that there exists a basis in which the high-dimensional state vector has a sparse, or compressible, representation. In [51], a basis is learned such that it leads to a sparse approximation of the high-dimensional state while enforcing observability from the sensors.

Next, we describe standard techniques for the estimation of a state x from observations y, and we discuss observability issues. Established techniques for state reconstruction are based on the idea that a field x can be expressed in terms of a rank-k approximation

(4) x ≈ Σ_{j=1}^k a_j φ_j = Φ a,

where the φ_j (the columns of Φ ∈ R^{m×k}) are the modes of the approximation and a = (a_1, …, a_k)^T are the associated coefficients. The approximation space is derived from a given training set X = [x_1, …, x_N] using unsupervised learning techniques. A typical approach to determine the approximation modes is POD [4, 18, 24, 50]. Randomized methods for linear algebra enable the fast computation of such approximation modes [49, 23, 37, 26, 27, 25]. Given the approximation modes Φ, estimating the state reduces to determining the coefficients a from the sensor measurements y using supervised techniques. These typically aim to find the minimum-energy or minimum-norm solution that is consistent in a least-squares sense with the measured data.

2.1. Standard approach: Estimation via POD-based methods
Two POD-based methods are discussed, which we will refer to as pod and pod plus in the following. Both approaches reconstruct the state with POD modes, by estimating the coefficients from sensor information. The POD modes are obtained as the dominant left singular vectors of a training set X:

(5) X = U Σ V^T,

where U denotes the left singular vectors and V the right singular vectors. The corresponding singular values are the diagonal elements of Σ, and the first k columns of U are retained as the approximation basis Φ.

2.1.1. Standard POD-based method
Let a linear measurement operator C ∈ R^{p×m} describe the relationship between the field and the associated observations, y = C x. The approximation of the field with the approximation modes Φ is obtained by solving the following equation for a:

(6) y = C Φ a.

The standard approach is to simply solve the following least-squares problem:

(7) min_a ‖y − C Φ a‖₂².

The solution with the minimum norm is given by

(8) â = (C Φ)^† y,

with the superscript † denoting the Moore-Penrose pseudoinverse. In this situation, the high-dimensional state is then estimated as

(9) x̂ = Φ â = Φ (C Φ)^† y.

This approach is hereafter simply referred to as pod.
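The POD pipeline of Eqs. (5)-(9) can be summarized in a few lines of NumPy. The snippet below is a minimal sketch on synthetic rank-k data; all sizes and the random point-wise sensor placement are illustrative, not taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: N snapshots of an m-dimensional field (columns of X),
# generated to have exact rank k so recovery can be checked.
m, N, k, p = 200, 50, 5, 10
modes_true = rng.standard_normal((m, k))
X = modes_true @ rng.standard_normal((k, N))

# Eq. (5): POD modes are the leading left singular vectors of the training set.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :k]                          # approximation basis, m x k

# Point-wise measurement operator C: p rows of the identity matrix.
Gamma = rng.choice(m, size=p, replace=False)
C = np.eye(m)[Gamma, :]

# A new field lying in the span of the modes, and its sensor measurements.
x = modes_true @ rng.standard_normal(k)
y = C @ x

# Eq. (8): minimum-norm least-squares coefficients via the pseudoinverse.
a_hat = np.linalg.pinv(C @ Phi) @ y
# Eq. (9): reconstructed high-dimensional state.
x_hat = Phi @ a_hat
```

Since the field lies exactly in the span of the modes and C Φ has full column rank here, the reconstruction is exact up to round-off; with real data and noisy sensors this is no longer the case.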
2.1.2. Improved POD-based method
The approach described above requires explicit knowledge of the observation operator C and is subject to ill-conditioning of the least-squares problem. These limitations often render this "vanilla-flavored" approach impractical, and they motivate an alternative formulation.

The idea is to learn the map between coefficients a and observations y without explicitly referring to C. It can be implicitly described by a, possibly nonlinear, operator G, typically determined by minimizing the Bayes risk, defined as the misfit in the ℓ₂ sense:

(10) G* = argmin_G ∫ ‖a − G(y)‖₂² dμ(a, y),

where μ is the joint probability measure of the coefficients and the observations.

We assume the training set is representative of the underlying system, in the sense that it contains independent samples drawn from the stationary distribution of the physical system at hand. The Bayes risk is then approximated by an empirical estimate, and the operator G is determined as

(11) Ĝ = argmin_G (1/N) Σ_{i=1}^N ‖a_i − G(y_i)‖₂².
When the measurement operator is linear, Ĝ is then an empirical estimate of (C Φ)^†, the pseudoinverse of the contribution C Φ of the basis modes to the measurements. Compared to the closed-form solution in Eq. (8), this formulation brings flexibility in the properties of the map G. For instance, regularization by sparsity can be enforced in G, via ℓ₁ or ℓ₀ penalization. Expressing Eq. (11) in matrix form, for a linear map G, yields:

(12) Ĝ = argmin_G ‖A − G Y‖_F²,

where Y = [y_1, …, y_N] and A = [a_1, …, a_N] respectively gather the training-data measurements and coefficients. It immediately follows that

(13) Ĝ = A Y^†,

and the approximation obtained by pod plus is finally given by the solution to this least-squares problem,

(14) x̂ = Φ Ĝ y.
However, the measurements y are typically higher-dimensional than the coefficients a, and thus the problem of determining G is ill-posed. We then make use of the popular Tikhonov regularization, selecting the solution with the minimum ℓ₂ norm. This results in a ridge-regression problem formulated as:

(15) Ĝ_λ = argmin_G ‖A − G Y‖_F² + λ ‖G‖_F²,

with the penalization parameter λ typically estimated through k-fold cross-validation. As will be seen in the examples below, penalization of the magnitude of the coefficients can significantly improve the performance of the POD approach.
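A minimal sketch of the pod plus estimate of Eqs. (12)-(15): the map G is fit by ridge regression from training measurements Y to training coefficients A, without ever forming C explicitly at estimation time. The data sizes, the tiny penalization value, and the point-wise sampling used to generate Y are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, k, p = 200, 60, 5, 12

# Rank-k training snapshots X (m x N) and their POD coefficients A (k x N).
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, N))
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :k]
A = Phi.T @ X                           # training coefficients

# Training measurements Y: here point-wise samples of the snapshots. Once Y
# is recorded, the operator producing it never needs to be known explicitly.
Gamma = rng.choice(m, size=p, replace=False)
Y = X[Gamma, :]

# Eq. (15): closed-form ridge-regression estimate of the map G,
# G = A Y^T (Y Y^T + lam I)^{-1}, regularizing the rank-deficient Y Y^T.
lam = 1e-6
G = A @ Y.T @ np.linalg.inv(Y @ Y.T + lam * np.eye(p))

# Eq. (14): pod plus estimate of a field from its measurements alone.
x = X[:, 0]
x_hat = Phi @ (G @ x[Gamma])
```

Because the toy data are exactly rank k and the penalty is small, the training snapshot is recovered almost exactly; the penalty matters precisely when Y Y^T is ill-conditioned, as discussed above.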
2.2. Observability issue
The above techniques are standard in the scientific computing literature for flow reconstruction, but they bear a severe limitation. Indeed, since it is derived in an unsupervised fashion from the set of instances {x_i}, the approximation basis Φ is agnostic to the measurements y. In other words, the approximation basis is determined with no supervision by the measurements. To illustrate the impact of this situation, let â_LS = Φ^† x be the least-squares estimate of the approximation coefficients for a given field x. The difference between the least-squares estimate of the coefficients and the coefficients obtained from the linear sensor measurements writes

(16) â_LS − â = (Φ^† − (C Φ)^† C) x,

and the error in the reconstructed field is obtained immediately:

(17) x − x̂ = (I − Φ (C Φ)^† C) x,

where I is the identity matrix of suitable dimension.

The error in the reconstructed field is seen to depend on both the approximation basis Φ and the measurement operator C. The measurement operator is entirely defined by the sensor locations, and it does not depend on the basis considered to approximate the field. It is thus clear that, to reduce (the expectation of) the reconstruction error, the approximation basis must be informed both by the dataset and by the available sensors, through C. For example, poorly located sensors will cause a large set of fields x to lie in the null space of C Φ, preventing their estimation, while the coefficients of certain approximation modes may be severely amplified by (C Φ)^† if the approximation basis is not carefully chosen.
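The dependence of the error operator in Eq. (17) on the interplay between Φ and C can be illustrated numerically. The toy basis below is a deliberate assumption: one mode is supported only on the first half of the domain, so sensors confined to the second half cannot observe its coefficient, while spread-out sensors can.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 100, 3

# Toy orthonormal basis: two modes supported on the second half of the
# domain, one mode supported only on the first half.
Phi = np.zeros((m, k))
Q2, _ = np.linalg.qr(rng.standard_normal((m // 2, 2)))
Phi[m // 2:, :2] = Q2
v = rng.standard_normal(m // 2)
Phi[:m // 2, 2] = v / np.linalg.norm(v)

def recon_error(Gamma, x):
    """Eq. (17): || (I - Phi (C Phi)^+ C) x || for point-wise sensors Gamma."""
    C = np.eye(m)[Gamma, :]
    x_hat = Phi @ (np.linalg.pinv(C @ Phi) @ (C @ x))
    return np.linalg.norm(x - x_hat)

a = np.array([1.0, -0.5, 2.0])
x = Phi @ a                              # a field in the span of the basis

# Sensors confined to the second half never see the third mode ...
err_bad = recon_error(np.arange(m // 2, m // 2 + 10), x)
# ... while sensors spread over the whole domain observe all three modes.
err_good = recon_error(np.arange(0, m, 10), x)
```

With the confined sensors, the entire contribution of the unobserved mode (here of magnitude 2) is lost, even though the field lies exactly in the span of the basis; the spread-out configuration recovers the field to round-off.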
This remark can be interpreted in terms of the control-theoretic concept of observability of the basis modes by the sensors. Most papers in the literature focus on deriving an approximation basis leading to a good representation [13, 71, 50], i.e., such that the training set is well approximated in the k-dimensional basis Φ. But how well the associated coefficients are informed by the measurements is usually overlooked when deriving the basis. In practice, the decoupling between learning an approximation basis and learning a map to the associated coefficients often leads to a performance bottleneck in the estimation procedure. Enforcing observability of the approximation basis by the sensors is key to good recovery performance and can dramatically improve upon unsupervised methods, as shown in [51].
3. Shallow learning for flow reconstruction
Shallow learning techniques are widely used for flow reconstruction. For instance, the modal-approximation-based approach for flow reconstruction, outlined in Section 2, can be considered to have two levels. The first level is concerned with computing an approximation basis, while the second level performs a linear weighted combination of the basis elements to estimate the high-dimensional flow field. Such shallow learning techniques are easy to train and tune. In addition, the levels are often physically meaningful, and they may provide interesting insights into the underlying mechanics of the system under consideration. However, the recent success of deep learning has put shallow learning somewhat out of focus [8, 44, 65]. Indeed, the expressive power of deep architectures has pushed forward many tasks in computer vision and language processing. This high expressive power is due to an architecture design with a large number of hidden layers. In computer vision in particular, the success is centered around convolution layers, which exhibit sparse connectivity of the neurons. These layers are augmented with nonlinear activation functions and additional pooling layers. Several theoretical results support some of the advantages of deeper architectures [20, 10, 54, 53]. Thus, the reader may wonder why we advocate shallow architectures for flow reconstruction. Among other things, for the applications in which we are interested, deep learning has the following downsides:
Computation: Deep architectures require tremendous amounts of computational power. In the era of high-end graphics processing units (GPUs), this is somewhat less of an issue, yet training can still be costly if the input data are high-dimensional.

Tuning: Regularization in its various forms can be used to ease the issue of overfitting by limiting the complexity of the network. However, in practice, this requires “fiddling” with a large number of knobs (i.e., hyperparameters) [7]. Generally, deeper architectures have been shown to be more sensitive to the choice of hyperparameters. This remains a challenge even in light of recent progress in understanding generalization and overfitting in deep networks [59, 5].

Data: The more critical issue is that deep architectures are data-hungry, i.e., a large number of training examples is required in order to learn a function which generalizes to new data points. This is because deep nets are typically over-parametrized and tend to interpolate the data too closely [7, 48, 16]. Hence, the more examples used for training, the better the generalization error [59].
Further, there is a critical difference between standard machine learning benchmark datasets and scientific datasets (such as those we consider here). The former datasets, including MNIST, CIFAR-10, and ImageNet, provide a large number of low-resolution training examples. In sharp contrast, scientific applications often generate a small amount of high-dimensional (high-volume) data, and labeled examples are in short supply. For instance, recordings of climate data date back only a limited number of years. Fortunately, scientific data often feature more structure, which can make the learning task easier. Hence, the hope is that shallow learning techniques perform better in the scientific data setting. Indeed, several results show that shallow learning is better suited to limited data than deeper networks [64, 21, 47, 39]. Interestingly, SNNs have also successfully been used for applications arising in the area of fluid mechanics, both before the recent hype over deep learning [30, 55] and also more recently [33, 6].

In the following, we show that the performance for flow reconstruction problems can be greatly improved by adding just one additional layer of complexity. This means that (instead of using a very shallow learning approach, as in traditional scientific methods) we explore architectures with one additional stage.
4. A shallow decoder for flow reconstruction
We can define a neural network (NN) with L layers as a nested set of functions

(18) F(y) = (f_L ∘ f_{L−1} ∘ ⋯ ∘ f_1)(y), with f_ℓ(z) = σ(W_ℓ z),

where σ denotes a coordinate-wise scalar (nonlinear) activation function and {W_ℓ}_{ℓ=1}^L denotes a set of weight matrices with matching dimensions. NN-based learning provides a flexible framework for estimating the relationship between quantities from a collection of samples. Here, we consider SNNs, which are networks with very few, often only one, or even no, hidden layers, i.e., L is very small.

In the following, an estimate of a vector x is denoted as x̂. Relying on a training set with examples x_i and corresponding sensor measurements y_i, we aim to learn a function F̂ belonging to a class ℱ of neural networks which minimizes the misfit, in a Euclidean sense, over all sensor measurements:

(19) F̂ = argmin_{F ∈ ℱ} (1/N) Σ_{i=1}^N ‖x_i − F(y_i)‖₂².

We assume that only a small number of training examples is available. Further, no prior information is assumed to be available, and the estimation method is purely data-driven. Importantly, we assume no knowledge about the underlying measurement operator H which is used to collect the sensor measurements. Further, unlike most classical methods for flow reconstruction, such as those discussed in Sec. 2, this NN-based learning methodology allows the joint learning of both the modes and the coefficients.
4.1. Architecture
We now discuss some general principles guiding the design of a good network architecture for flow reconstruction. These considerations lead to the following nested nonlinear function:

(20) x̂ = F(y) = W_3 σ(W_2 σ(W_1 y + b_1) + b_2).

The architecture design is guided by the paradigm of simplicity. Indeed, the architecture should enable fast training, require little tuning, and offer an intuitive interpretation.
Recall that the interpretability of the flow field estimate is favored by representing it in a basis of moderate size, whose modes can be identified with spatial structures of the field. That is, the estimate can be represented as a linear combination of modes φ_j, weighted by coefficients a_j; see Eq. (4). These modes are a function of the inputs. This naturally leads to a network in which the output is given by a linear, fully connected last layer, whose inputs are interpreted as the coefficients. These coefficients are informed by the sensor measurements in a nonlinear way.

The nonlinear map can be described by a hidden layer whose outputs are hereafter termed measurement features, in analogy with kernel-based methods, where raw measurements are nonlinearly lifted as extended measurements to a higher-dimensional space. In this architecture, the measurement features essentially describe nonlinear combinations of the input measurements y. The nonlinear combinations are then mapped to the coefficients associated with the modes. While the size of the output layer is that of the discrete field (m), the size of the last hidden layer (k) is chosen by the user and defines the size of the dictionary of modes. This size can be estimated from the data by dimensionality-estimation techniques [36, 29]. Restricting the description of the training data to a low-dimensional space is of potential interest to practitioners, who may interpret the elements of the resulting basis in a physically meaningful way. The additional structure allows one to express the field of interest in terms of modes that practitioners may relate to physical phenomena such as traveling waves, instability patterns (e.g., Kelvin-Helmholtz), etc.
In contrast, the size of the first hidden layer is essentially driven by the size of the input layer (p) and the number of nonlinear combinations used to nonlinearly inform the coefficients. The general shape of the network thus bears flexibility in the hidden layers. A popular architecture for decoders consists of non-decreasing layer sizes, so as to continuously increase the size of the representation from the low-dimensional observations to the high-dimensional field. We model F as a shallow neural network with two hidden layers, followed by a linear output layer.
Two types of hidden layers, namely fully-connected (FC) and convolution layers, can be considered. The power of convolution layers is key to the success of recent deep learning architectures in computer vision. However, in our problem, we favor fully-connected layers. The reason is twofold: (i) our sensor measurements have no spatial ordering; and (ii) depending on the number of filters, convolution layers require a large number of examples for training, while we assume that only a small number of examples is available. Thus, the first and second hidden layers take the form

z_1 = σ(W_1 y + b_1) and z_2 = σ(W_2 z_1 + b_2),

where W_ℓ denotes a dense weight matrix and b_ℓ is a bias term. The function σ denotes an activation function used to introduce nonlinearity into the model, as discussed below. The final linear output layer simply takes the form

x̂ = W_3 z_2,

where we interpret the columns of the weight matrix W_3 as modes. In summary, the architecture of our shallow decoder can be outlined as

y → FC + ReLU → FC + ReLU → FC (linear) → x̂.
Depending on the dataset, we need to adjust the size of each layer. Here, we use narrow rather than wide layers. Prescribing the size of the output layer of the nonlinear map restricts the dimension of the space in which the estimate lies, and it effectively regularizes the problem, e.g., filtering out most of the noise, which does not live in a low-dimensional space. Beyond robustness with respect to noise, reducing the dimension brings several additional benefits, including faster learning and fewer suboptimal local minima.
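The shallow decoder described above can be sketched as a small PyTorch module. The hidden widths (35 and 40) and the output size (76,416) follow the cylinder configuration in Table 1; treat them as illustrative, since each dataset uses its own sizes, and batch normalization and dropout (Sec. 4.2) are omitted here for clarity.

```python
import torch
import torch.nn as nn

class ShallowDecoder(nn.Module):
    """Minimal sketch of the shallow decoder: two small fully-connected
    hidden layers with ReLU activations, followed by a wide linear output
    layer whose weight columns play the role of modes."""

    def __init__(self, n_sensors, n_field, h1=35, h2=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_sensors, h1), nn.ReLU(),   # measurement features
            nn.Linear(h1, h2), nn.ReLU(),          # coefficients a
            nn.Linear(h2, n_field),                # linear output: x_hat = W3 z2
        )

    def forward(self, y):
        return self.net(y)

decoder = ShallowDecoder(n_sensors=5, n_field=76416)
x_hat = decoder(torch.randn(8, 5))   # a batch of 8 measurement vectors
```

The modes can be inspected after training as `decoder.net[-1].weight`, whose columns correspond to the learned basis elements discussed above.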
The rectified linear unit (ReLU) activation function is among the most popular choices in computer vision applications, owing to its favorable properties [35]. The ReLU activation, illustrated in Figure 3(a), is defined as the positive part of a signal z:

(21) σ(z) = max(0, z).

The transformed input signal is also called the activation. While the ReLU activation function performs best on average in our experiments, there are other choices. For instance, we have considered the Swish [2] and soft-shrinkage activation functions, also illustrated in Figure 3. These two activation functions can be fine-tuned via an additional hyperparameter, and there are situations in which they may outperform ReLU. Interestingly, different activation functions considerably affect the modes (i.e., the columns of the weight matrix W_3), as shown in Figure 4.
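The three activations just discussed are available directly in PyTorch; the snippet below evaluates them on a few sample points. The Swish hyperparameter β is fixed to 1 here as an illustrative choice (it is the tunable parameter mentioned above), and λ = 0.5 for soft-shrinkage is likewise arbitrary.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

relu = F.relu(z)                      # Eq. (21): the positive part max(0, z)
swish = z * torch.sigmoid(1.0 * z)    # Swish with hyperparameter beta = 1
shrink = F.softshrink(z, lambd=0.5)   # zeroes |z| <= 0.5, shifts the rest by 0.5
```

Soft-shrinkage sets small inputs exactly to zero, which sparsifies the measurement features; Swish is smooth everywhere, unlike ReLU.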



4.2. Regularization
Overfitting is a common problem in machine learning and occurs if a function interpolates a limited set of data points too closely. In particular, this is a problem for deep neural networks, which often have more neurons (trainable parameters) than can be justified by the limited amount of available training examples. There is increasing interest in characterizing and understanding generalization and overfitting in NNs [59, 5]. Hence, additional constraints are required to learn a function which generalizes to observations that have not been used for training. Standard strategies to avoid overfitting include early-stopping rules and weight penalties (ℓ₂ regularization) to limit the complexity of the function (network). In addition to these two strategies, we also use batch normalization (BN) [40] and dropout layers (DL) [66] to improve the convergence and robustness of the shallow decoder; this yields the final architecture of the shallow decoder. Regularization, in its various forms, requires one to "fiddle" with a large number of knobs (i.e., hyperparameters). However, we have found that SNNs are less sensitive to the particular choice of parameters; hence, SNNs are easier to tune.
Batch normalization. BN is a technique to normalize the activations (zero mean and unit standard deviation). From a statistical perspective, BN eases the effect of internal covariate shift [40]. In other words, BN accounts for the change of distribution of the output signals (activations) across different mini-batches during training. Each BN layer has two parameters which are learned during the training stage. This simple, yet effective, preprocessing step allows one to use higher learning rates for training the network. In addition, it also reduces overfitting owing to its regularization effect.

Dropout layer. DL helps to improve the robustness of a NN. The idea is to switch off (drop) a small fraction of randomly chosen hidden units (neurons) during the training stage. This strategy can be seen as a form of regularization which also helps to reduce interdependent learning between the units of a fully connected layer. In our experiments the drop ratio is set to a small fixed value.
4.3. Optimization
Given a training set with targets x_i and corresponding sensor measurements y_i, we minimize the misfit between the reconstructed quantity F(y_i) and the observed quantity x_i, in terms of the squared ℓ₂ norm:

min_{W, b} (1/N) Σ_{i=1}^N ‖x_i − F(y_i)‖₂² + λ Σ_ℓ ‖W_ℓ‖_F².

The second term on the right-hand side introduces ℓ₂ regularization of the weight matrices, which is controlled via the parameter λ. It is well known that the ℓ₂ norm is sensitive to outliers, and the ℓ₁ norm can be used as a more robust loss function. Alternatively, a popular option is the Huber norm (smooth ℓ₁ loss), leading to the optimization problem

min_{W, b} (1/N) Σ_{i=1}^N ρ_δ(x_i − F(y_i)) + λ Σ_ℓ ‖W_ℓ‖_F²,

where

ρ_δ(r) = ½ ‖r‖₂² if ‖r‖₂ ≤ δ, and δ (‖r‖₂ − δ/2) otherwise.

The tuning parameter δ controls the threshold. The Huber loss grows at a linear rate for residuals outside the threshold δ, rather than quadratically. This can reduce the influence of large deviations when learning the decoder. Further, it has been reported that this loss function prevents exploding gradients in some cases [34]. Thus, the Huber loss may be an interesting alternative.
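The Huber penalty is easy to implement element-wise, and its effect on small versus large residuals can be checked directly. The sketch below uses δ = 1; the Adam/weight-decay/scheduler lines illustrate the training setup described in the next paragraph, with placeholder values rather than the paper's tuned hyperparameters.

```python
import torch

def huber(r, delta=1.0):
    """Huber (smooth-l1) penalty: quadratic for |r| <= delta and linear
    (slope delta) beyond, so large residuals are down-weighted vs. l2."""
    return torch.where(r.abs() <= delta,
                       0.5 * r ** 2,
                       delta * (r.abs() - 0.5 * delta))

residuals = torch.tensor([0.5, 2.0])
losses = huber(residuals)   # quadratic branch: 0.125; linear branch: 1.5

# Adam with weight decay (l2 regularization) and a step-wise learning-rate
# decay, mirroring the training scheme of Sec. 4.3 (values are placeholders).
model = torch.nn.Linear(5, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
```

Calling `scheduler.step()` once per epoch multiplies the learning rate by `gamma` every `step_size` epochs, implementing the dynamic scheme described below.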
We use the Adam optimization algorithm [43] to train the shallow decoder, with a prescribed learning rate and weight decay (also known as ℓ₂ regularization). The learning rate, also known as the step size, controls how much we adjust the weights in each epoch. The weight-decay parameter is important since it allows one to regularize the complexity of the network. In practice, we can improve the performance by changing the learning rate during training: we decay the learning rate by a constant factor after a fixed number of epochs. Indeed, the reconstruction performance in our experiments is considerably improved by this dynamic scheme, compared to a fixed parameter setting. In addition, we decrease the weight decay by a constant factor. Further, we use a relatively large batch size, since we have only a limited amount of data available for training. Overall, in our experiments, Adam shows better performance than stochastic gradient descent (SGD) with momentum [67] and averaged SGD [60]. The hyperparameters can be fine-tuned in practice, but our choice of parameters works reasonably well across several different examples. Note that we use the method described in [38] to initialize the weights. This initialization scheme is favorable, in particular because the output layer is high-dimensional.

5. Empirical evaluation
We evaluate our methods on three classes of data. First, we consider a periodic flow behind a circular cylinder, as a canonical example of fluid flow. Then, we consider the weekly mean sea surface temperature (SST), as a second and more challenging example. Finally, the third and most challenging example we consider is a forced isotropic turbulence flow.
As discussed in Section 1, the shallow decoder requires that the training data represent the system, in the sense that they should comprise samples drawn from the same statistical distribution as the testing data. Indeed, this limitation is standard to data-driven methods, both for flow reconstruction and more generally. Hence, we are mainly concerned with exploring reconstruction performance and generalizability for interpolation tasks rather than for extrapolation tasks. In our third example, however, we demonstrate the limitations of the shallow decoder, illustrating difficulties that arise when one tries to extrapolate, rather than interpolate, the flow field. Figure 5 illustrates the difference between the two types of tasks.
In the first two example classes of data, the sensor information is a subset of the high-dimensional flow field, i.e., each row of the measurement operator has exactly one nonzero entry, at the index of a sensor location. Letting Γ be the set of indices of the spatial locations of the sensors, the measurement operator is such that

(22) y = C x, with C = I_Γ,

that is, the observations are simply point-wise measurements of the field of interest. In the above equation, I_Γ is the restriction of the identity matrix I to its rows indexed by Γ. In this paper, no attempt is made to optimize the location of the sensors. In practical situations, they are often given or constrained by other considerations (wiring, intrusiveness, manufacturing, etc.). We simply use uniformly random locations in our examples. The third example class of data demonstrates the SD using sub-grid-scale measurements.
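For point-wise sensing, Eq. (22) reduces to plain array indexing: applying the restriction I_Γ is equivalent to selecting the field values at the sensor indices, so the dense operator never needs to be formed. The field size and sensor count below are arbitrary for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 1000, 8

x = rng.standard_normal(m)                    # discretized field of interest
Gamma = rng.choice(m, size=p, replace=False)  # uniformly random sensor locations

# Eq. (22): C = I_Gamma, the rows of the identity matrix indexed by Gamma.
C = np.eye(m)[Gamma, :]
y_matrix = C @ x     # measurement via the explicit operator
y_index = x[Gamma]   # the same measurement via indexing, without forming C
```

Both routes produce identical measurements; the indexing form is what one would use in practice for large fields.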
The quality of the reconstruction is quantified in terms of the normalized root-mean-square residual error

(23) NME = ‖x − x̂‖₂ / ‖x‖₂,

denoted in the following as "NME." However, this measure can be misleading if the empirical mean dominates. Hence, we also consider a more sensitive measure which quantifies the reconstruction accuracy of the deviations around the empirical mean. We define this measure as

(24) NFE = ‖x′ − x̂′‖₂ / ‖x′‖₂,

where x′ and x̂′ are the fluctuating parts of x and x̂ around the empirical mean. In our experiments, we average the errors over several runs with different sensor distributions.
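The two error measures of Eqs. (23)-(24) can be sketched as follows. For simplicity this demo subtracts the mean of a single snapshot, whereas in the experiments the empirical mean over the training data is used; the numbers are synthetic and chosen only to show why NFE is the more sensitive measure.

```python
import numpy as np

def nme(x, x_hat):
    """Eq. (23): normalized root-mean-square residual error."""
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

def nfe(x, x_hat):
    """Eq. (24): the same error on the fluctuations about the mean,
    which exposes failures hidden by a dominant mean component."""
    xf, xf_hat = x - x.mean(), x_hat - x_hat.mean()
    return np.linalg.norm(xf - xf_hat) / np.linalg.norm(xf)

x = np.array([10.0, 11.0, 9.0, 10.5])
x_hat = np.full_like(x, x.mean())   # "reconstruction" that predicts only the mean
```

Here the mean-only prediction earns a deceptively small NME (below 0.1, since the mean dominates), while NFE correctly reports a total failure of 1.0 on the fluctuations.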
5.1. Setup for our empirical evaluation
Here, we provide details about the concrete network architectures of the shallow decoder used for the different examples. The networks are implemented in Python using PyTorch; research code for the flow behind the cylinder is available via https://github.com/erichson/ShallowDecoder. Tables 1-3 show the details. For each example we use a similar architecture design. The difference is that we use a slightly wider design (more neurons per layer) for the SST dataset and the isotropic flow. That is because we use a larger number of sensors for these two problems, and thus we need to increase the capacity of the network. In each case, the learning rate and the weight decay are decayed at a fixed rate during training, and we stop training after a fixed number of epochs.

Layer  Weight size  Input Shape  Output Shape  Activation  Batch Norm.  Dropout 

FC  sensors 35  sensors  35  ReLU  True   
FC  35 40  35  40  ReLU  True   
FC  40 76,416  40  76,416  Linear     
Layer  Weight size  Input Shape  Output Shape  Activation  Batch Norm.  Dropout 

FC  sensors 350  sensors  350  ReLU  True  
FC  350 400  350  400  ReLU  True   
FC  400 44,219  400  44,219  Linear     
Layer  Weight size  Input Shape  Output Shape  Activation  Batch Norm.  Dropout 

FC  sensors 350  sensors  350  ReLU  True  
FC  350 400  350  400  ReLU  True   
FC  400 122,500  400  122,500  Linear     
5.2. Fluid flow behind cylinder
The first example we consider is the fluid flow behind a circular cylinder, a canonical example in fluid dynamics [57]. At the Reynolds number considered, based on the cylinder diameter, the flow is characterized by a periodically shedding wake structure and exhibits smooth, large-scale patterns. A direct numerical simulation of the two-dimensional Navier-Stokes equations is performed via the immersed boundary projection method [68, 19]. In particular, we use the fast multi-domain method [19], which simulates the flow on five nested grids of increasing size, covering several cylinder diameters on the finest domain. We collect snapshots sampled uniformly in time, covering several periods of vortex shedding. For the following experiment, we use cropped snapshots from the finest domain, omitting the spatial domain upstream of the cylinder. Further, we split the dataset into a training and a test set, so that the training set comprises the first snapshots, while the remaining snapshots are used for validation. Note that different splittings (interpolation and extrapolation) yield nearly the same results, since the flow is periodic.
5.2.1. Varying numbers of random structured pointwise sensor measurements
We investigate the performance of the shallow decoder using varying numbers of sensors. We consider a realistic setting in which the sensors can only be located on a solid surface; the retained configuration aims at reconstructing the entire vorticity field from information at the cylinder surface only. The results, averaged over different sensor distributions on the cylinder's downstream-facing surface, are summarized in Table 4. Further, to contextualize the precision of the algorithms, we also state the standard deviation in parentheses.
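For pointwise sensors, the measurement operator reduces to a selection matrix whose rows are discrete Dirac samples of the field. A toy sketch (the field values and sensor indices below are hypothetical):

```python
import numpy as np

def pointwise_measurement_operator(n_state, sensor_idx):
    """Selection matrix C with one-hot rows: y = C x picks out the field
    values at the sensor locations (discrete Dirac sampling)."""
    C = np.zeros((len(sensor_idx), n_state))
    C[np.arange(len(sensor_idx)), sensor_idx] = 1.0
    return C

x = np.arange(10.0)                              # toy flattened flow field
C = pointwise_measurement_operator(10, [2, 7])   # two sensors at indices 2, 7
y = C @ x
print(y)  # [2. 7.]
```

In practice one would sample indices restricted to the cylinder surface, which is what the averaging over sensor distributions above refers to.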
Table 4:
Method | Sensors | Training NME | Training NFE | Test NME | Test NFE
pod | 2 | 0.310 (0.01) | 0.449 (0.01) | 0.316 (0.01) | 0.452 (0.01)
pod plus | 2 | 0.309 (0.01) | 0.449 (0.01) | 0.315 (0.30) | 0.451 (0.01)
shallow decoder | 2 | 0.004 (0.00) | 0.006 (0.00) | 0.007 (0.00) | 0.011 (0.00)
pod | 5 | 0.465 (0.39) | 0.675 (0.57) | 0.488 (0.41) | 0.698 (0.59)
pod plus | 5 | 0.204 (0.04) | 0.297 (0.05) | 0.212 (0.04) | 0.303 (0.06)
shallow decoder | 5 | 0.003 (0.00) | 0.004 (0.00) | 0.006 (0.00) | 0.008 (0.00)
pod | 10 | 0.346 (1.54) | 0.502 (2.23) | 0.379 (1.70) | 0.542 (2.43)
pod plus | 10 | 0.041 (0.02) | 0.059 (0.02) | 0.040 (0.01) | 0.057 (0.02)
shallow decoder | 10 | 0.002 (0.00) | 0.003 (0.00) | 0.005 (0.00) | 0.007 (0.00)
pod | 15 | 0.441 (1.81) | 0.639 (2.63) | 0.574 (2.44) | 0.821 (3.49)
pod plus | 15 | 0.021 (0.01) | 0.031 (0.01) | 0.021 (0.01) | 0.029 (0.01)
shallow decoder | 15 | 0.002 (0.00) | 0.003 (0.00) | 0.005 (0.00) | 0.007 (0.00)
The shallow decoder shows excellent flow reconstruction performance compared to traditional methods. Indeed, the results show that very few sensors already suffice for an accurate approximation. Further, the shallow decoder is insensitive to the sensor location, i.e., the variability of its performance is low across different sensor distributions on the cylinder surface. In stark contrast, this simple setup poses a challenge for the pod method, which is highly sensitive to the sensor configuration. This is expected, since poorly located sensors make it likely that the vorticity field lies in the nullspace of the measurement operator, preventing its estimation, as discussed in Section 2. While regularization can improve the robustness slightly, the pod plus approach still requires at least about sensors to provide accurate estimates of the high-dimensional state space of the flow, whereas the shallow decoder exhibits good performance with as few as 5 sensors. Note that the traditional methods could benefit from optimal sensor placement [50]; however, this is beyond the scope of this paper.
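The POD baselines reconstruct the field by least squares in a truncated POD basis. The following sketch illustrates the idea; the ridge term stands in for the regularization of the pod plus variant (the paper's exact regularization is not restated here, so the form below is an assumption), and the basis, sensors, and field are toy placeholders.

```python
import numpy as np

def pod_reconstruct(Phi, C, y, reg=0.0):
    """Estimate x from sensors y via least squares in a POD basis Phi:
    minimize ||C Phi a - y||^2 + reg * ||a||^2, then x_hat = Phi a.
    reg = 0 is plain POD; reg > 0 is a ridge-regularized ("plus") variant."""
    A = C @ Phi
    a = np.linalg.solve(A.T @ A + reg * np.eye(Phi.shape[1]), A.T @ y)
    return Phi @ a

# Toy demo: 2 POD modes, field of dimension 6, 3 pointwise sensors.
rng = np.random.default_rng(0)
Phi, _ = np.linalg.qr(rng.normal(size=(6, 2)))   # orthonormal 2-mode basis
x = Phi @ np.array([1.0, -2.0])                  # field lies in the POD subspace
C = np.eye(6)[[0, 2, 5]]                         # sensors at indices 0, 2, 5
x_hat = pod_reconstruct(Phi, C, C @ x)
print(np.allclose(x_hat, x))  # True
```

When the sensors are poorly placed, C Phi becomes (nearly) rank deficient and the normal equations blow up, which is exactly the sensitivity discussed above.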
Figure 6 provides visual results for two specific sensor configurations using sensors. The second configuration is challenging for pod, which fails to provide an accurate reconstruction; pod plus provides a more accurate reconstruction of the flow field. The shallow decoder outperforms the traditional methods in both situations. Further insights can be gained by examining the singular value spectrum of the original and reconstructed data, arranged as a matrix collecting snapshots at different time instants; see Figure 7. The spectrum of the flow data reconstructed using the shallow decoder closely approximates the true spectrum, while the spectrum of the data reconstructed using pod provides a very poor approximation.
5.2.2. Nonlinear sensor measurements
So far, the sensor information consisted of pointwise measurements of the local flow field, i.e., each measurement samples the field at the location of the corresponding sensor, obtained by pairing the field with a Dirac distribution centered at that location. We now consider nonlinear measurements to demonstrate the flexibility of the shallow decoder. Here, we consider the simple setting of squared sensor measurements, i.e., pointwise samples of the Hadamard (elementwise) product of the field with itself. Table 5 provides a summary of the results, using sensors. The shallow decoder is agnostic to the functional form of the sensor measurements, and it achieves nearly the same performance as in the linear case above; the average reconstruction error for the test set increases only by about . The POD-based methods fail for this task since they are linear techniques.
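Generating the squared measurements from a flattened field is straightforward; training is unchanged, only the input side of the training pairs differs. A minimal sketch (field values and sensor indices hypothetical):

```python
import numpy as np

def squared_measurements(x, sensor_idx):
    """Nonlinear sensors: pointwise samples of the Hadamard (elementwise)
    square of the field, rather than of the field itself."""
    return (x * x)[sensor_idx]

x = np.array([3.0, -1.0, 2.0, 0.5])   # toy flattened field
y = squared_measurements(x, [0, 1])
print(y)  # [9. 1.]
```

Note that the sign of the field is lost at the sensor locations, so no linear map can invert these measurements, which is why the linear POD-based methods fail here.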
Table 5:
Method | Sensors | Training NME | Training NFE | Test NME | Test NFE
pod | 10 | – | – | – | –
pod plus | 10 | 0.781 (0.06) | 1.134 (0.09) | 0.609 (0.02) | 0.871 (0.03)
shallow decoder | 10 | 0.002 (0.00) | 0.003 (0.00) | 0.006 (0.00) | 0.009 (0.01)
5.2.3. Noisy sensor measurements
Figure 8: Visual results for the flow past the cylinder in the presence of white noise; the signal-to-noise ratio is . In (a), the target snapshot and the corresponding sensor configuration (using sensors) are shown. Neither pod nor pod plus is able to reconstruct the flow field, as shown in (b) and (c). The shallow decoder is able to reconstruct the coherent structure of the flow field, as shown in (d).

Table 6:
Method | SNR | Training NME | Training NFE | Test NME | Test NFE
pod | 10 | 9.171 (14.7) | 12.69 (20.4) | 8.746 (12.9) | 11.93 (17.6)
pod plus | 10 | 0.511 (0.03) | 0.742 (0.05) | 0.551 (0.04) | 0.679 (0.06)
shallow decoder | 10 | 0.138 (0.02) | 0.201 (0.02) | 0.278 (0.04) | 0.397 (0.05)
pod | 50 | 4.837 (3.08) | 6.946 (4.42) | 4.520 (2.75) | 6.390 (3.89)
pod plus | 50 | 0.370 (0.04) | 0.531 (0.05) | 0.364 (0.02) | 0.514 (0.02)
shallow decoder | 50 | 0.134 (0.02) | 0.198 (0.02) | 0.173 (0.02) | 0.247 (0.03)
To further investigate the robustness and flexibility of the shallow decoder, we consider flow reconstruction in the presence of additive white noise. While this is not of concern when dealing with flow simulations, it is a realistic setting for flows obtained in experimental studies. Table 6 lists the results for both a high and a low noise situation with linear measurements. The shallow decoder outperforms the classical techniques. In the high noise case, with a signal-to-noise ratio (SNR) of , the average relative reconstruction error for the test set is about for the shallow decoder. For an SNR of , the relative error is as low as . Note that we here use an additional dropout layer (placed after the first fully connected layer) to improve the robustness of the shallow decoder. In contrast, standard pod fails in both situations. Again, the pod plus method shows improved results over standard pod. However, the visual results in Figure 8 show that the reconstruction quality of the shallow decoder is favorable. The shallow decoder shows a clear advantage and a denoising effect; indeed, the reconstructed snapshots allow for a meaningful interpretation of the underlying structure. The shallow decoder can thus be seen as a valuable tool for the reconstruction of fluid flows in the presence of noise.
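Corrupting measurements with white noise at a prescribed SNR can be sketched as follows. Whether the SNR values in Table 6 are plain power ratios or decibels is an assumption here; the sketch uses a plain ratio of signal power to noise power.

```python
import numpy as np

def add_white_noise(x, snr, rng):
    """Add zero-mean Gaussian noise so that (signal power)/(noise power)
    equals snr, interpreted as a plain power ratio (an assumption)."""
    p_signal = np.mean(x ** 2)
    sigma = np.sqrt(p_signal / snr)
    return x + rng.normal(scale=sigma, size=x.shape)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 2 * np.pi, 1000))      # toy clean signal
x_noisy = add_white_noise(x, snr=10, rng=rng)
empirical_snr = np.mean(x ** 2) / np.mean((x_noisy - x) ** 2)
print(round(empirical_snr, 1))  # close to 10
```

During training one would corrupt only the sensor inputs, keeping the clean fields as targets, which is what gives the decoder its denoising behavior.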
5.2.4. Summary of empirical results for the flow behind cylinder
Figure 9 summarizes the performance of the shallow decoder for varying measurement configurations (number of sensors, linear or nonlinear measurements, noise). The advantage of the shallow decoder compared to the traditional POD-based techniques is pronounced. The performance of the traditional techniques is patchy, i.e., the reconstruction quality is highly sensitive to the sensor location. While regularization can mitigate a poor sensor placement design, a relatively larger number of sensors is required in order to achieve accurate reconstruction performance. More challenging situations, such as nonlinear measurements and sensor noise, pose a challenge for the traditional techniques, while the shallow decoder is able to reconstruct dominant flow features in such situations. The computational demands required to train the shallow decoder are minimal compared to training deep architectures, i.e., the training time remains below two minutes for this example using a modern GPU.
5.3. Sea surface temperature using random pointwise measurements
The second example we consider is the more challenging sea surface temperature (SST) dataset. Complex ocean dynamics lead to rich flow phenomena, featuring interesting seasonal fluctuations. While the mean SST field is characterized by a periodic structure, the flow is non-stationary. The dataset consists of the weekly sea surface temperatures for the last 26 years, publicly available from the National Oceanic & Atmospheric Administration (NOAA); the dataset can be obtained at http://www.esrl.noaa.gov/psd/. The data comprise snapshots in time with a spatial resolution of . For the following experiments, we only consider measurements, excluding those corresponding to the land masses. Further, we create a training set by selecting snapshots at random, while the remaining snapshots are used for validation.
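Restricting random sensor placement to ocean cells amounts to masking out land before sampling. A sketch, where the land mask is a hypothetical boolean input:

```python
import numpy as np

def pick_ocean_sensors(land_mask, n_sensors, rng):
    """Choose random pointwise sensor locations restricted to ocean cells.
    land_mask : boolean 2-D array, True over land masses (hypothetical input).
    Returns flat indices into the raveled field."""
    ocean_idx = np.flatnonzero(~land_mask.ravel())
    return rng.choice(ocean_idx, size=n_sensors, replace=False)

rng = np.random.default_rng(0)
mask = np.zeros((4, 6), dtype=bool)
mask[:, :2] = True                     # toy "land" on the left edge
sensors = pick_ocean_sensors(mask, 5, rng)
print(mask.ravel()[sensors].any())  # False: no sensor falls on land
```

The same flat indices can then be used both to extract training inputs and to define the measurement operator.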
We consider the performance of the shallow decoder using varying numbers of random sensors scattered across the spatial domain. The results are summarized in Table 7. We observe a large discrepancy between the NME and NFE errors. This is because the long-term annual mean field accounts for the majority of the spatial structure of the field; hence, the NME error is uninformative with respect to the performance of reconstruction methods. In terms of the NFE error, the POD-based reconstruction techniques fail to reconstruct the high-dimensional flow field from limited sensor measurements. In contrast, the shallow decoder demonstrates excellent reconstruction performance using both 32 and 64 measurements. Figure 10 shows visual results supporting these quantitative findings.
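The NME/NFE discrepancy can be made concrete with a small sketch. The paper's exact definitions are not restated in this section, so the convention below is an assumption consistent with the discussion above: NME normalizes the misfit by the full field, NFE by the fluctuating (mean-subtracted) field.

```python
import numpy as np

def nme(X, X_hat):
    """Relative error over the full field (all snapshots stacked)."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def nfe(X, X_hat):
    """Same misfit, normalized by the fluctuating field (temporal mean
    removed); stays informative when the mean dominates the structure."""
    fluct = X - X.mean(axis=0, keepdims=True)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(fluct)

rng = np.random.default_rng(1)
X = 10.0 + 0.1 * rng.normal(size=(20, 100))       # large mean, small fluctuations
X_bar = np.broadcast_to(X.mean(axis=0), X.shape)  # mean-only "reconstruction"
print(nme(X, X_bar), nfe(X, X_bar))  # roughly 0.01 and exactly 1.0
```

A method that only reproduces the climatological mean thus looks excellent under NME but scores 1.0 under NFE, which is the effect seen in Table 7.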
Table 7:
Method | Sensors | Training NME | Training NFE | Test NME | Test NFE
pod | 32 | 0.637 (0.59) | 5.915 (5.56) | 0.649 (0.62) | 6.04 (5.77)
pod plus | 32 | 0.293 (0.11) | 2.728 (1.05) | 0.299 (0.12) | 2.783 (1.14)
shallow decoder | 32 | 0.009 (0.00) | 0.088 (0.00) | 0.014 (0.00) | 0.128 (0.00)
pod | 64 | 0.986 (1.34) | 9.183 (12.5) | 1.007 (1.36) | 9.344 (12.7)
pod plus | 64 | 0.229 (0.07) | 2.132 (0.66) | 0.257 (0.87) | 2.389 (0.81)
shallow decoder | 64 | 0.009 (0.00) | 0.085 (0.00) | 0.012 (0.00) | 0.118 (0.00)
5.4. Turbulent flow using subgrid-scale measurements
The final example we consider is the velocity field of a turbulent isotropic flow.
If the sensor measurements are acquired on a coarse but regular grid, then the reconstruction task may be considered as a super-resolution problem [72, 31, 14]. There are a number of direct applications of super-resolution in fluid mechanics centered around subgrid-scale modeling. Because many fluid flows are inherently multiscale, it may be prohibitively expensive to collect data that capture all spatial scales, especially for iterative optimization and real-time control [12]. Inferring small-scale flow structures below the available spatial resolution is an important task in large eddy simulation (LES), climate modeling, and particle image velocimetry (PIV), to name a few applications. Deep learning has recently been employed for super-resolution in fluid mechanics applications with promising results [32].
Here, we consider data from a forced isotropic turbulent flow generated by a direct numerical simulation using grid points in a triply periodic domain. For the following experiments, we use snapshots for training and snapshots for validation. The data span about one large-eddy turnover time. The full dataset is provided as part of the Johns Hopkins Turbulence Database [45, 58].
Table 8:
Method | Grids | Training NME | Training NFE | Test NME | Test NFE
shallow decoder | 36 | 0.029 (0.00) | 0.041 (0.00) | 0.071 (0.00) | 0.101 (0.01)
shallow decoder | 64 | 0.027 (0.00) | 0.039 (0.00) | 0.067 (0.00) | 0.096 (0.00)
shallow decoder | 121 | 0.026 (0.00) | 0.038 (0.00) | 0.066 (0.00) | 0.093 (0.00)
5.4.1. Interpolation
Unlike the examples considered in the previous sections, the isotropic turbulent flow is non-periodic in time and highly non-stationary. Thus, this dataset poses a challenging task, even for interpolation. Figure 11 shows a visual example for this problem. Note that in our setting the mean grid values are used as inputs, while in the classical super-resolution problem the low-resolution image, shown in Figure 11(b), is used as the input. The quality of the estimated high-dimensional flow field is excellent, despite the challenging problem. Table 8 quantifies the performance for varying numbers of subgrid-scale measurements.
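The mean-grid-value inputs mentioned above can be computed by block averaging the fine field onto a coarse grid; each coarse cell contributes one measurement. A sketch with a toy 4×4 field (block size hypothetical):

```python
import numpy as np

def subgrid_mean_measurements(x, block):
    """Sensor inputs for the super-resolution setting: the mean value of
    each coarse-grid cell of the fine field x (dimensions divisible by block)."""
    h, w = x.shape
    coarse = x.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return coarse.ravel()

x = np.arange(16.0).reshape(4, 4)     # toy fine-grid field
y = subgrid_mean_measurements(x, 2)   # 2x2 coarse grid -> 4 measurements
print(y)  # [ 2.5  4.5 10.5 12.5]
```

The decoder then learns to map these coarse-cell means back to the full fine-grid field, i.e., to infer the subgrid-scale content.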
5.4.2. Extrapolation



Next, we illustrate a limitation of the shallow decoder: it cannot be used for extrapolating highly non-stationary fluid flows. To illustrate this issue, Figure 12 shows three flow fields at different temporal locations. First, Figure 12(b) shows a test example which is close in time to the training set; in this case, the shallow decoder is able to extrapolate the flow field with high accuracy. The reconstruction quality drops for snapshots which are further away in time, as shown in Figure 12(d). Finally, Figure 12(f) shows that extrapolation fails if the test example is far away from the training set in time, i.e., the flow field is no longer drawn from the same statistical distribution as the training examples.
6. Discussion
The emergence of sensor networks for global monitoring (e.g., ocean and atmospheric monitoring) requires new mathematical techniques that are capable of maximally exploiting sensors for state estimation and forecasting. Emerging algorithms from the machine learning community can be integrated with many traditional scientific computing approaches to enhance sensor network capabilities. For many global monitoring applications, the placement of sensors can be prohibitively expensive, thus requiring mathematical techniques such as the one proposed here, which can exploit a hyper-reduction in the number of sensors while maintaining required performance characteristics.
This work demonstrates the enhanced robustness and accuracy of fluid flow field reconstruction by using a shallow-learning-based methodology. We have explored this approach on a range of example flow fields of increasing complexity. The mathematical formulation presented is significantly different from what is commonly used in flow reconstruction problems, e.g., gappy interpolation with dominant POD modes.
We proposed a shallow decoder with two hidden layers for the problem of flow reconstruction, in order to achieve an improved reconstruction performance. During our study, we also compared the shallow decoder to deep architectures, which yield no significant improvement (results are omitted). More concretely, we considered deep convolutional networks (DCNs) with three to five hidden deconvolutional layers, as well as residual network (ResNet) architectures. Their reconstruction performance was marginally better for the simple flow behind the cylinder; however, the shallow decoder shows favorable performance in all other situations. A further advantage we observed in our experiments is that shallow architectures are more robust to sensor noise. Moreover, the features extracted for reconstruction are highly interpretable, potentially allowing for enhanced scientific understanding of the measured system.
Table 9 shows a qualitative comparison of our initial experiments for flow reconstruction. Our results show that shallow architectures are more favorable for limited-sensor and limited-data settings. In conclusion, we advocate a regression towards shallower networks for flow reconstruction and, more generally, for scientific applications with limited data.
Future work aims to leverage the underlying laws of physics in flow problems to further improve efficiency. In the context of flow reconstruction or, more generally, observation of a high-dimensional physical system, insights from the physics at play can be exploited [62, 61]. In particular, the dynamics of many systems remain low-dimensional, and the trajectory of their state vector lies close to a manifold whose dimension is significantly lower than the ambient dimension. Moreover, the features extracted by the shallow decoder network can also be integrated in reduced-order models (ROMs) for forecasting [9]. In many high-dimensional systems where ROMs are used, the ability to generate low-fidelity models that can be rapidly simulated has revolutionized our ability to model such complex systems, especially in applications involving complex flow fields. The ability to rapidly generate low-rank feature spaces alternative to POD opens new possibilities for ROMs using limited sampling and limited data. This aspect of the shallow decoder will be explored further in future work.
Table 9:
 | very shallow | (our) shallow | deeper
Computational demands | low | medium | high
Time for hyperparameter tuning | low | medium | high
Complexity of architecture design | low | medium | high
Ability to learn with limited data | high | high | low
Inference time | low | low | high
Acknowledgments
LM gratefully acknowledges the support of the French Agence Nationale pour la Recherche (ANR) and Direction Générale de l’Armement (DGA) via the FlowCon project (ANR17ASTR0022). SLB acknowledges support from the Army Research Office (ARO W911NF1710422). JNK acknowledges support from the Air Force Office of Scientific Research (FA95501910011). LM and JNK also acknowledge support from the Air Force Office of Scientific Research (FA95501710329). MWM would like to acknowledge ARO, DARPA, NSF, and ONR for providing partial support for this work. We would also like to thank Kevin Carlberg for valuable discussions about flow reconstruction techniques.
References
 [1] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems, 33(12):124007, 2017.
 [2] Forest Agostinelli, Matthew Hoffman, Peter Sadowski, and Pierre Baldi. Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830, 2014.
 [3] Richard G Baraniuk. Compressive sensing. IEEE signal processing magazine, 24(4):118–121, 2007.

 [4] Maxime Barrault, Yvon Maday, Ngoc Cuong Nguyen, and Anthony T Patera. An “empirical interpolation” method: application to efficient reduced-basis discretization of partial differential equations. Comptes Rendus Mathematique, 339(9):667–672, 2004.
 [5] Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pages 6240–6249, 2017.
 [6] M Baymani, Sohrab Effati, Hamid Niazmand, and Asghar Kerayechian. Artificial neural network method for solving the Navier–Stokes equations. Neural Computing and Applications, 26(4):765–773, 2015.
 [7] Mikhail Belkin, Siyuan Ma, and Soumik Mandal. To understand deep learning we need to understand kernel learning. arXiv preprint arXiv:1802.01396, 2018.
 [8] Yoshua Bengio. Learning deep architectures for AI. Foundations and trends® in Machine Learning, 2(1):1–127, 2009.
 [9] Peter Benner, Serkan Gugercin, and Karen Willcox. A survey of projectionbased model reduction methods for parametric dynamical systems. SIAM review, 57(4):483–531, 2015.

 [10] Monica Bianchini and Franco Scarselli. On the complexity of shallow and deep neural network classifiers. In ESANN, 2014.
 [11] Thomas Bolton and Laure Zanna. Applications of deep learning to ocean data inference and subgrid parameterization. Journal of Advances in Modeling Earth Systems.
 [12] S. L. Brunton and B. R. Noack. Closed-loop turbulence control: Progress and challenges. Applied Mechanics Reviews, 67:050801–1–050801–48, 2015.
 [13] Tan BuiThanh, Murali Damodaran, and Karen E Willcox. Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition. AIAA journal, 42(8):1505–1516, 2004.
 [14] J. Callaham, K. Maeda, and S. L. Brunton. Robust reconstruction of flow fields from limited measurements. arXiv preprint arXiv:1810.06723, 2018.
 [15] Emmanuel J Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory, 52(2):489–509, 2006.
 [16] Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678, 2016.
 [17] Kevin T Carlberg, Antony Jameson, Mykel J Kochenderfer, Jeremy Morton, Liqian Peng, and Freddie D Witherden. Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning. arXiv preprint arXiv:1812.01177, 2018.
 [18] Saifon Chaturantabut and Danny C Sorensen. Nonlinear model reduction via discrete empirical interpolation. SIAM Journal on Scientific Computing, 32(5):2737–2764, 2010.
 [19] T. Colonius and K. Taira. A fast immersed boundary method using a nullspace approach and multidomain farfield boundary conditions. Computer Methods in Applied Mechanics and Engineering, 197:2131–2146, 2008.
 [20] Olivier Delalleau and Yoshua Bengio. Shallow vs. deep sum-product networks. In Advances in Neural Information Processing Systems, pages 666–674, 2011.
 [21] Sounak Dey, Anjan Dutta, Josep Lladós, Alicia Fornés, and Umapada Pal. Shallow neural network model for hand-drawn symbol recognition in multi-writer scenario. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pages 31–32. IEEE, 2017.
 [22] David L Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289–1306, 2006.
 [23] P. Drineas and M. W. Mahoney. RandNLA: Randomized numerical linear algebra. Communications of the ACM, 59:80–90, 2016.
 [24] Zlatko Drmac and Serkan Gugercin. A new selection operator for the discrete empirical interpolation method—improved a priori error bound and extensions. SIAM Journal on Scientific Computing, 38(2):A631–A648, 2016.
 [25] N Benjamin Erichson, Lionel Mathelin, Steven L Brunton, and J Nathan Kutz. Randomized dynamic mode decomposition. arXiv preprint arXiv:1702.02912, 2017.
 [26] N Benjamin Erichson, Sergey Voronin, Steven L Brunton, and J Nathan Kutz. Randomized matrix decompositions using R. arXiv preprint arXiv:1608.02148, 2016.
 [27] N Benjamin Erichson, Peng Zeng, Krithika Manohar, Steven L Brunton, J Nathan Kutz, and Aleksandr Y Aravkin. Sparse principal component analysis via variable projection. arXiv preprint arXiv:1804.00341, 2018.
 [28] Richard Everson and Lawrence Sirovich. Karhunen–Loeve procedure for gappy data. JOSA A, 12(8):1657–1664, 1995.
 [29] Elena Facco, Maria d’Errico, Alex Rodriguez, and Alessandro Laio. Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific Reports, 7, 2017.
 [30] William E Faller and Scott J Schreck. Unsteady fluid mechanics applications of neural networks. Journal of aircraft, 34(1):48–55, 1997.
 [31] William T Freeman, Thouis R Jones, and Egon C Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 22(2):56–65, 2002.
 [32] Kai Fukami, Koji Fukagata, and Kunihiko Taira. Super-resolution reconstruction of turbulent flows with machine learning. arXiv preprint arXiv:1811.11328, 2018.
 [33] Azadeh Gholami, Hossein Bonakdari, Amir Hossein Zaji, and Ali Akbar Akhtari. Simulation of open channel bend characteristics using computational fluid dynamics and artificial neural networks. Engineering Applications of Computational Fluid Mechanics, 9(1):355–369, 2015.
 [34] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.

 [35] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 315–323, 2011.
 [36] Daniele Granata and Vincenzo Carnevale. Accurate estimation of the intrinsic dimension using graph distances: unraveling the geometric complexity of datasets. Scientific Reports, 6, 2016.
 [37] Nathan Halko, PerGunnar Martinsson, and Joel A Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217–288, 2011.
 [38] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing humanlevel performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
 [39] PoSen Huang, Haim Avron, Tara N Sainath, Vikas Sindhwani, and Bhuvana Ramabhadran. Kernel methods match deep neural networks on timit. In ICASSP, pages 205–209, 2014.
 [40] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

 [41] Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael Unser. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522, 2017.
 [42] Byungsoo Kim, Vinicius C Azevedo, Nils Thuerey, Theodore Kim, Markus Gross, and Barbara Solenthaler. Deep fluids: A generative network for parameterized fluid simulations. arXiv preprint arXiv:1806.02071, 2018.
 [43] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [44] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
 [45] Yi Li, Eric Perlman, Minping Wan, Yunke Yang, Charles Meneveau, Randal Burns, Shiyi Chen, Alexander Szalay, and Gregory Eyink. A public turbulence database cluster and applications to study lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, (9):N31, 2008.
 [46] Julia Ling, Andrew Kurzawski, and Jeremy Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155–166, 2016.
 [47] Zhiyun Lu, Avner May, Kuan Liu, Alireza Bagheri Garakani, Dong Guo, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, et al. How to scale up kernel methods to be as good as deep neural nets. arXiv preprint arXiv:1411.4000, 2014.
 [48] Siyuan Ma, Raef Bassily, and Mikhail Belkin. The power of interpolation: Understanding the effectiveness of sgd in modern overparametrized learning. arXiv preprint arXiv:1712.06559, 2017.
 [49] Michael W Mahoney. Randomized algorithms for matrices and data. Foundations and Trends® in Machine Learning, 3(2):123–224, 2011.
 [50] Krithika Manohar, Bingni W Brunton, J Nathan Kutz, and Steven L Brunton. Datadriven sparse sensor placement for reconstruction: Demonstrating the benefits of exploiting known patterns. IEEE Control Systems, 38(3):63–86, 2018.
 [51] L. Mathelin, K. Kasper, and H. Abou-Kandil. Observable dictionary learning for high-dimensional statistical inference. Archives Comput. Meth. Eng., 25(1):103–120, 2017. ArXiv 1702.05289.
 [52] Michael T McCann, Kyong Hwan Jin, and Michael Unser. A review of convolutional neural networks for inverse problems in imaging. arXiv preprint arXiv:1710.04011, 2017.
 [53] Hrushikesh Mhaskar, Qianli Liao, and Tomaso A Poggio. When and why are deep networks better than shallow ones? In AAAI, pages 2343–2349, 2017.
 [54] Hrushikesh N Mhaskar and Tomaso Poggio. Deep vs. shallow networks: An approximation theory perspective. Analysis and Applications, 14(06):829–848, 2016.
 [55] Michele Milano and Petros Koumoutsakos. Neural network modeling for near wall turbulent flow. Journal of Computational Physics, 182(1):1–26, 2002.
 [56] A. Mousavi and R. G. Baraniuk. Learning to invert: Signal recovery via deep convolutional networks. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pages 2272–2276. IEEE, 2017.
 [57] B. R. Noack, K. Afanasiev, M. Morzynski, G. Tadmor, and F. Thiele. A hierarchy of low-dimensional models for the transient and post-transient cylinder wake. Journal of Fluid Mechanics, 497:335–363, 2003.
 [58] Eric Perlman, Randal Burns, Yi Li, and Charles Meneveau. Data exploration of turbulence simulations using a database cluster. In Proceedings of the 2007 ACM/IEEE conference on Supercomputing, page 23. ACM, 2007.
 [59] Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, and Hrushikesh Mhaskar. Theory of deep learning iii: explaining the nonoverfitting puzzle. arXiv preprint arXiv:1801.00173, 2017.
 [60] B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30(4):838–855, July 1992.
 [61] Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.
 [62] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (Part I): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.
 [63] Clarence W Rowley and Scott TM Dawson. Model reduction for flow analysis and control. Annual Review of Fluid Mechanics, 49:387–417, 2017.
 [64] Alexander Schindler, Thomas Lidy, and Andreas Rauber. Comparing shallow versus deep neural network architectures for automatic music genre classification. In 9th Forum Media Technology (FMT2016), volume 1734, pages 17–21, 2016.
 [65] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117, 2015.
 [66] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
 [67] Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In ICML, pages 1139–1147, 2013.
 [68] K. Taira and T. Colonius. The immersed boundary method: a projection approach. Journal of Computational Physics, 225(2):2118–2137, 2007.
 [69] Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, and Ken Perlin. Accelerating eulerian fluid simulation with convolutional networks. arXiv preprint arXiv:1607.03597, 2016.

 [70] Pantelis R Vlachas, Wonmin Byeon, Zhong Y Wan, Themistoklis P Sapsis, and Petros Koumoutsakos. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A, 474(2213):20170844, 2018.
 [71] Karen Willcox. Unsteady flow sensing and estimation via the gappy proper orthogonal decomposition. Computers & Fluids, 35(2):208–226, 2006.
 [72] Jianchao Yang, John Wright, Thomas S Huang, and Yi Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
 [73] Jong Chul Ye, Yoseob Han, and Eunju Cha. Deep convolutional framelets: A general deep learning framework for inverse problems. SIAM J. Imag. Sci., 11(2):991–1048, 2018.
 [74] Jian Yu and Jan S Hesthaven. Flowfield reconstruction method using artificial neural network. AIAA Journal, pages 1–17, 2018.
 [75] YT Zhou, Rama Chellappa, Aseem Vaid, and B Keith Jenkins. Image restoration using a neural network. IEEE Trans. Acous., Speech, & Sig. Proc., 36(7):1141–1151, 1988.