## 1 Introduction

State estimation is the ability to recover flow based on a few measurements. It is an inverse problem and arises in many engineering applications such as remote sensing, medical imaging, ocean dynamics, reservoir modeling, and blood flow modeling. Uses of fluid estimation include flow control [Cordier2011, Semaan], cardiac blood flow modeling [pmid22539149, pmid18092738, KISSAS2], ship wake identification [GRAZIANO201672], climate prediction [kalnay_2002], optimizing machine design for low-drag vehicles, efficient turbo-machines, etc. Few challenges faced in the processes are limited sensors, sparse label data, moving sensors, ill-posed problems, noisy measurements, etc. This work focuses on learning from moving sparse label data and sensor measurements with a deep learning-based model using an implicit optimization layer for network training. Sparse fluid is encountered in various situations. One reason is that storing high-resolution data generated during direct numerical simulation is challenging due to limited storage space. It makes analysis, sharing, and visualization difficult. Another important reason is that real data is hard to measure on a full scale, like cardiovascular blood flow data obtained from flow magnetic resonance imaging (MRI) [Ong, pmid27885707, pmid30423501].

For high dimensional state estimation problems like fluids, popular approaches include library-based approaches observer dynamical system stochastic approaches. Library-based methods use offline data, and the library consists of generic modes such as Fourier, wavelet, discrete cosine transform basis, or data specific Proper orthogonal decomposition (POD) or Dynamic mode decomposition (DMD) modes, or training data. Library-based approaches using sparse representation assume state can be expressed as the combination of library elements. In an observer dynamical system, we assume the system’s dynamics to produce a full state and update it based on new measurements to reduce estimation error forming a closed feedback loop. The estimate is maintained by Kalman filtering

[bishop2006pattern, reif1999stochastic, wan2001unscented]. Jonathan applied dynamic mode decomposition [2008AP, rowley_mez] as a reduced-order model to Kalman smoother estimate to identify coherent structures. BUFFONI20082626 used a nonlinear observer-based on Galerkin projection of Navier-Stokes equation to estimate POD coefficients. Stochastic estimation was proposed by Adrian for a turbulence study where the conditional mean was approximated using a power series. Extension to this method [Guezennec, Ewing, Naguib] can be found in the literature. Bonnet1994 extended stochastic approach to estimate POD coefficients. A linear mapping between sensors and coefficients was assumed. These approaches allow more flexibility in sensor placements and have been applied for flow control over airfoil [Aero] and analyzing isotropic turbulence [tur, Tung].We consider a problem with sparse label data whose position may vary with time, for which POD can not be performed. Thus it restricts us from using traditional POD-based approaches. Other deep learning-based approaches like [kumar2021state, erichson2019shallow, nair_goza_2020] give accurate predictions with fewer sensors but require high-resolution labels for training. gaoHan

uses physics-based loss for super-resolution using spare data and thus assumes a fixed number of sensors and their positions. In this work, we propose an Energy network for state estimation with random sensors (ENSERS), a technique to learn a model from spare training labels capable of predicting full states given a varied number of sensors at random locations. We present two models trained using this technique. The first one produces discrete high-dimensional predictions in space. Second, produce continuous prediction utilizing the information of coordinates. We demonstrate the results corresponding to four high complexity problems: 2-dimensional (2D) coupled Burgers’ equation, transient flow, Allen–Cahn equation, and Convection-diffusion equation.

The remainder of the paper is organized as follows. In Section 2, details on the problem statement is provided. Details on the proposed approach are provided in Section 3.1. Section 3 and 4 give details on discrete and continuous formulation with two numerical examples in each to illustrate the performance of the proposed approach. Finally, Section 5 provides the concluding remarks.

## 2 Problem statement

Consider a dynamical system obtained by partial discretization of the -dimensional governing differential equations:

(1) |

The simulation time domain is discretized by steps and the space domain is discretized by segments resulting in

(2) |

where , is time step index, number of system’s state variables. e.g. represents x-velocity and represents y-velocity in 3.2 of 2D coupled Burgers’ equation. We consider sensor location and data location, represented by integers in discrete domain respectively as

(3) |

(4) |

where number of sensors, number of data nodes, , . The corresponding sensor values and data values are

(5) |

(6) |

(9) |

(10) |

Note that integer senor locations are selected randomly and are kept fixed during training. Similarly, a different set of sensor locations is selected for testing the network. In this work, we aim to use sensor data from a set of system states and produce high dimensional states. Thus we divided the sensor values and data into chunks, each with states.

(11) |

(12) |

where are sensor values for time steps, are data values for time steps, , is number of train samples, is time steps between first state

of each tensor

.## 3 Discrete space models

### 3.1 Proposed approach

In this section, we propose a novel deep learning-based framework for state estimation. Prediction of

high dimensional states is done via feed-forward neural network (FNN)

using optimized reduced state vector :(13) |

where is a third-order tensor of predicted states. The output vector of FNN has a dimension of which is reshaped into a tensor of shape . The reduced state is obtained by solving the following minimization problem using sensor data and predicted states by the neural network.

(14a) |

(14b) |

(14c) |

(14d) |

where are values of predicted states at sensor locations , is measurement matrix. Note that in practice is obtained by a few steps of gradient descent instead of global minimization. This inner optimization loop is implemented using the library ‘higher’ [grefen]

in PyTorch. Also, dimension

of the reduced state vector is quite small (e.g. 8 in first experiment 3.2) therefore the time required for gradient descent steps is negligible. The network is trained by minimizing data loss and physics-based loss across training samples .(15a) |

(15b) |

(15c) |

where are predicted states by network from Eq. (13), are values of predicted states at data locations , are predicted states composed of time steps. Fig. 1 shows the network architecture during training. Equations (13) and (14) together forms the implicit optimization layer shown in the Fig. 1. Training and testing procedure is shown in Algorithm 1 and 2 respectively. Note that for demonstration purpose, in all Algorithms a batch size of is considered but in practice batch size is selected based on problem as mentioned in each experiment section. Also we use Huber loss function during testing as it is more robust to noise than mean squared error (MSE) loss function.

Methods based on implicit optimization layers come under the category of Optimization-based Modeling architectures [amos2019] and are well-studied for generic classification, and structured prediction tasks [Multipred, stoyanov11a, brakel13a, Lecun]. The most common way of training such models is through Unrolled Differentiation [Utama, belanger2017endtoend, Metz, stoyanov11a]. It is done by introducing an optimization procedure such as gradient descent into the inference procedure. Other ways of training include Implicit argmin differentiation using implicit function theorem but need argmin operations to be convex. This method can be found in works of [Johnson, JordanSquire]. In this work, we use Unrolled Differentiation technique for training.

#### 3.1.1 Physics-based loss function

The physics-based loss function is used in the approach because of spare training labels. Training any network with just spare labels will produce garbage values on nodes without labels. Physics-based loss functions have been used to train neural networks for solving PDEs. A popular class of methods is PINNs [RAISSI_p]. The basic idea here is to place a neural network prior to the state variable and then estimate the neural network parameters by using a physics-informed loss function. Several improvements to the originally proposed PINN can also be found in the literature. For example, ZHU201956 developed convolutional PINN for time-independent systems. GENEVA used physics constrained auto-regressive model for surrogate modeling of dynamical systems. We use Runge-Kutta methods with stages for defining loss between state predictions. Let

(16a) |

(16b) |

(16c) |

where are network prediction. General form of Runge-Kutta methods with q stages applied to Eq. (1):

(17a) |

(17b) |

We use the Implicit Runge-Kutta methods with q stages and thus parameters are chosen accordingly. Now, shifting second term on right hand side (RHS) in Eq. (17) to left hand side (LHS) and replacing exact operator with numerical gradient based operator

(18a) |

(18b) |

(18c) |

where are different estimates of , is the physics based loss function. For calculating loss the spatial gradients are approximated using Sobel filter 2D convolutions [Sobel]. See A for additional details. Note that physics-based loss function is only used during training of network.

In next section, we present two examples to show model proposed is able to learn from sparse moving data labels. We illustrate the performance of the proposed approach with plots of prediction. We show that model is robust against noisy sensor measurements by showing error corresponding to various noise level. Error used as a quantitative metric in plots in defined as

(19) |

where represents the error, , are the true state and are the predicted state using the proposed approach. represents the L2 norm.

### 3.2 Experiment: 2D coupled Burgers’ equation

As the first example, we consider the the 2D coupled Burgers’ system. It has the same convective and diffusion form as the in-compressible Navier-Stokes equations. It is an important model for understanding of various physical flows and problems, such as hydrodynamic turbulence, shock wave theory, wave processes in thermo-elastic medium, vorticity transport, dispersion in porous medium. The governing equations for Burgers’ equation takes the following form:

(20) |

with periodic boundary condition

where is viscosity, and are the and components of velocity. We consider . The initial condition is defined using truncated Fourier series with random coefficients:

(23) |

where

(24) |

where , and .

#### 3.2.1 Data-set and Model Parameters

We use FeNICS [LoggMardalEtAl2012]

computing platform to solve the partial differential equations (

22) for generating data-set. We discretize the spatial domain with grid and use a time-step of . Parameters related to data-set considered are displayed in Table 1. We use noisy measurements to test the approach. Noise level is measured in signal to noise ratio (SNRdB) in decibels(dB) and is represented by

. Signal after adding noise is formulated as:(25a) | |||

(25b) |

where, is noise-free signal,

is random variable with standard normal distribution. Plots depicting different noise levels used for testing is shown in Fig.

2.domain | |||||

5 | 2 | 50 | 2 | 3969 |

Layer | Input | Output | Activation |

FC | 8 | 64 | Softplus |

FC | 64 | 64 | Softplus |

FC | 64 | linear | |

0.0002 | 0.1 | 0.006 | 11 | 2001 | 4 | 0.005 | 0.0001 | 32 | 800 | 22 | 8 |

5 | 100 | [4, 16] | 12 |

FNN used in the model is a shallow network with three fully connected (FC) layers. We use Softplus [glorot11a]activation function which has smooth first derivatives. Thus it helps avoid discontinuities because unrolling the inference procedure involves computing . Network architecture is displayed in Table 2. We use 800 data nodes for training from each system variable data which is of high-resolution data. Training hyper-parameters and training data are summarized in table 3. For testing, we use sensor locations different from the ones used during training.

#### 3.2.2 Results and Discussions

Fig. 3 shows the prediction of Ensers for x and y velocity components at various times in simulation with 16 sensors. We see that it produces results that accurately captures the current state of the system at most points in the domain. If a network is trained without a physics-based loss function then it will produce garbage values at points where data is absent. Fig 4 shows violin plot of error vector defined in Eq. (19) between target and prediction with various noise level in sensor measurement for test case. In Eq. (19), for x velocity and y velocity are and respectively and . The plot represents the distribution of error in the prediction vector. Greater spread corresponding to a point on the y-axis corresponds to more values present in the error vector around that value of the point on the y-axis. We see the model is robust to noise as the mean error in the plot are close for cases of no noise() and high noise().

### 3.3 Experiment: Flow Past Cylinder

As the second example, we consider the flow past a cylinder problem. It is a well known canonical problem and is characterized by periodic laminar flow vortex shedding. System is governed by incompressible, laminar, Newtonian fluid equations:

(26a) |

(26b) |

(26c) |

#### 3.3.1 Data-set and Model Parameters

A schematic representation of the computational domain is shown in Fig. 5(a). The circular cylinder is considered to have a diameter of unit. The center of the cylinder is located at a distance of units from the inlet. The outlet is located at a distance of units from the center of the cylinder. The sidewalls are at units distance from the center of the cylinder. At the inlet boundary, a uniform velocity of unit along the -direction is applied. Pressure boundary condition with is considered at the outlet. A no-slip boundary at the cylinder surface is considered. Coordinate of the snapshot cutout stretches from which is discretized into points in and directions (see Fig. 5(b)).

Re | domain | ||||||

5 | 3 | 25 | 1 | 4096 | 0.005 | 200 |

The data-set is generated by using Unsteady Reynolds-averaged Navier Stokes (URANS) simulation in OpenFoam [jasak2009openfoam]. The overall problem domain is discretized into 63420 elements with finer mesh near the cylinder. Time step units is considered. Code for OpenFOAM simulation can be found at [flowOpenFoam]. Parameters related to the data-set considered are displayed in Table 5.

We use a shallow network in the model with two fully connected(FC) layers. Network architecture is displayed in Table 6. We use 500 data nodes for training from each system variable data which is of high-resolution data. Training and testing hyper-parameters are shown in table 7 and 8 respectively. Sensor locations for testing are different from the ones used during training. For this problem physics-based loss function is extended to include the continuity equation:

(27) |

Layer | Input | Output | Activation |

FC | 8 | 64 | Softplus |

FC | 64 | linear | |

0.0003 | 0.1 | 0.002 | 9 | 3001 | 5 | 0.01 | 0.0006 | 16 | 500 | 18 | 8 |

5 | 100 | [4, 16] | 12 |

#### 3.3.2 Results and Discussions

Fig. 7 shows prediction of Ensers for pressure, x and y velocity components with sensors. We see that it produces results that accurately capture the current state of the system at most points in the domain. Fig. 8 shows violin plot of error defined in Eq. (19) between target and prediction with various noise levels in sensor measurement for the test case. In Eq. (19), for x velocity, y velocity and pressure are , and respectively and . The plot represents the distribution of error in prediction vectors. We see model is robust to noise as mean error in Fig 8 are close for cases of no noise() and high noise().

## 4 Continuous space models

### 4.1 Proposed approach

In this section, we propose a novel deep learning-based continuous framework for state estimation. Prediction of high dimensional states is done via multiple passes from a feed-forward neural network (FNN) for every collocation point with coordinate vector using optimized reduced state vectors :

(28) |

(29) |

where output vector of FNN has dimension , is predicted states value at collocation point, are coordinates of collocation point. Output of FNNs are combined to form third-order tensor . Reduced state is obtained by solving the following minimization problem using sensor data and predicted states by the neural network.

(30a) |

(30b) |

(30c) |

(30d) |

(30e) |

where are values of predicted states at sensor locations . Note that in practice is obtained by a few steps of gradient descent instead of global minimization. The network is trained by minimizing data loss and physics-based loss across training samples .

(31a) |

(31b) |

(31c) |

where are predicted states from Eq. (28), are values of predicted states at data locations , are predicted states composed of time steps, are network parameters. Fig. 1 shows the network architecture during training. Equations (28) and (30) together forms the implicit optimization layer shown in the Fig. 1 for the continuous model. Training and testing procedure is shown in Algorithm 3 and 4 respectively.

#### 4.1.1 Physics-based loss function

The physics-based loss function for continuous space models differs from discrete formulation due to how RHS of Eq. (1) i.e. is evaluated. In this case, we use automatic differentiation for calculating gradients w.r.t. coordinates used in . For example in first case of Allen–Cahn equation is evaluated as:

(32) |

where is coordinate vector concatenated with optimized reduced state vector at end of inner optimization loop, see line in algorithm (3). In second case of Convection-diffusion equation is evaluated as:

(33) |

where are x and y coordinate respectively, are defined in Eq. 36.

### 4.2 Experiment: Allen–Cahn equation

We consider the Allen–Cahn equation along with periodic boundary conditions. The Allen–Cahn equation is a well-known equation from the area of reaction-diffusion systems. It describes the process of phase separation in multicomponent alloy systems, including order-disorder transitions.

(34) | |||

#### 4.2.1 Data-set and Model Parameters

Data-set is generated by simulating the Allen–Cahn equation (34) using conventional spectral methods. Starting from an initial condition and assuming periodic boundary conditions and , we integrated Eq. (34) up to a final time using the Chebfun package [driscoll2014chebfun] with a spectral Fourier discretization with 512 modes and a fourth-order explicit Runge–Kutta temporal integrator with time-step . For more details on the data-set see [RAISSI_p]. Plots depicting different noise levels used for sensor measurement during testing are shown in Fig. 9.

domain | |||||

5 | 1 | 50 | 1 | 128 |

The network considered in the continuous formulation has significantly fewer weights compared to the discrete one. This is because the output of the network is predicted at a single point. Network architecture is shown in table 10. The input size of the network is equal to the sum of the size of the reduced state and the dimension of coordinates. For training we use 10 data nodes which is of total nodes . Similar to previous cases inner loop learning rate and physics loss penalty

are increased linearly with each epoch. Other training and testing hyperparameters are shown in table

11 and 12 respectively.Layer | Input | Output | Activation |

FC | 7 | 128 | Tanh |

FC | 128 | 128 | Tanh |

FC | 128 | 128 | Tanh |

FC | 128 | 128 | Tanh |

FC | 128 | linear | |

0.001 | 0.005 | 10 | 1201 | 8 | 0.01 | 10 | 10 | 40 | 6 |

0.1 | 50 | [6, 16] | 12 |

#### 4.2.2 Results and Discussions

Fig. 10 shows the prediction of Ensers with 16 sensors. A noticeable benefit of the continuous formulation is that prediction is smooth compared to discrete cases. Fig. 11 shows violin plot of L2 error defined in Eq. (19) between target and prediction with various noise levels in sensor measurement for the test case. In Eq. (19), and . Similar to discrete cases, the model is robust to noise in measurements. Fig. 11 shows a marginal increase in error with noise in the case of 16 sensors. Also, error bars with 6 sensors are similar to 16 sensors with low noise and increase slowly with noise.

### 4.3 Experiment: Convection-diffusion equations

Next we consider a 2-dimensional linear variable-coefficient convection-diffusion equation on ,

(35) |

(36) | |||

Convection-diffusion equations are classical PDEs that are used to describe physical phenomena where particles, energy, or other physical quantities are transferred inside a physical system due to two processes namely diffusion and convection. These equations are widely applied in many scientific areas and industrial fields, such as pollutants dispersion in rivers or atmosphere, solute transferring in a porous medium, and oil reservoir simulation. We consider variables convection and diffusion coefficients Eq. (36) in this experiment.

#### 4.3.1 Data-set and Model Parameters

Data is generated by solving the problem (35) using a high precision numerical scheme with a pseudo-spectral method for spatial discretization and 4th order Runge-Kutta for temporal discretization (with time step size ). We assume periodic boundary conditions and the initial value

(37) |

where , and k and l are chosen randomly. For mor