1 Introduction
The integration of deep learning has benefited modern algorithms for modeling, data processing, prediction, and control of various engineering systems. In fluid mechanics, work on machine learning implementation started last decade and has grown since then. Milano & Koumoutsakos [1] used a neural network to reconstruct turbulent flow fields and the flow in the near-wall region of a channel flow using wall information. Ling et al. [2] and Geneva & Zabaras [3], in two separate works, used deep learning algorithms to improve a Reynolds-averaged Navier–Stokes turbulence model. Recently, Geneva & Zabaras [4] proposed a multi-fidelity deep learning framework for turbulent flow. Erichson et al. [5] used a shallow network for estimating the 2D state from measurements over a cylinder surface. In this paper, we are particularly interested in applying deep learning for state estimation in fluid mechanics, and hence, the discussion hereafter is focused on the same.
State estimation is the ability to recover the flow from a few noisy measurements. It is an inverse problem and arises in many engineering applications such as remote sensing, medical imaging, ocean dynamics, reservoir modeling, and blood flow modeling. Controlling the flow and optimizing machine design in these applications depend upon the ability to predict the state with the given sensors. The challenge associated with state estimation is twofold. Firstly, for almost all practical cases, state estimation is an ill-posed problem, and hence, a unique solution to the problem does not exist [6]. Secondly, for practical problems, the number of sensors available is often limited. As a consequence, one has to deal with a sparse data set [7].
Attempts at state estimation date back to 1960, when Kalman filter based approaches were used for state estimation [8]. This method uses the system’s dynamics to produce the full state and updates it based on new measurements to reduce the estimation error, forming a closed feedback loop. However, the classical Kalman filter based approaches are only applicable to linear dynamical systems [9]. Improvements to Kalman filter algorithms, such as the Extended Kalman filter [10] and Unscented Kalman filter [11] algorithms, can also be found in the literature. Nayek et al. [12] attempted to generalize Kalman filters by using the Gaussian process. Approaches based on an observer dynamical system use a reduced-order model to predict the future based on the past, while simultaneously correcting the prediction with incoming measurements. Tu et al. [13] applied dynamic mode decomposition as a reduced-order model to the Kalman smoother estimate to identify coherent structures. Buffoni et al. [14] used a nonlinear observer based on Galerkin projection of the Navier–Stokes equations to estimate POD coefficients. The use of Bayes filters, such as the Kalman and particle filters, in conjunction with POD based ROMs on various flow problems can also be found in the literature [15, 16, 17].
Another major category includes library-based approaches and stochastic approaches. Library-based approaches use offline data, and the library consists of generic modes such as Fourier, wavelet, or discrete cosine transform bases, data-specific POD or DMD modes, or training data. Library-based approaches using sparse representation assume that the state can be expressed as a combination of library elements. Sparse coefficients are obtained by solving a pursuit problem [18, 19]. Callaham et al. [20] used sparse representation and training data as the library with localized reconstruction for reconstructing complex fluid flows. Gappy POD [21] uses a library of POD modes and estimates the POD coefficients in a least-squares sense. However, it is prone to ill-conditioning, which is dealt with by using the best sensor placements [22] to improve the condition number [23].
The most explored approach for state estimation is perhaps the one based on stochastic estimation. The idea was first proposed by Adrian [24] for a turbulence study where the conditional mean was approximated using a power series. In a linear setting, coefficients are computed using a two-point, second-order correlation tensor. Other variants like quadratic stochastic estimation [25] and spectral linear stochastic estimation [26] can also be found in the literature. Guezennec [27] proposed to include time-delayed measurements to further improve accuracy. Bonnet et al. [28] extended the stochastic approach to estimate POD coefficients. A linear mapping between sensors and coefficients was assumed. Recently, Nair & Goza [29] used a neural network to learn a nonlinear mapping between sensor measurements and POD coefficients. These approaches allow more flexibility in sensor placements and have been applied for flow control over an airfoil [30] and for analyzing isotropic turbulence [31, 32] and turbulent boundary layers [27, 25].
One limitation associated with all the approaches mentioned above is that spatial information from a single sample is used to recover the full state, although the data is often sequential. Ignoring the sequence of the data during state estimation invariably results in information loss. To address this apparent shortcoming, we propose a deep learning based non-intrusive framework for state estimation that learns from sequential data. The proposed framework couples a recurrent neural network (RNN) with an autoencoder (AE). While the AE is used to learn the nonlinear manifold, the RNN is employed to take advantage of the time series data. We illustrate that by utilizing sequential data, the proposed framework is able to estimate the state more accurately. Perhaps more importantly, the number of sensors required is significantly smaller. To showcase the performance of the proposed framework, two benchmark problems involving periodic vortex shedding and transient flow past a cylinder are considered. The results obtained are compared with those obtained using other state-of-the-art techniques.
The remainder of the paper is organized as follows. In Section 2, details on the problem statement are provided. A brief review of RNN and AE is furnished in Section 3. Details on the proposed approach are provided in Section 4. Section 5 presents two numerical examples to illustrate the performance of the proposed approach. Finally, Section 6 provides the concluding remarks.
2 Problem statement
We consider a dynamical system obtained by partial discretization of the governing differential equations:
(1) 
where the high-dimensional state vector depends on the system parameters and on time, and the right-hand side of Eq. (1) is a nonlinear function that governs the dynamical evolution of the state vector. Note that, for brevity of representation, the arguments of the state and of the nonlinear function are not shown explicitly.
We note that the state vector is high-dimensional in nature, and it is extremely difficult to work with it directly. A commonly used strategy in this regard is to approximate the high-dimensional state vector on a low-dimensional manifold,
(2) 
where the reduced state belongs to the reduced space and the mapping in Eq. (2) defines the manifold. The dimension of the reduced space is much smaller than that of the full state. Substituting Eq. (2) into Eq. (1), we obtain
(3) 
In Eq. (3), the manifold mapping is assumed to be continuously differentiable, and the initial condition is prescribed. With this representation, the objective in state estimation reduces to estimating the reduced-order state variable. Generally, this is achieved by determining a mapping between the sensor measurements and the reduced state,
(4) 
where the sensor measurements at a given timestep form a vector whose length equals the number of sensors present. A schematic representation of the same is shown in Fig. 1.
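For concreteness, the relations referred to as Eqs. (1) to (4) can be sketched as follows. The notation is assumed here for illustration and need not match the authors' symbols: $\mathbf{x} \in \mathbb{R}^{N}$ is the full state, $\boldsymbol{\mu}$ the parameters, $\mathbf{f}$ the nonlinear right-hand side, $\boldsymbol{\Phi}$ the manifold map, $\mathbf{x}_r \in \mathbb{R}^{N_r}$ the reduced state, $\mathbf{s}_t \in \mathbb{R}^{p}$ the vector of $p$ sensor readings at timestep $t$, and $\mathcal{G}$ the sensor-to-reduced-state map.

```latex
\begin{align}
\dot{\mathbf{x}} &= \mathbf{f}(\mathbf{x}, t; \boldsymbol{\mu}), \\
\mathbf{x} &\approx \boldsymbol{\Phi}(\mathbf{x}_r), \qquad \mathbf{x}_r \in \mathbb{R}^{N_r},\ N_r \ll N, \\
\mathbf{J}(\mathbf{x}_r)\,\dot{\mathbf{x}}_r &= \mathbf{f}\big(\boldsymbol{\Phi}(\mathbf{x}_r), t; \boldsymbol{\mu}\big), \qquad
\mathbf{J} = \frac{\partial \boldsymbol{\Phi}}{\partial \mathbf{x}_r}, \qquad \mathbf{x}_r(0) = \mathbf{x}_r^{0}, \\
\mathbf{x}_{r,t} &\approx \mathcal{G}(\mathbf{s}_t), \qquad \mathbf{s}_t \in \mathbb{R}^{p}.
\end{align}
```

The third line follows from the chain rule applied to $\boldsymbol{\Phi}(\mathbf{x}_r(t))$, which is where the differentiability assumption on the manifold map enters.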
The state estimation framework discussed above has two major limitations.

First, the state estimation framework relies only on sensor responses at the current step for predicting the state variables. In other words, the sequential nature of the sensor measurements is ignored. This invariably results in loss of information, and hence, the accuracy of the state estimation is compromised.

Secondly, as shown in Eq. (2), the use of a reduced-order model results in information loss. While completely avoiding this information loss is not possible, it is necessary to ensure that it is minimized.
This paper aims to develop a deep learning based framework for state estimation that addresses the two limitations discussed above.
3 A brief review of AE and RNN
This section briefly reviews two popular deep learning approaches, namely autoencoders (AE) and recurrent neural networks (RNN). It is to be noted that AE and RNN form the backbone of the proposed approach.
3.1 Autoencoders
AE is a class of unsupervised deep learning techniques trained to copy the inputs to the output. It consists of a latent space/hidden layer that represents a compressed representation of the input; this is often referred to as the bottleneck layer. The network architecture for an AE can be viewed as having two parts, an encoder that maps the input to the latent space and a decoder that reconstructs the inputs from the latent space. Mathematically, this is represented as
(5) 
where Eq. (5) involves the encoder network, the decoder network, and the vector of input variables. A schematic representation of AE is shown in Fig. 2.
AE is integral to the neural network landscape and was initially developed for model reduction and feature extraction. As far as training an AE is concerned, an adaptive learning rate optimization algorithm such as ADAM is a popular choice. The learning in AE is generally expressed as
(6) 
where the loss function is minimized with respect to the parameters (weights and biases) of the neural network, which comprise the parameters of the encoder and the parameters of the decoder. Some important remarks on AE are furnished below.
Remark 1: A situation where the AE reproduces its input exactly everywhere needs to be avoided [33]. In other words, the training algorithm is designed in such a way as to restrict direct copying of the input.
Remark 2: In AE, the dimensionality of the latent space is generally much smaller than the dimensionality of the input variable. Therefore, the latent variable can be thought of as a reduced-order representation of the input.
Remark 3: When the decoder is linear and the loss function is the mean-squared error, an AE learns to span the same subspace as principal component analysis.
Remark 4: AE with a nonlinear encoder and a nonlinear decoder learns a nonlinear reduced-order manifold; however, an AE with too much expressive capacity learns to copy the inputs (Remark 1).
It is to be noted that researchers are still working on developing AE for various types of tasks. Some of the popular AE available in the literature include variational AE [34], sparse AE [35], stochastic AE [36], and capsule AE [37], among others. For further details on different types of AE, interested readers may refer to [33].
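In compact form, the encoder-decoder structure and its training objective can be sketched as below. The symbols ($\mathcal{E}$, $\mathcal{D}$, $\mathbf{z}$, $\boldsymbol{\theta}_E$, $\boldsymbol{\theta}_D$, $\mathcal{L}$) are assumed for illustration, and the mean-squared error is shown only as one possible choice of loss.

```latex
\begin{align}
\mathbf{z} &= \mathcal{E}(\mathbf{x}; \boldsymbol{\theta}_E), \qquad
\hat{\mathbf{x}} = \mathcal{D}(\mathbf{z}; \boldsymbol{\theta}_D), \\
\{\boldsymbol{\theta}_E^{*}, \boldsymbol{\theta}_D^{*}\} &=
\arg\min_{\boldsymbol{\theta}_E,\, \boldsymbol{\theta}_D}
\mathcal{L}\Big(\mathbf{x},\ \mathcal{D}\big(\mathcal{E}(\mathbf{x}; \boldsymbol{\theta}_E); \boldsymbol{\theta}_D\big)\Big), \\
\mathcal{L}_{\mathrm{MSE}}(\mathbf{x}, \hat{\mathbf{x}}) &= \frac{1}{N}\,\lVert \mathbf{x} - \hat{\mathbf{x}} \rVert_2^{2}.
\end{align}
```

With a linear decoder and the loss in the last line, the learned latent space spans the principal subspace, which is the situation described in Remark 3.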
3.2 Recurrent neural networks
Many of the learning tasks involved in artificial intelligence necessitate handling data that are sequential in nature. Examples of tasks involving sequential data include image captioning, time-series forecasting, and speech synthesis, among others. A recurrent neural network (RNN) is a type of neural network that is particularly suitable for sequential data. RNN captures the time dynamics by using cycles in the graph. Consider the input and the output at a given time. The output in RNN is expressed as a function of the current input and the hidden state; however, owing to the cyclic graph in RNN, the hidden state is continuously updated as the sequence is processed. A schematic representation of a simple RNN is shown in Fig. 3. Note that different variants of the classical RNN can be found in the literature. In this work, we have used a Long Short-Term Memory (LSTM) network [38], and hence, the discussion hereafter is focused on the same. Readers interested in other types of RNN may refer to [39].
LSTM, first proposed by [38], is a type of gated RNN cell that overcomes the well-known issue of vanishing gradients with the help of gates that control the flow of information, i.e., that differentiate between the information to be updated and that to be deleted [40, 41, 42]. An LSTM cell comprises a forget gate, an input gate, an output gate, and a cell state. Each of these has its significance. The cell state refers to the information that has to be transferred along the sequence, and the respective gates determine the information that has to be updated or deleted from the cell state [43]. The output of the current cell, also referred to as the hidden state, helps retain the short-term memory, and the cell state, on the other hand, is used to retain the long-term memory. The cell state in LSTM is multiplied by the forget gate in each cell, along with the addition from the input gate; this gives the forget gate the opportunity to discard unimportant information and the input gate the opportunity to enrich the state with useful information [43]. A schematic representation of an LSTM cell is shown in Fig. 4.
Mathematically, the operations carried out inside an LSTM cell are represented using the following equations.
(7a)  
(7b)  
(7c)  
(7d) 
where the three gating quantities are, respectively, the input gate, the forget gate, and a candidate cell state. Note that the cell state update is additive in nature; this allows long-term information to pass through and prevents the gradient from vanishing. The short-term state in LSTM is calculated as
(8a)  
(8b) 
where the gate appearing in Eq. (8) is the output gate. It is to be noted that the short-term state is used as the output of the cell as well as the hidden state for the next timestep; this is responsible for the short-term memory of LSTM. The cell state, on the other hand, is responsible for long-term memory.
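The gate operations described above can be sketched in a few lines of plain Python. This follows the standard LSTM formulation, which may differ in notation from Eqs. (7) and (8); the parameter layout (`params["i"]`, `params["f"]`, `params["g"]`, `params["o"]`, each a tuple of input weights, recurrent weights, and bias) and the function names are assumptions made for illustration. In practice one would rely on a library implementation such as `torch.nn.LSTM`.

```python
import math


def sigmoid(v):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, U, b, x, h):
    """Return W x + U h + b for list-of-lists matrices W, U and list b."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h and cell state c."""
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]    # forget gate
    g = [math.tanh(v) for v in affine(*params["g"], x, h_prev)]  # candidate cell state
    # Additive cell-state update: keep part of the old state, add new content.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]    # output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]             # short-term state
    return h, c


# Toy check with one input, one hidden unit, and all-zero parameters:
# every gate equals 0.5 and the candidate state is 0.
zero = ([[0.0]], [[0.0]], [0.0])
params = {"i": zero, "f": zero, "g": zero, "o": zero}
h, c = lstm_step([1.0], [0.0], [2.0], params)
```

With zero parameters the forget gate halves the previous cell state, so the additive update gives `c = 0.5 * 2.0`, and the hidden state is `0.5 * tanh(c)`.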
The use of RNN for complex dynamical systems has attracted significant interest from the research community; this is primarily because of its capability in capturing temporal dependencies [40]. Multiple architectures have been proposed for using RNN to accomplish the task of future state prediction of a dynamical system. Recently, Geneva & Zabaras [44] and Eivazi et al. [45] used LSTM and transformers as time integrators to predict the evolution of the flow state. Hosseinyalamdary [46] used a simple RNN for IMU modeling in a deep Kalman filter. Otto & Rowley [47] used an autoencoder with linear recurrence to learn important dynamical features. In this work also, we use LSTM for extracting useful information from sequential sensor data. The next section provides more details on the same.
4 Proposed approach
In this section, we propose a novel deep learning based framework for state estimation. The proposed approach integrates the two deep learning approaches discussed in Section 3, namely AE and RNN. Within the proposed framework, AE learns the reduced nonlinear subspace. It helps in reducing the information loss due to the compressed representation. RNN, on the other hand, extends the capability of the proposed approach and allows it to reuse sensor data collected at previous timesteps. AE (by reducing the state variable) and RNN (by incorporating information from the past) also help address the ill-posedness associated with solving a state estimation problem.
Consider the measurement data obtained from the sensors over a number of timesteps, together with the corresponding state variables. We can express the state variable at a given time as
(9) 
where the first operator in Eq. (9) maps the sensor data to the reduced state variable and the second projects the reduced state variable back to the original space, each with its own set of parameters. Unlike existing methods, sensor data corresponding to the current and previous timesteps are used for predicting the state in Eq. (9). A schematic representation of the same is shown in Fig. 5.
We also note that the sensor data is sequential and hypothesize that modeling this sequential nature will improve the predictive capability of the sensor-to-reduced-state mapping. Therefore, we propose to model this mapping by using an RNN. As stated in Section 3.2, RNN is particularly suited for modeling such sequential data. Another aspect of Eq. (9) is associated with the projection operator. We reiterate that the accuracy and efficiency of Eq. (9) depend significantly on this operator. One popular choice among researchers is to use proper orthogonal decomposition for computing the projection operator. However, proper orthogonal decomposition, being a linear projection scheme, has limited expressive capability. In this work, we propose to use an AE as the projection operator. Owing to the fact that AE is a nonlinear reduced-order model, we expect the accuracy to improve. However, one must note that training an AE demands more computational effort as compared to proper orthogonal decomposition. Hereafter, we refer to the proposed framework as the Autoencoder and Recurrent neural network based state Estimation (ARE) framework. Next, details on the network architecture and training algorithm for ARE are furnished.
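Using sensor data from the current and previous timesteps amounts to feeding the RNN a sliding window of measurements. A minimal sketch of how such windows might be assembled is given below; the function name `make_windows` and the list-of-lists data layout are assumptions for illustration, not the authors' implementation.

```python
def make_windows(sensor_series, seq_len):
    """Pair each timestep t >= seq_len - 1 with the sensor readings from
    timesteps t - seq_len + 1, ..., t (oldest first).

    sensor_series: list of per-timestep sensor vectors.
    Returns (windows, targets), where targets[k] is the timestep whose
    state is to be estimated from windows[k].
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    windows, targets = [], []
    for t in range(seq_len - 1, len(sensor_series)):
        windows.append(sensor_series[t - seq_len + 1 : t + 1])
        targets.append(t)
    return windows, targets


# Toy usage: one sensor, six timesteps, sequence length four.
series = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
windows, targets = make_windows(series, seq_len=4)
```

Setting `seq_len=1` recovers the conventional setting in which only the current measurement is used.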
4.1 Network architecture and training
The ARE architecture proposed in this paper involves an AE and an RNN. For state estimation, the trained AE is split into the encoder and the decoder parts. The encoder part is used while training the RNN within the ARE framework. The decoder part is used while estimating the state variable. The AE and RNN networks are trained separately, with the latter following the former (see Fig. 6). The AE architecture considered in this paper consists of 5 hidden layers, with the 3rd layer being the bottleneck layer. The flow field is vectorized before being provided as an input to the AE. The rectified linear unit (ReLU) activation function is used for all but the last layer. For the last layer, a linear activation function is used. For training the network, the ADAM [48] optimizer with a prescribed learning rate is used. The trained parameters of the AE comprise the network parameters of the encoder part and those of the decoder part. To make the model robust to noisy data, two batchnorm layers and one dropout layer are added to the AE. Batch normalization [49] is used to normalize the activation distribution. It reduces the model’s sensitivity to the learning rate [50], reduces training time, and increases stability. Additionally, it also acts as a regularizer. Dropout [51] is also an effective regularization technique that works by dropping connections between neurons during training with a specified probability. Using the dropout layer just before the second hidden layer proved to be most effective in increasing the robustness of the network to noise, as it simulates the noise in the latent vector from the RNN network. A dropout probability of 0.35 is used in the network. With BN denoting batch normalization and DR denoting dropout, the architecture used in this paper is as follows:
Once the AE is trained, we proceed to train the RNN part. The objective here is to learn a mapping between the sequential sensor data and the reduced state variable obtained by using the encoder part of the trained AE. First, the sensor data passes through the RNN, which helps in capturing information from the sequential data. After that, the RNN outputs are mapped to the reduced states by using a feedforward neural network. Reduced states of the training outputs are obtained by using the trained AE (encoder part). The parameters of the sensor-to-reduced-state mapping (see Eq. (9)) correspond to the RNN and the feedforward neural network and are obtained by solving the following optimization problem
(10)
where the mapping in Eq. (10) represents the combined RNN and feedforward neural network. The second term in Eq. (10) represents regularization and is adopted to avoid overfitting. The regularization weight is a tuning parameter and needs to be tuned manually. Similar to AE, the optimization problem is solved by using the ADAM optimizer [48], with weight decay and a prescribed learning rate. Early stopping is used, which also acts as a regularizer [52]. RNN training is schematically shown in Fig. 6(b). The steps involved in training the proposed ARE are shown in Algorithm 1.
Algorithm 1 (training of ARE) requires, among other inputs, the settings for the RNN, learning rate parameters, network architectures, and number of training epochs.
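Two ingredients of the RNN training stage, the regularized objective and early stopping, can be sketched as follows. This is a minimal illustration under stated assumptions: the mismatch is taken as a mean-squared error, the regularization as an L2 penalty on the weights (consistent with the use of weight decay), and the names `regularized_loss`, `EarlyStopping`, and `patience` are hypothetical.

```python
def regularized_loss(pred, target, weights, lam):
    """Mean-squared mismatch between predicted and encoded reduced states,
    plus an L2 penalty on the trainable weights scaled by lam."""
    mse = sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


class EarlyStopping:
    """Signal a stop once the validation loss has failed to improve
    for `patience` consecutive epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy usage with a made-up validation history.
stopper = EarlyStopping(patience=2)
history = [1.0, 0.8, 0.9, 0.85, 0.7]
stopped_at = None
for epoch, val in enumerate(history):
    if stopper.step(val):
        stopped_at = epoch
        break
```

In the toy history the loss fails to improve at epochs 2 and 3, so training halts before the later improvement is seen; choosing the patience is therefore a trade-off between training cost and the risk of stopping too early.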
4.2 State estimation using the proposed approach
Once the proposed ARE is trained by following the procedure detailed in Algorithm 1, one can use it for estimating the state. A trained ARE performs state estimation in two simple steps. In the first step, the trained RNN is used for estimating the reduced state based on the sensor measurements. Once the reduced state has been estimated, the decoder part of the AE is used to project the reduced state onto the original state. For clarity, the steps for predicting the state using ARE are shown in Algorithm 2. A schematic representation of the same is also shown in Fig. 7.
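The two-step estimation can be sketched as a simple composition of two callables. The stand-in functions `toy_rnn_map` and `toy_decoder` below are hypothetical placeholders used only to make the example runnable; in the actual framework they would be the trained RNN (with its feedforward head) and the trained AE decoder.

```python
def estimate_state(sensor_window, rnn_map, decoder):
    """Step 1: map the sensor sequence to a reduced state.
    Step 2: decode the reduced state to the full state."""
    reduced_state = rnn_map(sensor_window)
    return decoder(reduced_state)


def toy_rnn_map(window):
    """Placeholder for the trained RNN: average each sensor channel over the window."""
    n = len(window)
    return [sum(step[k] for step in window) / n for k in range(len(window[0]))]


def toy_decoder(z):
    """Placeholder for the trained decoder: lift a 1-D reduced state to a 3-D state."""
    return [1.0 * z[0], 2.0 * z[0], -1.0 * z[0]]


# One sensor, two timesteps in the window.
state = estimate_state([[1.0], [3.0]], toy_rnn_map, toy_decoder)
```

Note that the encoder is not needed at prediction time; it is used only to produce the reduced-state targets during training.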
5 Numerical experiments
In this section, two examples are presented to illustrate the performance of the proposed approach. The examples selected are well-known benchmark problems in the fluid mechanics community. For both examples, we have considered that the sensors measure vorticity, and the objective is to reconstruct the vorticity field. We present case studies by varying the number of sensors and the number of sequences available. To illustrate the performance of our approach, a comparison with another state-of-the-art method has been provided. Comparison among results is carried out based on a qualitative and a quantitative metric. To be specific, visual inspection is used as a qualitative metric, and the relative error is used as a quantitative metric,
(11) 
where the error in Eq. (11) is expressed in terms of the true state, the state vector predicted using the proposed approach, and a vector norm. The dataset for solving the state estimation problems is generated using OpenFOAM [53]. The proposed approach has been implemented using PyTorch [54]. The software associated with the proposed approach, along with the implementation of both the examples, will be made available on acceptance of the paper.
5.1 Periodic Vortex shedding past a cylinder
As the first example, we consider two-dimensional flow past a circular cylinder at a fixed Reynolds number. It is a well-known canonical problem and is characterized by periodic laminar vortex shedding.
A schematic representation of the computational domain is shown in Fig. 8. The domain is defined by the cylinder diameter and by the distances from the center of the cylinder to the inlet, the outlet, and the sidewalls. At the inlet boundary, a uniform velocity is applied. A pressure boundary condition is imposed at the outlet, and a no-slip boundary condition is applied at the cylinder surface.
The dataset necessary for training the proposed model is generated by using an unsteady Reynolds-averaged Navier–Stokes (URANS) simulation in OpenFOAM [53]. The overall problem domain is discretized into 63420 elements, with a finer mesh near the cylinder. A fixed time step is considered.
For training the model, a library of 180 snapshots is generated by running OpenFOAM. An additional 120 snapshots, 60 for validation and 60 for testing, have also been generated. Two consecutive snapshots are separated by 10. The snapshot cutout (see Fig. 8(b)) is discretized into a grid of points along the two coordinate directions. The objective here is to recover the complete vorticity field in the cutout by using the sensor measurements. Details on the network architecture are provided in Table 1.
Network component  Architecture 

AE  
RNN  RNN 
To illustrate the superiority of the proposed approach, the results obtained using the proposed approach are compared with those obtained using proper orthogonal decomposition-based deep state estimation (PDS) proposed by [29]. In PDS, the first 25 modes are used, which is the same as the number of neurons in the bottleneck layer of the proposed approach. The feedforward neural network used in PDS for mapping the sensor measurements to the latent state is the same as the feedforward network used in ARE. Note that a comparison with other popular approaches such as gappy POD and linear stochastic estimation is not shown, as it is already established in [29] that PDS outperforms both approaches. Brief details on PDS, gappy POD, and linear stochastic estimation are provided in Appendix A.
Fig. 9 shows the results obtained using different approaches. It is an idealized case where we have considered the sensor data to be noise-free. A sequence length of four is used for training ARE. We have considered the extreme case where data from only one sensor is available. We observe that the proposed ARE, with only one sensor, is able to recover the full state accurately. PDS, on the other hand, yields less accurate results.
Fig. 10 shows results corresponding to the case where the sensor is corrupted by white Gaussian noise. This is a more realistic case. Again, a sequence length of 4 is considered and it is assumed that data from only one sensor is available. In this case also, we observe that ARE yields highly accurate results and outperforms PDS.
The effect of noise on the proposed ARE is shown in Fig. 11. We observe that the proposed approach is robust to the noise in the sensor measurements.
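A noise study of this kind can be sketched as below: a clean signal is corrupted by white Gaussian noise and the relative error is evaluated. The choice of the L2 norm, the definition of the noise level as a fraction of the signal's root-mean-square value, and the function names are assumptions for illustration; the toy sine signal stands in for a sensor record and is not data from the paper.

```python
import math
import random


def relative_error(x_true, x_pred):
    """Relative error ||x_true - x_pred||_2 / ||x_true||_2 (L2 norm assumed)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_true, x_pred)))
    den = math.sqrt(sum(a * a for a in x_true))
    return num / den


def add_white_noise(signal, noise_level, rng):
    """Add zero-mean Gaussian noise whose standard deviation is
    noise_level times the root-mean-square value of the signal."""
    rms = math.sqrt(sum(v * v for v in signal) / len(signal))
    return [v + rng.gauss(0.0, noise_level * rms) for v in signal]


# Toy usage: corrupt a synthetic signal at a 10% noise level.
rng = random.Random(0)  # fixed seed for reproducibility
clean = [math.sin(0.3 * k) for k in range(50)]
noisy = add_white_noise(clean, noise_level=0.1, rng=rng)
err = relative_error(clean, noisy)
```

Applied to the sensor inputs before they are passed to the trained model, the same relative error evaluated on the reconstructed field gives one point on a robustness curve.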
Next, we investigate the effect of varying the number of sensors. Fig. 12 shows the performance of different methods with the increase in the number of sensors. Cases corresponding to one, two, five, and ten sensors are presented. We observe that as the number of sensors increases, PDS starts yielding better results. The proposed ARE is found to yield the best result for all four cases.
We also carried out an additional case study where we considered that sensor data from both the past and the future is available. A bidirectional RNN (BRNN) based ARE was developed for the same. However, due to the paucity of space, the same is not presented here. Those interested can refer to Appendix B for details.
5.2 Transient Flow past a cylinder
As the second example, we consider the problem involving transient flow past a cylinder. Because of the transient nature of the flow, this is much more challenging than the periodic vortex shedding problem in Section 5.1. The problem domain, meshing, and solution strategy for this problem are the same as for the periodic vortex shedding problem. However, unlike the previous problem, the training and test data correspond to different Reynolds numbers, which considerably increases the complexity of the problem.
The training library for this problem was created by running URANS in OpenFOAM. A total of 1200 snapshots, collected at several Reynolds numbers, were generated. The validation and test sets consist of sequential snapshots at Reynolds numbers different from those used for training. Similar to the previous problem, consecutive snapshots are separated by a fixed time interval. The snapshot cutout (see Fig. 8(b)) is discretized into a grid of points along the two coordinate directions. Similar to the previous example, the objective here is to recover the vorticity field based on sensor measurements, and the results obtained have been compared with PDS (with a similar setup as before). Details on the network architecture used for this problem are provided in Table 2.
Network component  Architecture 

AE  
RNN  RNN 
Fig. 13 shows the reconstructed vorticity field using ARE and PDS. Ground truth has also been reported. It is an idealized case where we have considered the sensor data to be noise-free. A sequence length of four is used for ARE. We have considered the extreme case where data from only one sensor is available. For this problem also, the proposed ARE is able to recover the full vorticity field accurately. PDS, on the other hand, fails to recover the vorticity field accurately.
Next, we consider a more realistic scenario where the sensor data is corrupted by noise. Fig. 14 shows the results corresponding to noisy sensor measurements. For this case also, the results obtained using the proposed ARE are found to be superior to those obtained using PDS.
The effect of noise on the performance of the proposed ARE is shown in Fig. 15. ARE is found to be robust to the noise in the sensor measurements.
Next, we investigate the effect of the number of sensors and the sequence length considered in the proposed approach. Fig. 16 shows the performance of different methods with an increase in the number of sensors. Cases corresponding to one, two, five, and ten sensors are presented. We observe that as the number of sensors increases, PDS starts yielding better results. The proposed approach is found to yield the best result for all four cases. Fig. 17 illustrates the performance of the proposed approach for different sequence lengths. As expected, the results initially improve with an increase in the sequence length. ARE reaches a saturation point at a sequence length of three, and no significant improvement in results is observed on further increasing the sequence length.
Lastly, for details on BRNN based ARE for this problem, interested readers may refer to Appendix B.
6 Conclusions
In this work, we introduced a novel deep learning based approach for state estimation. The proposed approach uses an autoencoder as a reduced-order model and a recurrent neural network to map the sensor measurements to the reduced state. The proposed framework is superior to existing state estimation frameworks in two aspects. First, the autoencoder, being a nonlinear manifold learning framework, is superior to the commonly used proper orthogonal decomposition. Secondly, unlike existing state estimation frameworks, the proposed approach utilizes present and past sensor measurements through the recurrent neural network. This results in improved accuracy, specifically for cases where limited sensors are deployed. Experiments performed on simulations of flow past a cylinder illustrated the capability of the proposed approach in learning from sequential data. A comparison with proper orthogonal decomposition-based deep state estimation showed the superior accuracy of the proposed approach. Moreover, the proposed approach was also found to be robust to the noise in the sensor measurements.
Utilizing sequential information can prove beneficial in many other state estimation tasks. Future work can be aimed at exploring other models for sequential information such as transformers, comparing other autoencoder network variations, studying the effect of varying the time step between measurements, and developing models capable of transfer learning, i.e., models trained on a smaller time step that can be used with a larger time step. Work on utilizing the recently developed physics-informed neural networks for solving such problems can also be pursued in the future.
Acknowledgements: The authors would like to thank Dr. Arghya Samanta and Nirmal J Nair for the useful discussions during this paper’s preparation. SC acknowledges the financial support of the IHub Foundation for Cobotics (IHFC) and seed grant provided through Faculty initiation grant, IIT Delhi.
References
 [1] Michele Milano and Petros Koumoutsakos. Neural network modeling for near wall turbulent flow. Journal of Computational Physics, 182(1):1 – 26, 2002.
 [2] Julia Ling, Andrew Kurzawski, and Jeremy Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155–166, 2016.
 [3] Nicholas Geneva and Nicholas Zabaras. Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks. Journal of Computational Physics, 383:125–147, 2019.
 [4] Nicholas Geneva and Nicholas Zabaras. Multifidelity generative deep learning turbulent flows. arXiv preprint arXiv:2006.04731, 2020.
 [5] N. Benjamin Erichson, Lionel Mathelin, Zhewei Yao, Steven L. Brunton, Michael W. Mahoney, and J. Nathan Kutz. Shallow learning for fluid flow reconstruction with limited sensors and limited data, 2019.
 [6] Lorenzo Rosasco, Andrea Caponnetto, Ernesto Vito, Francesca Odone, and Umberto Giovannini. Learning, regularization and ill-posed inverse problems. Advances in Neural Information Processing Systems, 17:1145–1152, 2004.
 [7] Rajdip Nayek, Suparno Mukhopadhyay, and Sriram Narasimhan. Mass normalized mode shape identification of bridge structures using a single actuator-sensor pair. Structural Control and Health Monitoring, 25(11):e2244, 2018.
 [8] Simo Särkkä. Bayesian filtering and smoothing. Number 3. Cambridge University Press, 2013.
 [9] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.
 [10] Konrad Reif, Stefan Gunther, Engin Yaz, and Rolf Unbehauen. Stochastic stability of the discrete-time extended Kalman filter. IEEE Transactions on Automatic Control, 44(4):714–728, 1999.
 [11] Eric A Wan, Rudolph Van Der Merwe, and Simon Haykin. The unscented Kalman filter. Kalman filtering and neural networks, 5(2007):221–280, 2001.
 [12] Rajdip Nayek, Souvik Chakraborty, and Sriram Narasimhan. A Gaussian process latent force model for joint input-state estimation in linear structural systems. Mechanical Systems and Signal Processing, 128:497–530, 2019.
 [13] Jonathan Tu, John Griffin, Adam Hart, Clarence Rowley, Louis Cattafesta, and Lawrence Ukeiley. Integration of non-time-resolved PIV and time-resolved velocity point sensors for dynamic estimation of velocity fields. Experiments in Fluids, 54, 01 2012.
 [14] M. Buffoni, S. Camarri, A. Iollo, E. Lombardi, and M.V. Salvetti. A nonlinear observer for unsteady three-dimensional flows. Journal of Computational Physics, 227(4):2626–2643, 2008.
 [15] Ryota Kikuchi, Takashi Misaka, and Shigeru Obayashi. Assessment of probability density function based on POD reduced-order model for ensemble-based data assimilation. Fluid Dynamics Research, 47(5):051403, Sep 2015.
 [16] V. Mons, J.C. Chassaing, T. Gomez, and P. Sagaut. Reconstruction of unsteady viscous flows using data assimilation schemes. Journal of Computational Physics, 316:255–280, 2016.
 [17] Andre Fernando De Castro da Silva and Tim Colonius. Ensemble-based state estimator for aerodynamic flows. AIAA Journal, 56:1–11, 06 2018.
 [18] J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655–4666, 2007.
 [19] S. G. Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.
 [20] Jared L. Callaham, Kazuki Maeda, and Steven L. Brunton. Robust flow reconstruction from limited measurements via sparse representation. Physical Review Fluids, 4(10), Oct 2019.
 [21] Richard Everson and Lawrence Sirovich. Karhunen–Loève procedure for gappy data. JOSA A, 12, 08 1995.
 [22] Kelly Cohen, Stefan Siegel, and Thomas McLaughlin. A heuristic approach to effective sensor placement for modeling of a cylinder wake. Computers & Fluids, 35:103–120, 01 2006.
 [23] K. Willcox. Unsteady flow sensing and estimation via the gappy proper orthogonal decomposition. Computers & Fluids, 35(2):208–226, 2006.
 [24] Ronald Adrian. On the role of conditional averages in turbulence theory. 1:323–332, 01 1977.
 [25] A. M. Naguib, C. E. Wark, and O. Juckenhöfel. Stochastic estimation and flow sources associated with surface pressure events in a turbulent boundary layer. Physics of Fluids, 13(9):2611–2626, 2001.
 [26] Dan Ewing and Joseph H. Citriniti. Examination of a LSE/POD complementary technique using single and multi-time information in the axisymmetric shear layer. In J. N. Sørensen, E. J. Hopfinger, and N. Aubry, editors, IUTAM Symposium on Simulation and Identification of Organized Structures in Flows, pages 375–384, Dordrecht, 1999. Springer Netherlands.
 [27] Y. G. Guezennec. Stochastic estimation of coherent structures in turbulent boundary layers. Physics of Fluids A: Fluid Dynamics, 1(6):1054–1060, 1989.
 [28] J. P. Bonnet, D. R. Cole, J. Delville, M. N. Glauser, and L. S. Ukeiley. Stochastic estimation and proper orthogonal decomposition: Complementary techniques for identifying structure. Experiments in Fluids, 17(5):307–314, Sep 1994.
 [29] Nirmal J. Nair and Andres Goza. Leveraging reduced-order models for state estimation using deep learning. Journal of Fluid Mechanics, 897:R1, 2020.
 [30] Jeremy Pinier, Julie Ausseur, Mark Glauser, and Hiroshi Higuchi. Proportional closed-loop feedback control of flow separation. AIAA Journal, 45:181–190, 01 2007.
 [31] Ronald Adrian. Conditional eddies in isotropic turbulence. Physics of Fluids, 22, 11 1979.
 [32] T. C. Tung and R. J. Adrian. Higher‐order estimates of conditional eddies in isotropic turbulence. The Physics of Fluids, 23(7):1469–1470, 1980.
 [33] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.
 [34] Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, and Aapo Hyvarinen. Variational autoencoders and nonlinear ICA: A unifying framework. In International Conference on Artificial Intelligence and Statistics, pages 2207–2217. PMLR, 2020.
 [35] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315–323. JMLR Workshop and Conference Proceedings, 2011.
 [36] Hareesh Bahuleyan, Lili Mou, Hao Zhou, and Olga Vechtomova. Stochastic Wasserstein autoencoder for probabilistic sentence generation. arXiv preprint arXiv:1806.08462, 2018.
 [37] Adam R Kosiorek, Sara Sabour, Yee Whye Teh, and Geoffrey E Hinton. Stacked capsule autoencoders. arXiv preprint arXiv:1906.06818, 2019.
 [38] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9:1735–80, 12 1997.
 [39] Zachary C Lipton, John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015.
 [40] Suraj Pawar, Shady E. Ahmed, Omer San, Adil Rasheed, and Ionel M. Navon. Long short-term memory embedded nudging schemes for nonlinear data assimilation of geophysical flows. Physics of Fluids, 32(7):076606, 2020.
 [41] José del Águila Ferrandis, Michael S. Triantafyllou, Chryssostomos Chryssostomidis, and George Em Karniadakis. Learning functionals via LSTM neural networks for predicting vessel dynamics in extreme sea states. CoRR, abs/1912.13382, 2019.
 [42] Sepp Hochreiter. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 06(02):107–116, 1998.
 [43] Hamidreza Eivazi, Hadi Veisi, Mohammad Hossein Naderi, and Vahid Esfahanian. Deep neural networks for nonlinear model order reduction of unsteady flows. Physics of Fluids, 32(10):105104, 2020.
 [44] Nicholas Geneva and Nicholas Zabaras. Transformers for modeling physical systems, 2020.
 [45] Hamidreza Eivazi, Hadi Veisi, Mohammad Hossein Naderi, and Vahid Esfahanian. Deep neural networks for nonlinear model order reduction of unsteady flows. Physics of Fluids, 32(10):105104, Oct 2020.
 [46] Siavash Hosseinyalamdary. Deep Kalman filter: Simultaneous multi-sensor integration and modelling; a GNSS/IMU case study. Sensors, 18(5), 2018.
 [47] Samuel E. Otto and Clarence W. Rowley. Linearly-recurrent autoencoder networks for learning dynamics, 2019.
 [48] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 12 2014.
 [49] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.
 [50] Sanjeev Arora, Zhiyuan Li, and Kaifeng Lyu. Theoretical analysis of auto rate-tuning by batch normalization, 2018.
 [51] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
 [52] Jeffrey Heaton. Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning: The MIT Press, 2016, 800 pp, ISBN: 0262035618. Genetic Programming and Evolvable Machines, 19, 10 2017.
 [53] Hrvoje Jasak. OpenFOAM: open source CFD in research and industry. International Journal of Naval Architecture and Ocean Engineering, 1(2):89–94, 2009.
 [54] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
 [55] M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
 [56] Pierre Baldi, Søren Brunak, Paolo Frasconi, Giovanni Soda, and Gianluca Pollastri. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics (Oxford, England), 15:937–46, 12 1999.
Appendix A Proper orthogonal decomposition based deep state estimation (PDS)
In this section, we briefly provide the theoretical background of PDS. We note that the development of PDS is motivated by gappy-POD and linear stochastic estimation; hence, the discussion in this section starts from gappy-POD and then proceeds to PDS via linear stochastic estimation.
Let the training set consist of flattened data images, , where and corresponding sensor measurements . As already stated in Section 2, the aim of state estimation is to recover the full state from the sensor measurements.
a.1 Gappy-POD
Consider a measurement operator that maps the full state to the measurements and has ones at the sensor locations and zeros elsewhere. Mathematically,
(12) 
where the full state can be approximated by a linear combination of modes. The modes used here are the most dominant modes, obtained by proper orthogonal decomposition of the training data and selecting the leading singular vectors,
(13) 
where
(14) 
and
(15) 
In Eq. (14), . During testing, the latent state is obtained by solving the following minimization problem.
(16) 
The solution to this problem is obtained by taking the Moore-Penrose pseudoinverse
(17) 
This approach requires prior knowledge of the measurement operator. This operator is available only for simple systems and is often unknown for systems of practical interest.
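As an illustration of the gappy-POD steps described above, the following NumPy sketch builds POD modes from synthetic low-rank snapshots, forms a point-sensor measurement matrix, and recovers the full state through the pseudoinverse. The symbol names (`X`, `Phi`, `C`, `a`, `s`), the synthetic data, and the sensor locations are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training snapshots (assumption): n-dimensional states lying in
# an r-dimensional linear subspace, stacked as the columns of X (n x m).
n, m, r = 50, 40, 3
basis = np.linalg.qr(rng.standard_normal((n, r)))[0]
X = basis @ rng.standard_normal((r, m))

# POD modes: the leading r left singular vectors of the snapshot matrix.
U, _, _ = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :r]

# Measurement operator C (p x n): one row per sensor with a single 1 at the
# (hypothetical) sensor location and zeros elsewhere.
sensor_idx = np.array([4, 17, 23, 41])
C = np.zeros((sensor_idx.size, n))
C[np.arange(sensor_idx.size), sensor_idx] = 1.0


def gappy_pod(s, C, Phi):
    """Least-squares POD coefficients from sensors, then the full state."""
    a = np.linalg.pinv(C @ Phi) @ s  # Moore-Penrose pseudoinverse solution
    return Phi @ a


# Reconstruct an unseen state from its four point measurements.
x_true = basis @ rng.standard_normal(r)
x_hat = gappy_pod(C @ x_true, C, Phi)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Because the synthetic state lies exactly in the span of the retained modes, the reconstruction error here is at round-off level; for real flow data the error depends on the truncation and on the conditioning of the product of `C` and `Phi`.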
a.2 Linear stochastic estimation
Linear stochastic estimation overcomes this issue by defining another operator that maps the latent state to the sensor measurements. This operator is learned from the training data via the following minimization problem.
(18) 
where and . This approach bypasses the measurement operator entirely; instead, the learned operator is treated as an empirical estimate of the linear map from the latent state to the sensor measurements. Subsequently, the latent state is obtained as
(19) 
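A minimal NumPy sketch of linear stochastic estimation, under the assumption that the operator is fitted by ordinary least squares over the training pairs and then inverted with a pseudoinverse, is given below. The names `S`, `A`, `G`, and `Phi`, as well as the synthetic data, are illustrative choices rather than the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r, p = 50, 40, 3, 4

# Synthetic low-rank training snapshots and point sensors (assumption).
basis = np.linalg.qr(rng.standard_normal((n, r)))[0]
X = basis @ rng.standard_normal((r, m))
sensor_idx = rng.choice(n, size=p, replace=False)
S = X[sensor_idx, :]  # p x m matrix of training sensor measurements

# Latent training states: POD coefficients of every snapshot.
U, _, _ = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :r]
A = Phi.T @ X  # r x m

# Learn G (p x r) by minimizing ||S - G A||_F. np.linalg.lstsq solves
# A^T G^T = S^T in the least-squares sense.
G = np.linalg.lstsq(A.T, S.T, rcond=None)[0].T


def lse_estimate(s):
    """Recover the latent state with the pseudoinverse of G, then lift it."""
    a = np.linalg.pinv(G) @ s
    return Phi @ a


x_true = basis @ rng.standard_normal(r)
x_hat = lse_estimate(x_true[sensor_idx])
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Note that the measurement matrix is never formed: only sensor readings and latent states enter the fit. In this noise-free toy setting `G` coincides with the rows of `Phi` at the sensor locations.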
a.3 Deep state estimation
The deep state estimation approach [29] replaces the linear mapping between the latent state and the sensors with a nonlinear mapping to further generalize the approach. It uses a parametrized neural network to map the sensor measurements nonlinearly to approximate embeddings.
(20) 
The neural network is trained as
(21) 
where
(22) 
with the POD modes defined as before. During testing, the trained neural network approximates the embeddings for the sensor measurements, which are then used to recover the full state. We refer to this method as POD-based deep state estimation (PDS).
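The idea of replacing the linear map with a trained network can be sketched with a toy example. The code below trains a one-hidden-layer tanh network by full-batch gradient descent to map sensor readings to POD coefficients, assuming a mean-squared-error loss on the coefficients. The architecture, learning rate, synthetic data, and all variable names are assumptions for illustration; this is not the network used in [29] or in this paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r, p, hidden = 50, 200, 3, 4, 16

# Synthetic low-rank snapshots, point sensors, and POD targets (assumption).
basis = np.linalg.qr(rng.standard_normal((n, r)))[0]
X = basis @ rng.standard_normal((r, m))
sensor_idx = rng.choice(n, size=p, replace=False)
S = X[sensor_idx, :].T  # m x p network inputs
U, _, _ = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :r]
A = (Phi.T @ X).T  # m x r training targets (POD coefficients)

# Network parameters (hypothetical sizes and initialization).
W1 = 0.5 * rng.standard_normal((p, hidden))
b1 = np.zeros(hidden)
W2 = 0.5 * rng.standard_normal((hidden, r))
b2 = np.zeros(r)


def forward(S_batch):
    """Return hidden activations and predicted POD coefficients."""
    H = np.tanh(S_batch @ W1 + b1)
    return H, H @ W2 + b2


def mse(S_batch, A_batch):
    return float(np.mean((forward(S_batch)[1] - A_batch) ** 2))


loss_start = mse(S, A)
lr = 0.05
for _ in range(2000):
    H, A_hat = forward(S)
    dA = 2.0 * (A_hat - A) / A.size  # gradient of the mean squared error
    dW2 = H.T @ dA
    db2 = dA.sum(axis=0)
    dH = (dA @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
    dW1 = S.T @ dH
    db1 = dH.sum(axis=0)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
loss_end = mse(S, A)

# Testing: predict the embedding from sensors and lift it to the full state.
x_hat = Phi @ forward(S[:1])[1][0]
```

In practice a deep learning library with automatic differentiation and a stochastic optimizer would replace the hand-written gradient loop.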
Appendix B Bidirectional RNN
The basic idea of bidirectional recurrent neural nets (BRNN) [55, 56] is to present each training sequence to two separate recurrent nets, a forward net and a backward net, both of which are connected to the same output layer. (In some cases a third network is used in place of the output layer, but here we have used the simpler model.) This means that for every point in a given sequence, the BRNN has complete, sequential information about all points before and after it. In this work, we have used a BRNN to solve a special state estimation problem in which sensor measurements from both past and future states are available.
A schematic representation of the BRNN used is shown in Fig. 18. It is modified to stop propagating information after the middle time step in the forward and backward networks. This lowers the computational time and reduces the number of trainable weights. The proposed architecture is thus capable of using information from both ahead and behind in time.
To feed the sensor measurements into the network, the data is split into two parts, one for the forward network and one for the backward network, as shown in Fig. 18. Note that, although more accurate, this network can be used only when a delay in the time of reconstruction is acceptable; it also requires an odd sequence length of sensor measurements. The outputs of the forward and backward networks are concatenated and passed to a feedforward network, which yields the final estimated latent representation of the middle sensor measurement. Thus, the training can be formulated as follows
(23) 
where the minimization is over the parameters of the combined RNN and feedforward network.
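The data flow of this modified bidirectional network can be sketched as a forward pass in NumPy. The sketch assumes plain tanh recurrent cells, randomly initialized and untrained weights, and that the middle measurement is fed to both the forward and the backward network; none of these details are confirmed by the text, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
p, hidden, latent, seq_len = 2, 8, 3, 5  # seq_len must be odd


def init_rnn():
    """Random parameters for one plain tanh recurrent cell (assumption)."""
    return {
        "Wx": 0.3 * rng.standard_normal((p, hidden)),
        "Wh": 0.3 * rng.standard_normal((hidden, hidden)),
        "b": np.zeros(hidden),
    }


fwd, bwd = init_rnn(), init_rnn()
W_out = 0.3 * rng.standard_normal((2 * hidden, latent))
b_out = np.zeros(latent)


def run_rnn(params, seq):
    """Run the cell over seq and return the hidden state after the last input."""
    h = np.zeros(hidden)
    for s_t in seq:
        h = np.tanh(s_t @ params["Wx"] + h @ params["Wh"] + params["b"])
    return h


def brnn_latent(sensors):
    """Estimate the latent state at the middle time step of `sensors`."""
    T = sensors.shape[0]
    if T % 2 == 0:
        raise ValueError("sequence length must be odd")
    mid = T // 2
    past = sensors[: mid + 1]     # first to middle, read forward in time
    future = sensors[mid:][::-1]  # last to middle, read backward in time
    # Both networks stop at the middle step; their states are concatenated
    # and passed through a single feedforward (here linear) output layer.
    h = np.concatenate([run_rnn(fwd, past), run_rnn(bwd, future)])
    return h @ W_out + b_out


sensors = rng.standard_normal((seq_len, p))
z_mid = brnn_latent(sensors)
```

Training would then adjust the parameters of both recurrent cells and the output layer jointly so that `z_mid` matches the latent representation of the middle snapshot.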
b.1 Results
The state estimation results obtained using the BRNN-based ARE are shown in Figs. 19 (periodic vortex shedding) and 20 (transient flow). For both examples, results obtained using one and two sensors are presented. Compared to PDS, the states obtained using ARE are found to be superior. This is expected, as the unidirectional RNN was already providing superior results compared to PDS.
Fig. 21 shows a comparative assessment between the BRNN-based ARE and the RNN-based ARE. Results corresponding to one, two, five, and ten sensors are presented. For one sensor, the results obtained using the BRNN-based ARE significantly outperform those obtained using the RNN-based ARE. However, as the number of sensors increases, the results obtained using the two approaches become identical. One counterintuitive result is obtained for the periodic vortex shedding problem, where the result obtained using the BRNN-based ARE is found to be worse than that obtained using the RNN-based ARE. This is probably because the neural network parameters of the BRNN have converged to a local minimum.