Introduction
The computational challenges faced during high-fidelity numerical simulations of engineering systems governed by nonlinear partial differential equations (PDEs), especially in applications involving control [PBK2016], optimal design, and multi-fidelity optimization [PWG2016], can often be mitigated by the development of reduced-order models (ROMs) [Benner_Gugercin_etal_15]. Proper orthogonal decomposition (POD) [Sirovich_87, BHL1993] is a well-known method for extracting a solution-dependent reduced basis space from a set of well-resolved, high-fidelity snapshots, and is most effective when the coherent structures of the dataset can be ranked in terms of their energy content. The POD method has been successfully applied in statistics [J1986], signal analysis and pattern recognition [DM2008], ocean models [VH2006], air pollution models [FZPPBN2014], convective Boussinesq flows [SB2015], and shallow water equation (SWE) models [SSN2014, LFKG2016]. Alternatively, nonlinear dimension reduction techniques such as kernel POD [Salvador2021] or deep learning-based approaches like autoencoders [Lusch2018, Ghorbanidehno2021] have also been used for extracting a reduced basis. Combining autoencoder-generated bases with various specialized machine learning algorithms for time series modeling results in fully non-intrusive reduced-order models [Gonzalez_Balajewicz_2018, Eivazi2020, Maulik2021]. Hybrid methods [Lee2020, Kim2020] can also be obtained by combining a nonlinear manifold learning technique, such as an autoencoder, for discovering the latent space with an intrusive method for the temporal dynamics.

Following the identification of the latent space, a reduced representation of the dynamical system is obtained by a Galerkin or Petrov-Galerkin projection onto the latent space [LFKG2016, LFK2017], which typically involves intrusive
modifications of the high-fidelity system operators. This work focuses on non-intrusive reduced-order models (NIROMs) that do not require any knowledge of the high-fidelity simulator. In such a framework, the evolution of the expansion coefficients in the latent space is usually computed by applying regression-based methods directly to the high-fidelity data. These include artificial neural networks (ANNs), in particular multilayer perceptrons [HU2018], Gaussian process regression (GPR) [GH2019], and radial basis function (RBF) interpolation [ADN2013]. RBF interpolation in particular has been shown to be quite successful for nonlinear, time-dependent partial differential equations (PDEs) [DFPSP2021], nonlinear, parametrized PDEs [ADN2013], and aerodynamic shape optimization [IQ2013].

Alternatively, in deep neural networks (DNNs) such as ResNet, the evolution of features over the depth of the network is equivalent to solving an ordinary differential equation (ODE) with the forward Euler method [Ruthotto2019]. With this connection in mind, [Chen2018] proposed a 'continuous-depth' neural network called ODENet that effectively replaced the layers in ResNet-like architectures with a trainable ODE solver. This neural ordinary differential equation (NODE) approach was further improved in [Gholami2019, Dupont2019], and [Finlay2020] proposed a NODE generative model that can be efficiently trained on large-scale datasets. Some applications of the NODE framework include latent space closure modeling [Maulik2020], ODE/PDE model identification [Sun2020], modeling of irregularly spaced time series data [RCD2019], and modeling of spatiotemporal information in video signals [KVKP2019].
Dynamic mode decomposition (DMD) is yet another method for obtaining a reduced-order model. DMD represents the temporal dynamics of a complex, nonlinear system [Schmid2010, KBBP2016] as the combination of a few linearly evolving, spatially coherent modes that oscillate at a fixed frequency, and which are closely related to the eigenvectors of the infinite-dimensional Koopman operator [K1931]. Several variants of the DMD algorithm have been proposed [PBK2016, KFB2016, ABBN2016, LV2017] and have been successfully applied as efficient ROM techniques for determining the optimal global basis modes for nonlinear, time-dependent problems [DMD, Bistrian2017]. For non-parametrized PDEs, DMD presents an efficient framework that combines all three stages of ROM development to learn a linear operator in an optimal least-squares sense. However, this approach cannot be directly applied to parametrized problems [Alsayyari2021].

In [Dutta2021AAAI], we explored propagating the dynamics of a latent space formed from POD modes with neural ODEs. The present work investigates substituting the latent space described by POD modes with one learned by an autoencoder. Our results for the combined autoencoder-NODE approach are compared to those of other methods: a) dimension reduction by POD modes with latent space temporal dynamics captured by neural ODEs (POD-NODE), b) dimension reduction via POD and temporal evolution of the latent space with radial basis functions (POD-RBF), and c) dynamic mode decomposition (DMD), which serve as benchmarks in our numerical experiments. The performance of each approach is evaluated on sample problems based on incompressible flow around a cylinder and shallow water hydrodynamics, in the context of fast replay applications for complex fluid-dynamics problems.
Methodology
The standard ROM development framework can be divided into three stages:

identification of a low-dimensional latent (or reduced-order) space,

representation of the nonlinear dynamical system in terms of the reduced basis and modeling the evolution of the system of modal coefficients, and

reconstruction in the high-fidelity space for validation and analysis.
Dimension reduction
In this work, the dominant features of the unsteady flow field are extracted using a linear modal decomposition technique, POD, and a nonlinear manifold learning method that relies on deep, fully-connected, multilayer perceptron (MLP) autoencoders, which are highly expressive and scalable [HR2006]. POD is a popular technique for dimension reduction of the solution manifold of a dynamical system: it determines a linear reduced space spanned by an orthogonal basis with an associated energetic hierarchy, which represents an optimal approximation of the solution manifold with respect to the norm. Given a matrix of high-fidelity system snapshots and a matrix of orthonormal POD basis vectors, the modal coefficient matrix constitutes our training data for the latent space learning methods. [Taira2020] provides an excellent overview of POD as well as a comparison with other dimension-reduction techniques.

Autoencoders
An autoencoder is a type of feedforward neural network that is designed to learn an approximation of the identity mapping, such that $h(\mathbf{x}) = \tilde{\mathbf{x}} \approx \mathbf{x}$. This is accomplished using a two-part architecture. The first part is called an encoder, $\mathsf{f}$, which maps a high-dimensional input vector $\mathbf{x} \in \mathbb{R}^N$ to a low-dimensional latent vector $\mathbf{z} = \mathsf{f}(\mathbf{x}) \in \mathbb{R}^m$, where $m \ll N$. The second part is called a decoder, $\mathsf{g}$, which maps the latent vector $\mathbf{z}$ to an approximation $\tilde{\mathbf{x}} \in \mathbb{R}^N$ of the high-dimensional input vector $\mathbf{x}$. The combination of the two parts yields an autoencoder of the form

$h(\mathbf{x}) = (\mathsf{g} \circ \mathsf{f})(\mathbf{x}) = \tilde{\mathbf{x}}. \qquad (1)$

This autoencoder is trained by computing the optimal values of the network parameters $\boldsymbol{\theta}$ that minimize the reconstruction error over all the training data,

$\boldsymbol{\theta}^{*} = \underset{\boldsymbol{\theta}}{\arg\min}\; \sum_{i} \mathcal{L}\left(\mathbf{x}_i, \mathsf{g}(\mathsf{f}(\mathbf{x}_i))\right), \qquad (2)$

where $\mathcal{L}$ is a chosen measure of discrepancy between $\mathbf{x}$ and its approximation $\tilde{\mathbf{x}}$. The restriction $m \ll N$ forces the autoencoder model to learn the salient features of the input data via compression into a low-dimensional space followed by reconstruction of the input, instead of directly learning the identity function. It is worth noting that with the choice of a linear, single-layer encoder of the form $\mathbf{z} = \boldsymbol{W}_e \mathbf{x}$, a linear, single-layer decoder of the form $\tilde{\mathbf{x}} = \boldsymbol{W}_d \mathbf{z}$, and a squared reconstruction error as the loss function, the autoencoder model has been shown to learn the same subspace as that spanned by the first $m$ POD modes. However, additional constraints are necessary to ensure that the columns of $\boldsymbol{W}_d$ form an orthonormal basis and follow an energy-based hierarchical ordering [P2018].

In this work, autoencoders are employed to generate separate low-dimensional latent representations of the pressure (depth) and the velocity snapshot data obtained from the high-fidelity simulation of the numerical examples. The encoder and decoder neural networks are constructed using fully-connected MLP architectures, as depicted in Figure 1. As the high-fidelity simulation data is usually available on a two-dimensional spatial grid, the data is first flattened and then fed to the autoencoder model.
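The linear-autoencoder/POD connection noted above can be demonstrated numerically. The following sketch (illustrative Python/NumPy with hypothetical dimensions, not the authors' implementation) trains a linear single-layer autoencoder by plain gradient descent on synthetic low-rank snapshots; at convergence the decoder columns span the leading POD subspace, although they are neither orthonormal nor energy-ordered:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic snapshots with an exactly 2-dimensional structure embedded in R^20.
n, m, n_snap = 20, 2, 200
true_basis = np.linalg.qr(rng.standard_normal((n, m)))[0]
X = true_basis @ rng.standard_normal((m, n_snap))        # snapshots as columns

# Linear single-layer autoencoder: z = We @ x (encoder), x_tilde = Wd @ z (decoder).
We = 0.1 * rng.standard_normal((m, n))
Wd = 0.1 * rng.standard_normal((n, m))

def recon_mse(Wd, We):
    return np.mean((Wd @ (We @ X) - X) ** 2)

mse0, lr = recon_mse(Wd, We), 0.1
for _ in range(20000):
    Z = We @ X
    R = Wd @ Z - X                        # reconstruction residual
    gWd = 2.0 * R @ Z.T / n_snap          # gradients of the squared-error loss
    gWe = 2.0 * Wd.T @ R @ X.T / n_snap
    Wd -= lr * gWd
    We -= lr * gWe
mse1 = recon_mse(Wd, We)

# The trained decoder spans the same subspace as the leading POD (SVD) modes,
# but without the orthonormality or energy ordering that POD provides.
U = np.linalg.svd(X, full_matrices=False)[0][:, :m]
print(f"reconstruction mse: {mse0:.2e} -> {mse1:.2e}")
```

Since the synthetic data is exactly rank 2, driving the reconstruction error to zero forces the decoder's column space onto the POD subspace, which is the content of the result cited above.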
Latent space evolution
In this section, we outline the non-intrusive framework for modeling the evolution of time-series data in the latent space. The first method employs RBF interpolation, a classical, data-driven, kernel-based method for computing an approximate continuous response surface that aligns with the given multivariate data. More details about the POD-RBF NIROM framework can be found in [DFPSP2021]. The second technique, NODE, is a neural-network-based method for predicting the continuous evolution of a vector over time, designed to preserve memory effects within the architecture.
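As a minimal sketch of the RBF idea (illustrative Python/NumPy, with a hypothetical Gaussian kernel and shape factor, not the configuration used in this work), the snippet below fits an interpolant that maps each latent state to the next one and then marches the surrogate forward in time:

```python
import numpy as np

EPS = 10.0   # Gaussian kernel shape factor (a hypothetical value for this sketch)

def rbf_fit(Z):
    """Fit a Gaussian-RBF interpolant mapping each latent state z_k to z_{k+1}.

    Z : (n_time, m) trajectory of modal coefficients in the latent space.
    """
    C, Y = Z[:-1], Z[1:]                                   # (state, next-state) pairs
    d2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-EPS * d2)                                # kernel matrix at the centers
    W = np.linalg.solve(Phi + 1e-12 * np.eye(len(C)), Y)   # interpolation weights
    return C, W

def rbf_step(z, C, W):
    """Advance one step by evaluating the fitted response surface at z."""
    phi = np.exp(-EPS * ((C - z) ** 2).sum(-1))
    return phi @ W

# A smooth 2-D latent trajectory standing in for POD modal coefficients.
t = np.linspace(0.0, 2.0 * np.pi, 60)
Z = np.c_[np.cos(t), np.sin(t)]

C, W = rbf_fit(Z)
z = Z[0]
for _ in range(len(Z) - 1):      # march the surrogate forward in time
    z = rbf_step(z, C, W)
print(np.linalg.norm(z - Z[-1]))
```

Because the interpolant is exact at the training states, replaying the training window accumulates only numerical round-off; the shape factor controls the trade-off between accuracy and conditioning of the kernel matrix.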
Neural ordinary differential equations
We assume that the time evolution of the modal coefficients $\mathbf{z}(t)$ of the high-fidelity dynamical system in the latent space can be modeled using a (first-order) ODE,

$\frac{d\mathbf{z}(t)}{dt} = \mathcal{F}(\mathbf{z}(t), t). \qquad (3)$

The goal is to obtain a neural network (NN) approximation $\mathcal{F}_{\mathrm{NN}}$ of the dynamics function such that $\mathcal{F}_{\mathrm{NN}} \approx \mathcal{F}$. The full procedure can be outlined as follows:

Compute the time series of modal coefficients $\mathbf{z}_i = \mathbf{z}(t_i)$ for $i = 0, \dots, M$, where $t_i$ are the snapshot time instances.

Initialize a NN approximation $\mathcal{F}_{\mathrm{NN}}(\mathbf{z}, t; \boldsymbol{\mu}_0)$ for the dynamics function, where $\boldsymbol{\mu}_0$ represents the initial NN parameters.

The NN parameters are optimized iteratively through the following steps.

Compute the approximate forward time trajectory of the modal coefficients by solving eq. (3) with a standard ODE integrator,

$\{\widetilde{\mathbf{z}}_1, \dots, \widetilde{\mathbf{z}}_M\} = \mathrm{ODESolve}\left(\mathbf{z}_0, \mathcal{F}_{\mathrm{NN}}, \boldsymbol{\mu}, \{t_1, \dots, t_M\}\right). \qquad (4)$

The free parameters of the NODE model are the NN parameters $\boldsymbol{\mu}$. Evaluate a differentiable loss function $\mathcal{L}(\widetilde{\mathbf{z}}, \mathbf{z})$.

To optimize the loss, compute its gradients with respect to the free parameters. As in the usual backpropagation algorithm, this can be achieved by first computing the gradient $\partial \mathcal{L} / \partial \widetilde{\mathbf{z}}$, and then performing a reverse traversal through the intermediate states of the ODE integrator. For a memory-efficient implementation, the adjoint method [Chen2018] can be used to backpropagate the errors by solving an adjoint system for the augmented state vector backwards in time from $t_M$ to $t_0$.
The gradient computed in the previous step is used to update the parameters $\boldsymbol{\mu}$ via an optimization algorithm like RMSProp or Adam.


The trained NODE approximation of the dynamics function can be used to compute predictions for the time trajectory of the modal coefficients.
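The steps above can be sketched end-to-end for the simplest possible setting: a linear trainable dynamics function, a forward-Euler "ODESolve", and gradients obtained by the reverse traversal of the integrator's steps (a discrete adjoint). This is an illustrative Python/NumPy toy with hypothetical dynamics, not the TFDiffEq-based implementation used in this work:

```python
import numpy as np

def odesolve_euler(z0, W, dt, n_steps):
    """Forward-Euler 'ODESolve' for trainable linear dynamics f(z) = W @ z."""
    traj = [z0]
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * (W @ traj[-1]))
    return np.stack(traj)

def loss_and_grad(W, z0, target, dt):
    """Trajectory MSE and its gradient via a reverse (discrete-adjoint) pass."""
    n_steps = len(target) - 1
    traj = odesolve_euler(z0, W, dt, n_steps)
    diff = traj - target
    loss = np.mean(diff ** 2)
    gW = np.zeros_like(W)
    lam = 2.0 * diff[-1] / diff.size           # dL/dz at the final state
    for k in range(n_steps - 1, -1, -1):       # reverse traversal of the steps
        gW += dt * np.outer(lam, traj[k])
        lam = lam + dt * (W.T @ lam) + 2.0 * diff[k] / diff.size
    return loss, gW

# Ground-truth latent dynamics: a damped rotation dz/dt = A z.
A = np.array([[0.0, -1.0], [1.0, -0.1]])
dt, n_steps = 0.05, 100
z0 = np.array([1.0, 0.0])
target = odesolve_euler(z0, A, dt, n_steps)    # stand-in for modal coefficients

# Optimize the dynamics parameters by plain gradient descent.
W = np.zeros((2, 2))
for _ in range(10000):
    loss, gW = loss_and_grad(W, z0, target, dt)
    W -= 0.05 * gW
print(f"final loss: {loss:.2e}")
```

In practice the dynamics function is a multilayer network, the integrator is higher order (e.g. 'dopri5'), and the adjoint system of [Chen2018] replaces the explicit step-by-step reversal to save memory, but the structure of the computation is the same.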
Following [Dutta2021AAAI], we adopt the TFDiffEq (https://github.com/titu1994/tfdiffeq) library, which runs on the TensorFlow Eager Execution platform, to train the NODE models. RMSProp is adopted for loss minimization with a fixed initial learning rate, a staircase decay function with a range of variable decay schedules, and a fixed momentum coefficient.

As a final point of comparison, we consider the standard dynamic mode decomposition (DMD) [Schmid2010, KBBP2016] algorithm, a powerful data-driven method capable of providing an accurate decomposition of a complex system into spatiotemporal coherent structures that may be used for short-time future-state prediction.
Numerical Experiments
In this section, we first assess the use of autoencoders for building a reduced space in which the system dynamics are propagated by NODE for a benchmark flow problem characterized by the incompressible Navier-Stokes equations (NSE). We compare the performance of this framework with a POD-NODE method, in which NODE is employed to capture the evolution of modal coefficients in a reduced space defined by a POD basis. We also examine the relative performance of different NIROM models for a real-world estuarine flow application governed by the shallow water equations (SWE). The POD-RBF and DMD NIROM training runs were performed on a 2018 MacBook Pro with a 6-core Intel Core i9 processor and 32 GB of 2400 MHz DDR4 RAM. The autoencoder latent-space representations were trained on Vulcanite, a high-performance computer at the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC). Vulcanite is equipped with NVIDIA Tesla V100 PCIe GPU accelerator nodes and has 32 GB of memory per node. Training for the NODE models was performed in serial on Jim, a high-performance computer at the U.S. Army Engineer Research and Development Center Coastal and Hydraulics Laboratory (CHL), which is equipped with two Intel Xeon E5-2699 v3 CPUs and 128 GB of memory per node.
Flow around a cylinder
The problem of two-dimensional, incompressible flow around a cylinder is a classical benchmark CFD example that simulates a time-periodic fluid flow through a 2D pipe with a circular obstacle. For further details about the problem setup, please see [Dutta2021AAAI]. High-fidelity simulation data is obtained with OpenFOAM using an unstructured mesh, such that the flow exhibits the periodic shedding of von Karman vortices. Training snapshots are collected at a uniform time interval, and the NIROM predictions are obtained at a finer temporal resolution that extends beyond the training window.
Table 1. Selected autoencoder architectures for the cylinder flow example.

| Id | Units | Encoder | Decoder | Scaling | MSE | Training |
|---|---|---|---|---|---|---|
| Range | 2–8 | linear, relu, … | elu, tanh, … | | | |
| AE1 | | linear | sigmoid | | 2.398e-6 | 9.38 min |
| AE2 | | linear | sigmoid | | 4.196e-6 | 9.56 min |
| AE3 | | linear | tanh | | 2.499e-6 | 9.29 min |
| AE4 | | linear | sigmoid | | 3.107e-6 | 9.07 min |
Several different architectures were explored for the autoencoder model by varying parameters like the dimension of the latent space, the activation functions, and the configuration of the learning rate schedule during training; Table 1 shows four of the best architectures for the cylinder flow example. The second column shows the size of the latent space for each of the solution variables. The third and fourth columns list the activation functions used in the last hidden encoder and last hidden decoder layers, respectively. The fifth column shows the scaling applied to the input data, which is directly determined by the activation function used in the decoder output layer. In general, it was found that activation functions that require scaled input data, like sigmoid and tanh, performed better for the decoder output layer than (semi-)unbounded ones like linear, relu, and elu. However, unbounded activation functions like linear and elu were seen to generate a more accurate and efficient latent space. The encoder and decoder networks were made up of four hidden layers of gradually decreasing and gradually increasing size, respectively, characterized by the relu activation function, which was found to generate the least noisy latent representations. The Adam optimizer is used for training. An adaptive learning rate decay algorithm is employed that monitors the training loss and reduces the learning rate by a factor of 2 if no improvement is detected for a fixed number of epochs. The sixth column of Table 1 lists the total mean square reconstruction error for all three solution variables, while the last column shows the training times for each model on two NVIDIA Tesla V100 GPU nodes.
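The adaptive decay rule described above can be sketched as a small callback (illustrative Python with a hypothetical class name and hypothetical patience/rate values; in practice the equivalent built-in Keras/TensorFlow callback would be used):

```python
class ReduceLROnPlateau:
    """Halve the learning rate when the monitored training loss stops improving.

    patience : epochs without improvement tolerated before a decay,
    factor   : multiplicative decay (0.5 reduces the rate by a factor of 2).
    """
    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        if loss < self.best:          # improvement: reset the patience counter
            self.best = loss
            self.wait = 0
        else:                         # stagnation: count epochs, then decay
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

# A stagnating loss sequence triggers two decays of the initial rate.
sched = ReduceLROnPlateau(lr=1e-3, patience=3)
for l in [1.0, 0.5, 0.4] + [0.4] * 6:
    lr = sched.step(l)
print(lr)  # 1e-3 halved twice -> 2.5e-4
```

The `min_lr` floor prevents the schedule from driving the step size to zero during long plateaus.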
Figure 2 shows the temporal coefficients for the first two latent modes obtained using a POD truncation based on the modal energy content and using the four chosen AE models. While the POD modal coefficients are arranged in decreasing order of amplitude, this cannot be guaranteed for the AE-generated spaces. The AE2 and AE3 models define a latent space of dimension 2, whereas the AE4 model has 3 latent space units, and the AE1 model's latent dimension is identical to that of the POD bases for each variable: 5, 8, and 7. The richer AE1 latent space leads to better expressivity, which is reflected in the subtler features captured in the modal coefficients (Figure 2, row 2) and in the lowest reconstruction error (Table 1).
Figure 3 shows the evolution of the training loss and the adaptive decay of the learning rate while training autoencoder models using four gradually increasing latent space dimensions. The AE3 architecture was adopted for these runs. The optimization of the hyperparameters with respect to the training loss becomes gradually harder as the dimension of the latent space increases, so the models with smaller latent spaces initially show a faster reduction in training loss. However, the enhanced expressivity of the models with higher latent dimensions allows them to reach lower values of training loss after sufficient epochs of training, whereas the losses for the models with smaller latent dimensions appear to stagnate. Figure 4 shows the training loss and the learning rate decay for the four chosen AE models. The AE1 and AE4 models, with their relatively richer latent spaces, achieve the sharpest reduction in training loss, whereas the smaller latent space dimensions of models AE2 and AE3 appear to be significant barriers to their training efficiency.

Table 2. Selected NODE architectures trained on the AE1 latent space for the cylinder flow example.

| Id | Lyrs | Units | Act. | LR steps, rate | Scaling | Aug. | MSE | Training |
|---|---|---|---|---|---|---|---|---|
| Range | 1–4 | 32–512 | tanh, elu, … | 5k–25k, 0.1–0.9 | | | | |
| AE1-NODE1 | 1 | 256 | elu | 10k, 0.3 | No | No | 2.30e-5 | 28.80 hrs |
| AE1-NODE2 | 1 | 256 | tanh | 5k, 0.7 | Yes | No | 1.34e-4 | 28.69 hrs |
| AE1-NODE3 | 1 | 512 | elu | 5k, 0.5 | No | No | 1.97e-5 | 29.17 hrs |
| AE1-NODE4 | 1 | 256 | tanh | 10k, 0.25 | Yes | Yes | 1.49e-4 | 28.27 hrs |
| AE1-NODE5 | 4 | 64 | tanh | 5k, 0.5 | Yes | No | 1.33e-4 | 33.08 hrs |
An extensive exploration of the NODE hyperparameter and architecture space for the cylinder example was reported in [Dutta2021AAAI]. Based on those inferences and some further numerical study, five architectures were selected, and the results after training for 50,000 epochs using the latent space data generated by the AE1 model are presented in Table 2. All the models generate accurate predictions at a finer temporal resolution than the training data, and show excellent agreement with the high-fidelity solution even while extrapolating outside the training data.
The first row of Fig. 5 compares the time trajectories of the spatial root mean square errors (RMSE) in the high-fidelity space for NIROM solutions generated using the POD and AE1 bases with the NODE3 and NODE5 configurations. It is encouraging to note that even though the latent-space sizes are identical between the POD and AE1 bases, the AE1-NODE solutions are more accurate, confirming that the nonlinear AE basis generates a more accurate spatial compression than a linear POD basis of similar size. The second row of Fig. 5 compares the RMSE between the AE4-NODE3, POD-NODE3, and two DMD NIROM models with truncation levels of 8 and 3. The AE4-NODE3 model achieves better accuracy than all the other models even while using the most compressed latent space (2, 3, 3). The DMD(8) model is roughly comparable to the POD-NODE3 model owing to the similarity in their latent space dimensions, whereas the DMD(3) model fares the worst due to the lack of hierarchical ordering in its latent basis.
Shallow water equations
The final numerical example involves flows governed by the depth-averaged SWE, written in a conservative residual formulation as

$\frac{\partial \mathbf{u}}{\partial t} + \frac{\partial \mathbf{F}_x}{\partial x} + \frac{\partial \mathbf{F}_y}{\partial y} = \mathbf{0}. \qquad (5)$

Here, the state variable $\mathbf{u}$ consists of the flow depth, $h$, and the discharges in the $x$ and $y$ directions, given by $hu$ and $hv$, respectively. Further details about the flux vectors $\mathbf{F}_x$, $\mathbf{F}_y$ and the high-fidelity model equations are available in [DFPSP2021]. The high-fidelity numerical solutions of the SWE are obtained using the 2D depth-averaged module of the Adaptive Hydraulics (AdH) finite element suite, a U.S. Army Corps of Engineers (USACE) high-fidelity finite element resource for 2D and 3D dynamics [Trahan2018].
Tidal flow in San Diego bay
This numerical example involves the simulation of tide-driven flow in the San Diego Bay in California, USA. The AdH high-fidelity model uses tidal data obtained from the NOAA/NOS Co-Ops website at a tailwater elevation inflow boundary and has no-flow boundary conditions everywhere else. Further details are available in [DFPSP2021].
The training space is generated using high-fidelity snapshots obtained at a uniform time interval, and the predicted ROM solutions are computed over the same time window. A latent space is generated by using a fixed POD truncation tolerance for each solution component. The AE2 architecture designed for the cylinder example is modified by including BatchNormalization layers for each hidden layer, using full batches for training, and enforcing the same latent space dimension for each solution component. The RBF NIROM approximation is computed using a fixed shape factor. The simulation time points provided as input to the NODE model are normalized. The 'dopri5' ODE solver is adopted for computing the hidden states both forward and backward in time. Learning from the conclusions of the cylinder example, a network consisting of a single hidden layer is deployed, and the RMSProp optimizer with a staircase learning rate decay schedule and momentum is utilized for training the model. For the DMD NIROM, the simulation time points are normalized to a unit time step, and a fixed truncation level is used to compute the DMD eigenspectrum.
Figure 10 compares the POD-NODE and AE-NODE NIROM solution fields (top row) and the corresponding error plots (bottom row). It can be seen that even while using a latent dimension three times smaller than POD, the relative errors for the AE-NODE solution are almost two times lower than those of the POD-NODE solution. Figure 11 shows the spatial RMSE over time for the depth (left) and the x-velocity (right) NIROM solutions. The AE-NODE solution has accuracy comparable to the POD-NODE and DMD solutions for the depth variable and actually outperforms the POD-NODE solution for the x-velocity variable. Additionally, unlike the RBF NIROM solution, the AE-NODE solution does not exhibit any appreciable accumulation of error over time, owing to the higher-order time-stepping scheme adopted for NODE.
Conclusion
We have studied the combined use of autoencoders, as a data-driven method for identifying an efficient reduced latent space that approximates the solution manifold of a system of nonlinear, time-dependent PDEs, and neural ODEs, as a non-intrusive method for modeling the evolution of the reduced latent space coefficients of that dynamical system. Numerical experiments were carried out with a benchmark flow problem governed by the incompressible Navier-Stokes equations and a real-world application of estuarine flow dynamics governed by the two-dimensional shallow water equations. The AE latent space representation was shown to provide a very high degree of efficient spatial compression, especially for the advection-dominated shallow water example. The POD-NODE NIROM formulation demonstrated a stable and accurate learning trajectory in modeling reduced basis dynamics, even in comparison to classical ROM techniques utilizing DMD, POD, and RBF interpolation. The AE-NODE formulation also produced encouraging extrapolatory predictions for the flow around a cylinder example. This presents an exciting prospect for future exploration, as truly extrapolative prediction of reduced-order dynamics in flow regimes that do not correspond to the training data is a rare feature among well-established ROM frameworks, even for an isolated system unperturbed by unseen external forcings.
This study leads to several promising avenues of research. For instance, neural architecture search (NAS) tools can be adopted to perform an exhaustive exploration of the network architecture and model hyperparameter space for a wide range of flow dynamics, in order to gain insight into the learning trajectory and to design more generalizable AE models with improved reconstruction accuracy. Moreover, integrating the identification of an AE-based latent space with the modeling of system dynamics using NODE may lead to significant performance improvements if the two independent learning problems can be designed to intelligently inform each other. All the relevant data and codes for this study will be made available in a public repository at https://github.com/erdc/aenode_nirom upon publication.
Acknowledgments
This research was supported in part by an appointment of the first author to the Postgraduate Research Participation Program at the U.S. Army Engineer Research and Development Center, Coastal and Hydraulics Laboratory (ERDC-CHL), administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and ERDC. Permission was granted by the Chief of Engineers to publish this information.