Latent-space time evolution of non-intrusive reduced-order models using Gaussian process emulation

07/23/2020 · Romit Maulik, et al. · Argonne National Laboratory · The Alan Turing Institute · Imperial College London

Non-intrusive reduced-order models (ROMs) have recently generated considerable interest for constructing computationally efficient counterparts of nonlinear dynamical systems emerging from various domain sciences. They provide a low-dimensional emulation framework for systems that may be intrinsically high-dimensional. This is accomplished by utilizing a construction algorithm that is purely data-driven. It is no surprise, therefore, that the algorithmic advances of machine learning have led to non-intrusive ROMs with greater accuracy and computational gains. However, in bypassing the utilization of an equation-based evolution, it is often seen that the interpretability of the ROM framework suffers. This becomes more problematic when black-box deep learning methods are used, which are notorious for lacking robustness outside the physical regime of the observed data. In this article, we propose the use of a novel latent-space interpolation algorithm based on Gaussian process regression. Notably, this reduced-order evolution of the system is parameterized by control parameters to allow for interpolation in the space of those parameters. The procedure also treats time as a continuous variable, which enables temporal interpolation. The latter aspect provides information, with quantified uncertainty, about full-state evolution at a finer resolution than that utilized for training the ROMs. We assess the viability of this algorithm for an advection-dominated system given by the inviscid shallow water equations.


1 Introduction

Recently, researchers have shown sustained interest in using machine learning (ML) methods for non-intrusive reduced-order models (ROMs) of systems that may be governed by advection-dominated partial differential equations (PDEs). This is because solving PDE forward-models for such systems may require very fine spatiotemporal numerical discretizations, which cause a significant bottleneck in design and forecast tasks [44]. The prospect of bypassing traditional numerical methods and building surrogates from data alone [26, 37, 36, 41] is attractive for multiple applications ranging from engineering design [4, 38] and control [35, 22] to climate modeling [8, 9, 28]. This is because data-driven ROMs allow for rapid predictions of nonlinear dynamics unencumbered by the stability-based limitations of numerical discretizations. In almost all ROM applications, forecasts must be conditioned on time and several control parameters, such as the initial conditions or the physical properties of the governing laws. In addition, since these models eschew the use of PDEs, it is necessary to associate some notion of uncertainty quantification with the time evolution of these surrogate dynamical systems. This ensures that the loss of interpretability and reliability incurred by bypassing equations is offset by a feedback process from the ML model.

Neural networks have been used for ROMs for decades. One of the earliest examples [14] used a simple fully connected network for forecasting meteorological information. More recently, researchers have incorporated a single-layered feed-forward neural network into a nonlinear dynamical system and built a surrogate model for a high-dimensional aerodynamics problem [27]; radial basis function networks have been used to make forecasts for a nonlinear unsteady aerodynamics task [51, 21]; and a simple fully connected network has been used for learning the dynamics of an advection-dominated system [40, 13].

Neural networks are commonly used for two tasks in typical ROM construction: compression and time-evolution. For the former, they may be used as a nonlinear equivalent of the proper orthogonal decomposition (POD) or principal component analysis (PCA) based methods, which find a linear affine subspace for the full system. The identification of this reduced basis to ensure a compressed representation that is minimally lossy is a core component of most ROM development strategies (some examples include [39, 19, 16]). While POD-based methods currently represent the most popular technique for reduced-basis (or latent space) construction, data generated from PDE simulations can often be interpreted as images on a square grid; therefore, convolutional neural networks have also been applied [43, 17, 12, 30] for building computationally effective ROMs.

Once this basis (or latent representation) is identified, we need a cost-effective strategy for accurate nonlinear dynamical system evolution to reproduce the full-order spatiotemporal complexity of the problem in the reduced basis. For example, linear reduced-basis construction allows for the use of intrusive methods (which project the governing equations onto the reduced basis), as seen in [15, 34], which use a Galerkin projection, or [6, 49, 11], which use the Petrov–Galerkin method (see [5] for a comparison of these two methods). Such extensions are not straightforward for autoencoder-based latent space constructions, since projecting to a basis space is infeasible. We note, though, that the use of convolutional autoencoder architectures has been demonstrated for the Petrov–Galerkin method, where there is no requirement to project the governing equations onto a trial space [24]. As mentioned previously, neural networks are also commonly used for the temporal evolution of these latent space representations. Mainstream ML has generated a large body of time-series forecasting literature that lends itself readily to latent-space evolution of dynamical systems. Recently, long short-term memory architectures (LSTMs) have become popular for the non-intrusive characterization of dynamical systems [45, 1, 32, 33, 46], as they allow for the construction of non-i.i.d., directional relationships between the states of a dynamical system at different times before a future forecast is performed. Other methods that have been utilized for such tasks include the temporal convolutional network [50], non-autoregressive variants of the LSTM [29], and system identification within the latent space [7]. In most of these developments, the evolution framework (in time) is deterministic in nature. These frameworks highlight a crucial drawback when coupled with the black-box nature of purely data-driven ROMs: it is imperative to embed some notion of uncertainty into the time evolution of ROMs in the latent space. Past work has shown that LSTMs suffer from stability and interpretability issues when utilized for the reduced-order modeling of advection-dominated systems [29], and this study proposes an alternative that provides interpretable forecasts with quantified uncertainty. To that end, we propose the use of a Gaussian process regression (GPR) framework for evolving the low-dimensional representation of data obtained from a PDE evolution. In addition to providing forecasts at the temporal resolution of the training data, the use of our framework allows for interpolation in time, because time is interpreted as a continuous variable. When coupled with quantified uncertainty, the framework allows a ROM user to interrogate the behavior of the emulated system at a finer temporal resolution, with implications for data sources that are temporally sparse.

To summarize, the major contributions of this article are:

  • We introduce a technique to construct parametric non-intrusive ROMs with quantified uncertainty during latent-space time evolution of dynamical systems.

  • We detail the use of Gaussian processes (GPs) that are customized for the time-evolution of parametric nonlinear dynamical systems.

  • We demonstrate the ability of the proposed time-evolution algorithm for systems compressed by linear reduced-basis methods such as POD, as well as nonlinear compression frameworks such as variational and convolutional autoencoders.

  • We test the ability of a ROM constructed from coarse training data to interpolate on a finer temporal resolution with quantified uncertainty.

In the following, we introduce our experimental setup, which relies on the inviscid shallow water equations, in Section 2; our various compression mechanisms in Section 3; and the UQ-embedded emulation strategy in Section 4; followed by results and conclusions in Sections 5 and 6, respectively.

Figure 1: Schematic of reduced-order modeling. Upper panel shows the dimensionality reduction operation to a latent space. Lower panel shows the forecasting in the latent space for reconstructions.

2 The shallow water equations

The inviscid shallow water equations are a prototypical system of equations for geophysical flows. In particular, the shallow water equations admit solutions where advection dominates dissipation and pose challenges for conventional ROMs [47]. These governing equations are given by

\frac{\partial (\rho \eta)}{\partial t} + \frac{\partial (\rho \eta u)}{\partial x} + \frac{\partial (\rho \eta v)}{\partial y} = 0, \quad (1)

\frac{\partial (\rho \eta u)}{\partial t} + \frac{\partial}{\partial x}\left(\rho \eta u^2 + \frac{1}{2}\rho g \eta^2\right) + \frac{\partial (\rho \eta u v)}{\partial y} = 0, \quad (2)

\frac{\partial (\rho \eta v)}{\partial t} + \frac{\partial (\rho \eta u v)}{\partial x} + \frac{\partial}{\partial y}\left(\rho \eta v^2 + \frac{1}{2}\rho g \eta^2\right) = 0, \quad (3)

where \eta corresponds to the total fluid column height, (u, v) is the fluid's horizontal flow velocity averaged across the vertical column, g is the acceleration due to gravity, and \rho is the fluid density, set to a constant value. Here, t, x, and y are the independent variables of time and the spatial coordinates of the two-dimensional system. Equation 1 captures the law of mass conservation, whereas Equations 2 and 3 denote the conservation of momentum. The initial conditions of the problem are given by

\rho \eta (x, y, t=0) = \rho \, \exp\left(-\frac{(x - \bar{x})^2 + (y - \bar{y})^2}{2\sigma^2}\right), \quad (4)
\rho \eta u (x, y, t=0) = 0, \quad (5)
\rho \eta v (x, y, t=0) = 0, \quad (6)

i.e., a Gaussian perturbation of width \sigma centered at a particular location (\bar{x}, \bar{y}) on the grid, with the fluid initially at rest. We solve the system of equations up to a fixed final time, with a fixed time-step, on a square two-dimensional grid of uniformly spaced collocation points, to completely capture the advection and gradual decay of this perturbation. Note that these choices may vary according to the forecasting and fidelity requirements of a particular problem and perturbation. The initial and boundary conditions for this particular shallow-water equation experiment represent a tightly-controlled traveling wave problem that is translation invariant. Different realizations of the initial condition lead to translationally shifted trajectories. We also note the presence of mirror symmetries with respect to the x and y axes, coupled with a rotational symmetry about the origin. However, our motivation for a first assessment of our emulators on this system stems from the well-known fact that POD and Galerkin-projection based methods are severely limited in their ability to forecast these simple traveling-wave systems [23, 31] and require special treatment with intrinsic knowledge of the flow dynamics. This is in addition to the fact that equation-based models are impossible to construct in the absence of information about the other variables of the PDE. We seek to build predictive models solely from observations of the fluid column height, conditioned on the location of the initial perturbation, mimicking a real-world scenario where complete observations of all relevant variables (in this case, the velocities) are unavailable.
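For concreteness, the following sketch shows how such a Gaussian perturbation of the fluid column height might be constructed on a uniform grid, matching the form of Equation 4 with the density set to unity. The grid size, domain extent, and perturbation width are placeholder values and not those used in this study.

```python
import numpy as np

# Placeholder discretization: a uniform square grid (values are illustrative only).
nx = 64                      # number of collocation points per direction
L = 1.0                      # domain extent in each direction
x = np.linspace(0.0, L, nx)
X, Y = np.meshgrid(x, x, indexing="ij")

def initial_height(x_bar, y_bar, sigma=0.05):
    """Gaussian perturbation of the column height centered at (x_bar, y_bar)."""
    return np.exp(-((X - x_bar) ** 2 + (Y - y_bar) ** 2) / (2.0 * sigma ** 2))

eta0 = initial_height(0.3 * L, 0.4 * L)   # the momenta (u, v) are initialized to zero
```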

3 Data-compression

3.1 Proper orthogonal decomposition

In this section, we review the POD technique for the construction of a reduced basis [20, 2]. The interested reader may also find an excellent explanation of POD and its relationship with other dimension-reduction techniques in [42]. The POD procedure is tasked with identifying a space

\mathcal{X}^f = \mathrm{span}\left\{ \boldsymbol{\psi}^1, \ldots, \boldsymbol{\psi}^N \right\} \quad (7)

which approximates the snapshots optimally with respect to the L^2-norm. The process of generation commences with the collection of N snapshots in the snapshot matrix

\mathbf{S} = \left[ \mathbf{s}^1 \; \mathbf{s}^2 \; \cdots \; \mathbf{s}^N \right] \in \mathbb{R}^{M \times N}, \quad (8)

where N is the number of snapshots and \mathbf{s}^i \in \mathbb{R}^{M} corresponds to an individual snapshot in time of the discrete solution domain with the mean value removed, i.e.,

\mathbf{s}^i = \mathbf{q}^i - \bar{\mathbf{q}}, \quad (9)

with \bar{\mathbf{q}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{q}^i being the time-averaged solution field. Our POD bases can then be extracted efficiently through the method of snapshots, where we solve the eigenvalue problem on the correlation matrix \mathbf{C} = \mathbf{S}^{T}\mathbf{S} \in \mathbb{R}^{N \times N}. Then

\mathbf{C}\mathbf{W} = \mathbf{W}\boldsymbol{\Lambda}, \quad (10)

where \boldsymbol{\Lambda} is the diagonal matrix of eigenvalues and \mathbf{W} is the eigenvector matrix. Our POD basis matrix can then be obtained by

\boldsymbol{\Phi} = \mathbf{S}\mathbf{W}\boldsymbol{\Lambda}^{-1/2}. \quad (11)

In practice, a reduced basis \boldsymbol{\Psi} \in \mathbb{R}^{M \times k} is built by choosing the first k columns of \boldsymbol{\Phi} for the purpose of efficient ROMs, where k \ll N. This reduced basis spans a space given by

\mathcal{X}^r = \mathrm{span}\left\{ \boldsymbol{\psi}^1, \ldots, \boldsymbol{\psi}^k \right\}. \quad (12)

The coefficients of this reduced basis (which capture the underlying temporal effects) may be extracted as

\mathbf{A} = \boldsymbol{\Psi}^{T}\mathbf{S}. \quad (13)

The POD approximation of our solution is then obtained via

\hat{\mathbf{S}} = \boldsymbol{\Psi}\mathbf{A}, \quad (14)

where \hat{\mathbf{S}} corresponds to the POD approximation of \mathbf{S}. The optimal nature of this reconstruction may be understood by defining the relative projection error

\frac{\sum_{i=1}^{N} \left\lVert \mathbf{s}^i - \boldsymbol{\Psi}\boldsymbol{\Psi}^{T}\mathbf{s}^i \right\rVert^2}{\sum_{i=1}^{N} \left\lVert \mathbf{s}^i \right\rVert^2}, \quad (15)

which exhibits that, with increasing retention of POD bases, increasing reconstruction accuracy may be obtained. We remark that for a coupled system with more than one dependent variable, the solution variables may be stacked to obtain a single set of bases that is utilized for the reduction of each PDE within the coupled system. Another approach is to obtain reduced bases for each dependent variable within the coupled system and evolve each PDE on a different manifold, with each dependent variable projected onto bases constructed from its snapshots alone. This affects the computation of the nonlinear term in the updates for each dependent variable. In practice, this operation manifests itself in the concatenation of reduced bases to obtain one linear operation for the reconstruction of all field quantities.
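As an illustration of the procedure above, the following sketch computes a POD basis via the method of snapshots with NumPy. The variable names mirror the notation used here, and the snapshot matrix S is assumed to already have the temporal mean removed.

```python
import numpy as np

def pod_basis(S, k):
    """Return the first k POD modes of a mean-subtracted snapshot matrix S (M x N)."""
    C = S.T @ S                                   # N x N correlation matrix (method of snapshots)
    eigvals, W = np.linalg.eigh(C)                # eigendecomposition, ascending order
    order = np.argsort(eigvals)[::-1]             # reorder to descending eigenvalues
    eigvals, W = eigvals[order], W[:, order]
    Phi = (S @ W[:, :k]) / np.sqrt(eigvals[:k])   # first k columns of Eq. (11), i.e. the reduced basis
    return Phi

# Latent coefficients and reconstruction, Eqs. (13)-(14):
# A = Phi.T @ S
# S_hat = Phi @ A
```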

3.2 Convolutional autoencoders

Figure 2: Network architecture showing the encoder, bottleneck, and decoder of a typical convolutional autoencoder (CAE). The input image is reconstructed using convolution, pooling, and upsampling/unpooling layers.

Autoencoders are neural networks that learn a new representation of the input data, usually with lower dimensionality. The initial layers, called the encoder, map the input \mathbf{x} to a new representation \mathbf{z}. The remaining layers, called the decoder, map \mathbf{z} back to the original space with the goal of reconstructing \mathbf{x}. The objective is to minimize the reconstruction error. Autoencoders are unsupervised; the data is given, but the representation must be learned.

More specifically, we use convolutional autoencoders (CAEs) that have convolutional layers. In a convolutional layer, instead of learning a matrix that connects all neurons of the layer's input to all neurons of the layer's output, we learn a set of filters. Each filter is convolved with patches of the layer's input. Suppose a one-dimensional convolutional layer has filters \mathbf{w} of length k. Then each of the layer's output neurons corresponding to a filter is connected to a patch of k of the layer's input neurons. In particular, a one-dimensional convolution of a filter \mathbf{w} and a patch \mathbf{p} is defined as \mathbf{w} * \mathbf{p} = \sum_{i=1}^{k} w_i p_i (for neural networks, convolutions are usually technically implemented as cross-correlations). Then, for a typical one-dimensional convolutional layer, the layer's j-th output neuron is y_j = f(\mathbf{w} * \mathbf{p}_j + b_j), where f is an activation function and the b_j are the entries of a bias term. As j increases, patches are shifted by a stride s. For example, a one-dimensional convolutional layer with a filter of length k and stride s could be defined so that y_j involves the convolution of \mathbf{w} with the inputs x_{s(j-1)+1}, \ldots, x_{s(j-1)+k}. To calculate the convolution, it is common to add zeros around the inputs to a layer, which is called zero padding. In the decoder, we use deconvolutional layers to return to the original dimension. These layers upsample with nearest-neighbor interpolation.

Two-dimensional convolutions are defined similarly, but each filter and each patch are two-dimensional. A two-dimensional convolution sums over both dimensions, and patches are shifted both ways. For a typical two-dimensional convolutional layer, the output neuron is y_{ij} = f(\mathbf{W} * \mathbf{P}_{ij} + b_{ij}), where \mathbf{W} is a two-dimensional filter and \mathbf{P}_{ij} is the corresponding patch. Input data can also have a "channel" dimension, such as red, green, and blue for images. The convolutional operator sums over channel dimensions, but each patch contains all of the channels. The filters remain the same size as patches, so they can have different weights for different channels. In our study, we use solely one channel for the spatial magnitude of the fluid column height.

It is common to follow a convolutional layer with a pooling layer, which outputs a sub-sampled version of the input. In this paper, we specifically use max-pooling layers. Each output of a max-pooling layer is connected to a patch of the input, and it returns the maximum value in the patch.
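The sketch below builds a small CAE of this type with TensorFlow/Keras. The layer counts, filter sizes, and the dense bottleneck are illustrative choices rather than the exact architecture used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cae(nx=64, ny=64, latent_dim=8):
    """A minimal convolutional autoencoder with a dense bottleneck (illustrative)."""
    inp = layers.Input(shape=(nx, ny, 1))
    # Encoder: convolution + max-pooling blocks
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    code = layers.Dense(latent_dim, name="latent")(x)       # latent-space representation
    # Decoder: dense projection, then upsampling (nearest-neighbor) + convolutions
    x = layers.Dense((nx // 4) * (ny // 4) * 8, activation="relu")(code)
    x = layers.Reshape((nx // 4, ny // 4, 8))(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(1, 3, activation="linear", padding="same")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")              # minimize the reconstruction error
    return model
```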

3.3 Variational autoencoders

As opposed to CAEs, the variational autoencoder (VAE) [18] takes a Bayesian approach to latent space modeling. We utilize a convolutional VAE architecture, where the outer convolutional, pooling, and upscaling layers are identical to those of the CAE. The difference only arises in the bottleneck, where the encoder's final layers are fully connected layers that project the input onto the latent-variable space. Thus, the encoder is effectively a function mapping an input \mathbf{x} to a distribution over the latent variables \mathbf{z}. The fully connected part of the decoder network generates newly sampled \mathbf{z} from the latent space. The output of this undergoes upsampling and convolutions (similar to those of the CAE) to reconstruct the image. The VAE also constrains the latent-space variables to follow a normal distribution via a Kullback–Leibler divergence (KL divergence) term in the loss. The architecture of the VAE (shown in Figure 3) is similar to that of the conventional CAE shown in Figure 2, except at the bottleneck, where the encoder network outputs the mean \boldsymbol{\mu} and variance \boldsymbol{\sigma}^2 of the latent distribution.

The inference itself is undertaken using variational inference (VI), by modeling the true posterior distribution over the latent variables with a simple Gaussian distribution and then minimizing the difference via the KL divergence as an addition to the reconstruction loss function. The KL-divergence loss is applied such that the distribution on \mathbf{z} is as close to the standard normal distribution as possible.

With the inclusion of additional complexity in the loss function and the bottleneck, VAEs have more parameters to tune than CAEs. However, this also gives significant control over the latent-space distribution. Depending on the data, dimensionality reduction using CAEs may not allow for straightforward interpolation due to the presence of discontinuous clusters in the latent-space representation. VAEs, on the other hand, constrain the latent-space representation to follow a pre-specified distribution and hence, by design, facilitate easier interpolation.

Figure 3: Convolutional VAE architecture: the convolutional parts of the encoder and decoder are similar to those of a convolutional autoencoder. The bottleneck includes a mapping onto a normal distribution of latent-space variables. Sampling from this space is performed and decoded for the generation of new data.
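A minimal sketch of such a bottleneck is given below: the reparameterization trick samples z from the predicted mean and log-variance, and the KL term added to the reconstruction loss pulls the latent distribution toward a standard normal. This is a generic Keras formulation, not the exact implementation used here.

```python
import tensorflow as tf
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

def kl_loss(z_mean, z_log_var):
    """KL divergence between N(mu, sigma^2) and the standard normal prior."""
    return -0.5 * tf.reduce_mean(
        tf.reduce_sum(1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
    )

# During training, the total objective is: reconstruction_loss + kl_loss(z_mean, z_log_var)
```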

4 Gaussian process regression

Dimensionality reduction performed using the POD, CAE, or VAE techniques results in a representation of the original data, along with a model (the POD bases or decoders) to reconstruct the data for any point in the latent space. We then only require a temporal interpolation scheme fitted on the representation space, so that a time evolution of the dynamical system can be reconstructed, as schematically shown in Figure 1. In our approach, we deploy GPs [48] as our interpolation algorithm. While GPs perform Bayesian regression tasks with a high level of interpretability, they are generally computationally expensive for large data sizes. We achieve a considerable reduction of computational cost by fitting in the space of reduced dimensions. In addition, the GPR may also be restricted to a less noisy space compared with the original dataset of the dynamical evolution.

A GP is a collection of random variables, every finite subset of which follows a multivariate Gaussian distribution. It can be perceived as a generalization of the multivariate Gaussian distribution to infinite-dimensional spaces. GPs are a popular choice for non-linear regression due to their flexibility and ease of use. In addition, one of their main advantages is that they incorporate a principled way of measuring uncertainty, since they provide predictions in distributional form. For the purpose of this paper, a GPR model is used to fit the reduced space from the data-compression algorithms of Section 3. Subsequently, the mean prediction, which corresponds to the maximum a posteriori (MAP) estimate, is used for the reconstructions. We use the GPflow library for the experiments [10].

A GP can be completely specified by its second-order statistics. Thus, a mean function equal to zero can be assumed, and a positive definite covariance function (kernel) k(\mathbf{x}, \mathbf{x}'), which can be perceived as a measure of similarity between \mathbf{x} and \mathbf{x}', is the only requirement to specify the GP.

For the experiments in Section 5 we initially experimented with changepoint kernels [25]. The intuition came from the data (see, e.g., Figure 11), where two typical behaviours are commonly observed: a steep increase or decrease of the latent dimension values for early times, and a subsequent, smoother change in direction that eventually leads to stabilization. At first we examined a changepoint kernel only for the time feature, which was then added to a regular kernel that accounted for the other variables. The results were discouraging, which we attribute to the fact that this kernel structure leads to a loss of correlational information between time and the other variables. Subsequently, we examined changepoint kernels that accounted for all parameters in both of their sub-kernels. Even though this type of kernel was successful in producing acceptable results, and also adequately detected the position of the changepoint, the output was only a slight improvement over a standard kernel. Furthermore, the computational time was substantially larger, which led us to abandon the pursuit of a changepoint kernel. Instead, we settled on a Matérn 3/2 kernel,

k(\mathbf{x}, \mathbf{x}') = \sigma^2 \left(1 + \frac{\sqrt{3}\,\lVert \mathbf{x} - \mathbf{x}' \rVert}{\ell}\right) \exp\left(-\frac{\sqrt{3}\,\lVert \mathbf{x} - \mathbf{x}' \rVert}{\ell}\right), \quad (16)

due to its versatility, flexibility, and smoothness, and more specifically its automatic relevance determination (ARD) extension [3], which incorporates a separate lengthscale parameter for each input variable and gave a significant improvement in our results.

For a GPR model, we consider a zero-mean GP and noisy training observations \mathbf{y} = (y_1, \ldots, y_n) at datapoints X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}, derived from the true values f(\mathbf{x}_i) with additive i.i.d. Gaussian noise of variance \sigma_n^2. In mathematical form, that is

y_i = f(\mathbf{x}_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma_n^2), \quad f \sim \mathcal{GP}\left(0, k(\mathbf{x}, \mathbf{x}')\right), \quad (17)

where k is the kernel. We obtain the complete specification of the GP by maximizing the marginal likelihood, which we acquire by integrating the product of the Gaussian likelihood and the GP prior over \mathbf{f}:

p(\mathbf{y} \mid X) = \int p(\mathbf{y} \mid \mathbf{f}, X)\, p(\mathbf{f} \mid X)\, d\mathbf{f}. \quad (18)

For testing inputs X_* and testing outputs \mathbf{f}_*, we derive the joint marginal likelihood

\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right),

where I is the identity matrix.

Finally, by conditioning the joint distribution on the training data and the testing inputs, we derive the predictive distribution

\mathbf{f}_* \mid X, \mathbf{y}, X_* \sim \mathcal{N}\left(\bar{\mathbf{f}}_*, \mathrm{cov}(\mathbf{f}_*)\right), \quad (19)

where

\bar{\mathbf{f}}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1}\mathbf{y}, \qquad \mathrm{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1}K(X, X_*). \quad (20)

During the reconstruction phase, we focus on the predictions that correspond to the posterior mean \bar{\mathbf{f}}_*.
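As a concrete sketch of this workflow with GPflow, the snippet below fits one GPR model with an ARD Matérn 3/2 kernel per latent dimension and returns the posterior mean and variance at new inputs. The choice of one independent GP per latent dimension and the optimizer settings are assumptions made for illustration, with the inputs taken to be time together with the initial-condition parameters.

```python
import numpy as np
import gpflow

def fit_latent_gps(X, Y):
    """Fit one GPR model per latent dimension.

    X: (n, d) float64 inputs, e.g. columns (time, x_bar, y_bar).
    Y: (n, r) float64 latent coordinates produced by the encoder.
    """
    models = []
    for j in range(Y.shape[1]):
        kernel = gpflow.kernels.Matern32(lengthscales=np.ones(X.shape[1]))  # ARD lengthscales
        model = gpflow.models.GPR(data=(X, Y[:, j:j + 1]), kernel=kernel)
        gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)
        models.append(model)
    return models

def predict_latent(models, X_new):
    """Posterior mean and variance for each latent dimension at new inputs."""
    means, variances = [], []
    for model in models:
        mu, var = model.predict_f(X_new)
        means.append(mu.numpy())
        variances.append(var.numpy())
    return np.hstack(means), np.hstack(variances)
```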

5 Experiments

In this section, we outline several experiments designed to assess the various compression frameworks and how they interface with latent-space emulation using the aforementioned GPs. A first series of assessments is targeted solely at the fidelity of reconstruction (i.e., which framework offers the most efficient compression). Following this, we interface latent-space representations of our compressed fields with GP emulators to obtain low-dimensional surrogates with embedded uncertainty quantification. Finally, we outline the ability of the GP emulators to predict the dynamics' evolution at finer temporal resolutions.

For the purpose of training the compression frameworks and the GP emulators, we generate forward-model solves which are obtained by a Latin hypercube sampling of different initial conditions between   and   for each dimension and . For each of these simulations, evenly-spaced snapshots in time are obtained to construct our total data set (i.e., we have flow-fields for of resolution ). We remind the reader that the equation-based simulations require the solution of a system of PDEs (for , and ), but our emulators will be built from information alone. We split the simulations into for training, for validation and for testing. The validation data set is primarily utilized for early stopping criteria in the deep neural network autoencoders. Note that the training and validation data are combined into one data set for training GP emulators. All statistical and qualitative assessments in the following will be shown for the testing data alone.
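A Latin hypercube design of the initial perturbation centers can be generated as in the sketch below; the number of samples and the bounds are placeholders rather than the values used in this study.

```python
import numpy as np
from scipy.stats import qmc

# Latin hypercube sampling of the Gaussian-bump center (x_bar, y_bar).
n_samples = 100                                   # placeholder number of forward-model solves
sampler = qmc.LatinHypercube(d=2, seed=0)
unit_samples = sampler.random(n=n_samples)        # points in the unit square
centers = qmc.scale(unit_samples, l_bounds=[0.2, 0.2], u_bounds=[0.8, 0.8])  # placeholder bounds

# Each row of `centers` defines one initial condition (one forward-model solve).
```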

5.1 Reconstruction

We begin by assessing the ability of our different compression frameworks, i.e., the POD, CAE, and VAE. This comparison is obtained by training multiple encoders with varying degrees of freedom (DOF) in the latent space. The results of these experiments can be seen in Table 1. Here, we have chosen 2, 4, 8, 16, 30, and 40 DOF for all of our compression frameworks and have compared the fidelity of the reconstruction. We use metrics given by the coefficient of determination (R^2), mean squared error (MSE), and mean absolute error (MAE) to compare the true and reconstructed fields for the testing data sets. Our analysis of the metrics indicates that the CAE is able to reach its optimal reconstruction accuracy with fewer DOF than both the POD and VAE. Both the VAE and CAE are seen to possess an advantage over the POD, due to their ability to find a non-linear low-dimensional manifold; POD, instead, obtains a linear affine subspace of the high-dimensional system. Interestingly, at the largest latent-space dimension (40 DOF), the POD method is seen to outperform the VAE. We also note that the CAE and VAE frameworks obtain their peak accuracy at around 8 DOF in the latent space and proceed to saturate in accuracy with greater latent space dimensions. We remark that this aligns with the lack of any guarantees on convergence with increasing DOF for these nonlinear compression frameworks; POD, instead, is guaranteed to converge with increasing latent space dimensions.

Coefficient of determination
Model/Latent DOF 2 4 8 16 30 40
POD 0.10 0.30 0.55 0.69 0.82 0.87
CAE 0.37 0.87 0.91 0.88 0.91 0.91
VAE 0.35 0.66 0.86 0.83 0.82 0.79
Mean squared error
Model/Latent DOF 2 4 8 16 30 40
POD 0.0025 0.0021 0.0014 0.0010 0.00063 0.00045
CAE 0.0017 0.00034 0.00025 0.00031 0.00026 0.00025
VAE 0.0017 0.00084 0.00036 0.00043 0.00045 0.00052
Mean absolute error
Model/Latent DOF 2 4 8 16 30 40
POD 0.029 0.027 0.021 0.017 0.013 0.011
CAE 0.021 0.0090 0.0075 0.0083 0.0075 0.0076
VAE 0.023 0.015 0.0094 0.010 0.010 0.011
Table 1: Latent space compression and reconstruction metrics for testing data. The CAE is seen to obtain high accuracy with relatively few DOF in comparison to the POD and VAE.
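For reference, the metrics reported in Table 1 can be computed as in the following sketch; whether the scores are pooled over all grid points and snapshots or averaged per snapshot is an implementation choice assumed here.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

def reconstruction_metrics(true_fields, recon_fields):
    """Compare true and reconstructed fields of shape (n_snapshots, nx, ny)."""
    y_true = np.asarray(true_fields).ravel()
    y_pred = np.asarray(recon_fields).ravel()
    return {
        "R2": r2_score(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
    }
```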

Following these quantitative assessments, we examine the reconstruction fidelity of the different frameworks by comparing contours from the different methods with varying DOF in the latent space. Figure 4 shows the performance of our three compression frameworks for four DOF in the latent space. At this coarse resolution (in latent space), the linear compression of POD is inadequate for capturing the coherent features in the solution field upon reconstruction. This is due to the advective nature of the dynamics of this data set and the associated large Kolmogorov n-width. In contrast, both the CAE and VAE are able to identify coherent structures in the flow field after reconstruction from the latent space. Note, however, that the CAE is able to identify the crests and troughs in the flow field more accurately. We observe similar results for an eight-dimensional latent space (Figure 5), where the CAE and VAE are still seen to outperform POD (although improvements in the latter may be observed). For both cases (i.e., four and eight DOF), the CAE and VAE are seen to struggle with reconstructing the dissipating coherent features later in the evolution of the system. For completeness, we also show a result for a forty-dimensional latent space in Figure 6, where POD can be seen to capture the spatio-temporal trends of the true solution appropriately. Note that improvements in the CAE and VAE are marginal, with the POD outperforming both these frameworks later in the evolution of the system.

(a) Reconstruction at
(b) Reconstruction at
(c) Reconstruction at
Figure 4: Reconstruction from true latent-space evolutions for a four-dimensional latent space for time (top), (middle), (bottom) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both the VAE and CAE are superior to the POD.
(a) Reconstruction at
(b) Reconstruction at
(c) Reconstruction at
Figure 5: Reconstruction from true latent space evolutions for an eight-dimensional latent space for time (top), (middle), (bottom) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both VAE and CAE are superior to the POD.
(a) Reconstruction at
(b) Reconstruction at
(c) Reconstruction at
Figure 6: Reconstruction from true latent space evolutions for a forty-dimensional latent space for time (top), (middle), (bottom) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE and is comparable to the POD encoding. This is because of the greater DOF in the latent space.

5.2 Latent space forecasts

Next, we test the ability of our trained GP emulators to forecast the evolution of systems in their latent space representations. For this assessment, we choose trained encoders (with 4, 8, and 40 latent space dimensions) to obtain training and validation data for fitting our previously introduced GPs. Once trained, the GPs are tasked with predicting the evolution of the latent state in reduced space for a set of test initial conditions.

Figure 7 shows the latent space evolution of a testing simulation over time for the three different compression methodologies. This result utilizes solely four latent dimensions. It is readily apparent that CAE compression leads to a smooth evolution of the system in latent space. In contrast, the POD and VAE methods display significant oscillations. The GP is able to capture the behavior of the system evolution and also provides confidence intervals which are based on two standard deviations around the mean. With the exception of a few instances in time, the confidence intervals are able to envelope the true evolution of the system. We see similar results in Figure 8, where eight latent space DOF are obtained for each of our compression frameworks. While the POD is seen to provide oscillatory system evolution (as before), the CAE and VAE are smooth. In either case, the constructed GP is able to recover the evolution well.

(a) POD
(b) CAE
(c) VAE
Figure 7: Latent space forecasts on testing data for (a) POD, (b) CAE and (c) VAE with a reduced dimension of 4 DOF. The shaded areas indicate confidence intervals from the probabilistic forecasts.
(a) POD
(b) CAE
(c) VAE
Figure 8: Latent space forecasts on testing data for (a) POD, (b) CAE and (c) VAE with a reduced dimension of 8 DOF. The shaded areas indicate confidence intervals from the probabilistic forecasts.

5.3 Reconstruction from latent space forecasts

We proceed by assessing the fidelity of the reconstructions from latent space using the GP forecasts of the previous subsection. Qualitative comparisons for the reconstruction of a test simulation are shown in Figure 9 for a four-dimensional latent space at three different times. The CAE and VAE compressions are seen to outperform the linear reduced basis constructed by POD. This aligns with past studies where nonlinear compression methods have outperformed the POD. The CAE is seen to be more accurate than the VAE due to its deterministic formulation during compression. We observe the CAE and VAE to outperform the POD for the eight-dimensional latent space as well, as shown in Figure 10. Qualitatively, the CAE is seen to be the best compression technique of all the methods. Table 2 establishes these conclusions quantitatively with different metrics. Note that all these metrics are evaluated on the reconstructed data in physical space.

Coefficient of determination
Model/DOF 4 8
POD –3.89 –1.99
CAE 0.86 0.87
VAE 0.63 0.82
Mean squared error
Model/DOF 4 8
POD 0.034 0.0041
CAE 0.00032 0.00029
VAE 0.00086 0.00046
Mean absolute error
Model/DOF 4 8
POD 0.030 0.044
CAE 0.0085 0.0076
VAE 0.014 0.010
Table 2: Metrics obtained via GP-based forecast in latent space followed by reconstruction for testing data.
(a) GP Reconstructions
(b) GP Reconstructions
(c) GP Reconstructions
Figure 9: Reconstruction from GP forecasts for a four-dimensional latent space for time (a) , (b) and (c) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both the VAE and CAE are superior to the POD.
(a) GP Reconstructions
(b) GP Reconstructions
(c) GP Reconstructions
Figure 10: Reconstruction from GP forecasts for an eight-dimensional latent space for time (a) , (b) and (c) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both VAE and CAE are superior to the POD.

5.4 Temporal super-resolution

It must be highlighted that ML-based time-series forecasting methods are generally formulated in a discrete fashion, where the temporal resolution of the training data determines the resolution of the ROM deployment. Also, most studies of ML-based forecasting in time assume a regular sampling of the state. In practice, due to simulation or experimental limitations, state information may be available sparsely in time and at irregular intervals. The construction of a ROM that is continuous in the temporal variable therefore allows us to sample at intermediate time steps with quantified uncertainty. Accordingly, we test the ability of our parameterized non-intrusive ROM to interpolate in time. In this assessment, we establish the performance of the ROM when sampling at locations (in time) that were not present in the discrete training and testing data. This is made possible by the continuous function approximation property of the GPR.

To assess this capability, we re-generate our testing data, which is now sampled more finely in the temporal dimension. We utilize our pretrained CAE (trained solely on the coarsely sampled training data) to compress this system evolution to an eight-DOF latent space. Following this, our previously trained GP is tasked with sampling at the intermediate points in time, which correspond to this finely sampled testing data. Note that the GP is trained only with the coarse data set as well. Thus, this assessment represents interpolation in both space and time. The results for the GP forecast in this assessment are shown in Figure 11. A good agreement can be observed between the true latent space trajectory and the GP-interpolated counterpart. We also direct our attention to the forecast behavior for several latent dimensions where uncertainties are seen to oscillate in correspondence with the coarsely sampled training points. We qualitatively assess the accuracy of the temporal interpolation in Figure 12, where the reconstruction from the true latent space trajectory and the GP interpolation are compared against the truth. A good agreement is seen. Naturally, the root mean square error (RMSE) and MAE for this finely sampled test data set are higher than for interpolating solely on the coarser sample locations (see Table 2 for comparison), but this adds a useful capability to this ROM strategy.

Figure 11: Latent space interpolation for a testing data set with finer resolution in time. The purpose of this assessment is to determine the proposed framework's viability for super-resolution in time. While all the training data is obtained by sampling a simulation only a fixed number of times over the course of its evolution, the assessments are tested against a simulation that has been sampled more finely.
Figure 12: True fields (left), CAE reconstructions (middle), and GP-interpolated reconstructions (right) for a testing simulation that has been sampled more finely in time. Note that the instances in time shown are not the usual sampling points utilized for our CAE or GP training.
Metric MAE RMSE
GP Interpolation+reconstruction 0.0175 0.00094
Reconstruction 0.0084 0.00022
Table 3: Metrics for the temporal super-resolution of testing simulations. A CAE and GP combination trained on coarsely sampled data in time is utilized to reconstruct at finer intervals.
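To illustrate how such a temporal super-resolution query can be posed, the following sketch evaluates the latent-space GPs of Section 4 on a refined time grid for one test initial condition; the helper functions, time range, refinement factor, and test parameters are hypothetical placeholders.

```python
import numpy as np

# Coarse training times and a finer query grid (both placeholders).
t_train = np.linspace(0.0, 1.0, 20)
t_fine = np.linspace(0.0, 1.0, 10 * len(t_train))

# Hypothetical test initial-condition parameters (Gaussian-bump center).
x_bar, y_bar = 0.35, 0.45
X_fine = np.column_stack([t_fine,
                          np.full_like(t_fine, x_bar),
                          np.full_like(t_fine, y_bar)])

# Using the latent-space GPs sketched earlier:
# z_mean, z_var = predict_latent(models, X_fine)     # mean/variance at the finer times
# band = 2.0 * np.sqrt(z_var)                        # +/- 2 standard-deviation bounds
# fields = decoder.predict(z_mean)                   # full-state reconstruction via the CAE decoder
```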

6 Conclusions

The development of parametric non-intrusive reduced-order models for advective PDE systems has great potential for cost reductions in large numerical simulation campaigns across multiple domain sciences. This article addresses their limitations associated with reduced interpretability by proposing the use of GPR, conditioned on time and system control parameters, to provide quantified uncertainty. This is particularly useful when nonlinear compression techniques, such as autoencoders, are used for efficient DOF reduction. In addition to the ability to interpolate in initial-condition space, we also investigate the ability of the proposed framework to interpolate in time. This addresses the fact that the sampling of a dynamical system for training these ROMs may not match the temporal resolution requirements of the emulation. We also remark that the modular nature of the compression and time evolution allows for the use of conventional reduced-basis methods, such as the POD, for dynamics which are intrinsically low-dimensional.

Our results indicate that the proposed model-order reduction technique is successful at dealing with advective dynamics through assessments on the inviscid shallow-water equations. We establish this by testing on unseen initial conditions for our system evolution where a low-dimensional evolution successfully replicates high-dimensional results when coupled with a convolutional or variational autoencoder. The non-intrusive nature of our framework also allows for construction of emulators from remote-sensing or experimental data. This is of value when the underlying governing PDEs are not known a priori. Extensions to the present study shall investigate the integration of a feedback loop to sample points in control parameter space with the knowledge of prediction uncertainty. Through this, we aim to establish continually-learning model-order reduction frameworks for advective problems spanning large physical regimes.

7 Acknowledgements

RM acknowledges support from the Margaret Butler Fellowship at the Argonne Leadership Computing Facility. TB, LRM, IP acknowledge support from Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the Digital Twins for Complex Engineering Systems theme within that grant and The Alan Turing Institute. IP acknowledges funding from the Imperial College Research Fellowship scheme. This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research was funded in part and used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. DOE or the United States Government. Declaration of Interests - None.

References

  • [1] S. E. Ahmed, O. San, A. Rasheed, and T. Iliescu (2020) A long short-term memory embedding for hybrid uplifted reduced order models. Physica D: Nonlinear Phenomena, pp. 132471. Cited by: §1.
  • [2] G. Berkooz, P. Holmes, and J. L. Lumley (1993) The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics 25 (1), pp. 539–575. Cited by: §3.1.
  • [3] C. M. Bishop (2006) Pattern recognition and machine learning. Springer. Cited by: §4.
  • [4] T. Bui-Thanh, M. Damodaran, and K. Willcox (2004) Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition. AIAA journal 42 (8), pp. 1505–1516. Cited by: §1.
  • [5] K. Carlberg, M. Barone, and H. Antil (2017) Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. Journal of Computational Physics 330, pp. 693–734. Cited by: §1.
  • [6] K. Carlberg, C. Bou-Mosleh, and C. Farhat (2011) Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations. Int. J. Numer. Meth. Eng. 86 (2), pp. 155–181. Cited by: §1.
  • [7] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton (2019) Data-driven discovery of coordinates and governing equations. arXiv preprint arXiv:1904.02107. Cited by: §1.
  • [8] A. Chattopadhyay, P. Hassanzadeh, and S. Pasha (2020) Predicting clustered weather patterns: a test case for applications of convolutional neural networks to spatio-temporal climate data. Scientific Reports 10 (1), pp. 1–13. Cited by: §1.
  • [9] A. Chattopadhyay, E. Nabizadeh, and P. Hassanzadeh (2019) Analog forecasting of extreme-causing weather patterns using deep learning. arXiv preprint arXiv:1907.11617. Cited by: §1.
  • [10] A. G. De G. Matthews, M. Van Der Wilk, T. Nickson, K. Fujii, A. Boukouvalas, P. León-Villagrá, Z. Ghahramani, and J. Hensman (2017) GPflow: a Gaussian process library using TensorFlow. The Journal of Machine Learning Research 18 (1), pp. 1299–1304. Cited by: §4.
  • [11] F. Fang, C. C. Pain, I. Navon, A. H. Elsheikh, J. Du, and D. Xiao (2013) Non-linear Petrov–Galerkin methods for reduced order hyperbolic equations and discontinuous finite element methods. Journal of Computational Physics 234, pp. 540–559. Cited by: §1.
  • [12] F. J. Gonzalez and M. Balajewicz (2018) Learning low-dimensional feature dynamics using deep convolutional recurrent autoencoders. arXiv preprint arXiv:1808.01346. Cited by: §1.
  • [13] J. S. Hesthaven and S. Ubbiali (2018) Non-intrusive reduced order modeling of nonlinear problems using neural networks. Journal of Computational Physics 363, pp. 55–78. Cited by: §1.
  • [14] W. W. Hsieh and B. Tang (1998) Applying neural network models to prediction and data analysis in meteorology and oceanography. Bulletin of the American Meteorological Society 79 (9), pp. 1855–1870. Cited by: §1.
  • [15] I. Kalashnikova and M. Barone (2010) On the stability and convergence of a Galerkin reduced order model (ROM) of compressible flow with solid wall and far-field boundary treatment. International Journal for Numerical Methods in Engineering 83 (10), pp. 1345–1375. Cited by: §1.
  • [16] V. L. Kalb and A. E. Deane (2007) An intrinsic stabilization scheme for proper orthogonal decomposition based low-dimensional models. Phys. Fluids 19 (5), pp. 054106. Cited by: §1.
  • [17] B. Kim, V. C. Azevedo, N. Thuerey, T. Kim, M. Gross, and B. Solenthaler (2019) Deep fluids: a generative network for parameterized fluid simulations. In Computer Graphics Forum, Vol. 38, pp. 59–70. Cited by: §1.
  • [18] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §3.3.
  • [19] M. Korda, M. Putinar, and I. Mezić (2018) Data-driven spectral analysis of the Koopman operator. Applied and Computational Harmonic Analysis. Cited by: §1.
  • [20] D. Kosambi (1943) Statistics in function space. J Indian Math. Soc. 7, pp. 76–88. Cited by: §3.1.
  • [21] J. Kou and W. Zhang (2017) Layered reduced-order models for nonlinear aerodynamics and aeroelasticity. Journal of Fluids and Structures 68, pp. 174–193. Cited by: §1.
  • [22] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor (2016) Dynamic mode decomposition: data-driven modeling of complex systems. SIAM. Cited by: §1.
  • [23] J. N. Kutz, X. Fu, and S. L. Brunton (2016) Multi-resolution dynamic mode decomposition. SIAM Journal of Applied Dynamical Systems 15 (2), pp. 713–735. Cited by: §2.
  • [24] K. Lee and K. T. Carlberg (2020) Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. J. Comp. Phys. 404, pp. 108973. Cited by: §1.
  • [25] J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani (2014) Automatic construction and natural-language description of nonparametric regression models. stat 1050, pp. 24. Cited by: §4.
  • [26] Z. Long, Y. Lu, X. Ma, and B. Dong (2017) PDE-net: Learning PDEs from data. arXiv preprint arXiv:1710.09668. Cited by: §1.
  • [27] A. Mannarino and P. Mantegazza (2014) Nonlinear aeroelastic reduced order modeling by recurrent neural networks. Journal of Fluids and Structures 48, pp. 103–121. Cited by: §1.
  • [28] R. Maulik, R. Egele, B. Lusch, and P. Balaprakash (2020) Recurrent neural network architecture search for geophysical emulation. arXiv preprint arXiv:2004.10928. Cited by: §1.
  • [29] R. Maulik, B. Lusch, and P. Balaprakash (2020) Non-autoregressive time-series methods for stable parametric reduced-order models. arXiv preprint arXiv:2006.14725. Cited by: §1.
  • [30] R. Maulik, A. Mohan, B. Lusch, S. Madireddy, P. Balaprakash, and D. Livescu (2020) Time-series learning of latent-space dynamics for reduced-order model closure. Physica D: Nonlinear Phenomena, pp. 132368. Cited by: §1.
  • [31] A. Mendible, S. L. Brunton, A. Y. Aravkin, W. Lowrie, and J. N. Kutz (2019) Dimensionality reduction and reduced order modeling for traveling wave physics. arXiv preprint arXiv:1911.00565. Cited by: §2.
  • [32] A. Mohan, D. Daniel, M. Chertkov, and D. Livescu (2019) Compressed Convolutional LSTM: An Efficient Deep Learning framework to Model High Fidelity 3D Turbulence. arXiv preprint arXiv:1903.00033. Cited by: §1.
  • [33] A. T. Mohan and D. V. Gaitonde (2018) A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks. arXiv preprint arXiv:1804.09269. Cited by: §1.
  • [34] M. Mohebujjaman, L. G. Rebholz, and T. Iliescu (2019) Physically constrained data-driven correction for reduced-order modeling of fluid flows. Int. J. Numer. Meth. Fl. 89 (3), pp. 103–122. Cited by: §1.
  • [35] B. R. Noack, M. Morzynski, and G. Tadmor (2011) Reduced-order modelling for flow control. Vol. 528, Springer Science & Business Media. Cited by: §1.
  • [36] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.
  • [37] M. Raissi (2018) Deep hidden physics models: deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research 19 (1), pp. 932–955. Cited by: §1.
  • [38] S. A. Renganathan, Y. Liu, and D. N. Mavris (2018) Koopman-based approach to nonintrusive projection-based reduced-order modeling with black-box high-fidelity models. AIAA Journal 56 (10), pp. 4087–4111. Cited by: §1.
  • [39] O. San and J. Borggaard (2014) Basis selection and closure for POD models of convection dominated Boussinesq flows. In 21st International Symposium on Mathematical Theory of Networks and Systems, Vol. 5. Cited by: §1.
  • [40] O. San and R. Maulik (2018) Neural network closures for nonlinear model order reduction. Advances in Computational Mathematics 44 (6), pp. 1717–1750. Cited by: §1.
  • [41] J. Sirignano and K. Spiliopoulos (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364. Cited by: §1.
  • [42] K. Taira, M. S. Hemati, S. L. Brunton, Y. Sun, K. Duraisamy, S. Bagheri, S. Dawson, and C. Yeh (2019) Modal analysis of fluid flows: applications and outlook. arXiv preprint arXiv:1903.05750. Cited by: §3.1.
  • [43] N. Thuerey, K. Weißenow, L. Prantl, and X. Hu (2019) Deep learning methods for Reynolds-Averaged Navier–Stokes simulations of airfoil flows. AIAA J, pp. 1–12. Cited by: §1.
  • [44] R. Verstappen and A. Veldman (1997) Direct numerical simulation of turbulence at lower costs. Journal of Engineering Mathematics 32 (2-3), pp. 143–159. Cited by: §1.
  • [45] P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, and P. Koumoutsakos (2018) Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474 (2213), pp. 20170844. Cited by: §1.
  • [46] Q. Wang, N. Ripamonti, and J. S. Hesthaven (2020) Recurrent neural network closure of parametric POD-Galerkin reduced-order models based on the Mori–Zwanzig formalism. Journal of Computational Physics, pp. 109402. Cited by: §1.
  • [47] Z. Wang, I. Akhtar, J. Borggaard, and T. Iliescu (2012) Proper orthogonal decomposition closure models for turbulent flows: a numerical comparison. Computer Methods in Applied Mechanics and Engineering 237, pp. 10–26. Cited by: §2.
  • [48] C. K. Williams and C. E. Rasmussen (2006) Gaussian processes for machine learning. Vol. 2, MIT press Cambridge, MA. Cited by: §4.
  • [49] D. Xiao, F. Fang, J. Du, C. Pain, I. Navon, A. Buchan, A. H. Elsheikh, and G. Hu (2013) Non-linear Petrov–Galerkin methods for reduced order modelling of the Navier–Stokes equations using a mixed finite element pair. Computer Methods in Applied Mechanics and Engineering 255, pp. 147–157. Cited by: §1.
  • [50] J. Xu and K. Duraisamy (2019) Multi-level convolutional autoencoder networks for parametric prediction of spatio-temporal dynamics. arXiv preprint arXiv:1912.11114. Cited by: §1.
  • [51] W. Zhang, J. Kou, and Z. Wang (2016) Nonlinear aerodynamic reduced-order model for limit-cycle oscillation and flutter. AIAA Journal, pp. 3304–3311. Cited by: §1.