1. Introduction
High-fidelity computational models are indispensable tools for the prediction and analysis of physical systems governed by parameterized partial differential equations (PDEs). As more and more industries rely on simulated results to inform their daily operations, the significant computational resources demanded by such models are becoming increasingly prohibitive. Indeed, actions which increase model fidelity, such as refining the spatiotemporal resolution, can also lead to an explosion of dimensionality, making use of the full-order model (FOM) infeasible in real-time or many-query scenarios. To remedy this, emphasis has been placed on reduced-order models (ROMs) which approximate the high-fidelity, full-order model at any desired configuration of parameters. Indeed, an appropriately built ROM can enable important applications such as uncertainty quantification or predictive control, which respectively require access to the FOM solution at thousands of points or near-instantaneous access to approximate solutions.
The governing assumption inherent in the design of ROMs is that the FOM solution lies in a submanifold of the ambient space which is intrinsically low-dimensional. This is reasonable to posit when the FOM solution has been generated from the semidiscretization of an evolving quantity over a dense set of nodal points, as the values of this quantity in each dimension are strongly determined by the original (presumably lower-dimensional) dynamics. When this is the case, efficient approximation of the solution submanifold can lead to massive decreases in simulation time with relatively small approximation errors, since restriction to this space is lossless in principle. On the other hand, the problem of computing this submanifold is generally highly nontrivial, which has led to a wide variety of ROMs appearing in practice.
To enable this speedup in approximation of the FOM solution, ROMs are divided into offline and online stages. The offline stage is where the majority of expensive computation takes place, and may include operations such as collecting snapshots of the high-fidelity model, generating a reduced basis, or training a neural network. On the other hand, an effective ROM has an online stage that is designed to be as fast as possible, involving only a small number of function evaluations or the solution of small systems of equations. Classically, most ROMs have relied on some notion of reduced basis to accomplish this task, which is designed to capture all of the important information about the high-fidelity system. For example, the well-studied method of proper orthogonal decomposition (POD) described in Section 2 uses a number of solution snapshots to generate a reduced basis of size $n$ whose span has minimal reconstruction error. The advantage of this is that the high-fidelity problem can then be projected into the reduced space, so that only its low-dimensional representation must be manipulated at solving time. On the other hand, POD, like many reduced-basis methods, is a fundamentally linear technique, and therefore struggles in the presence of advection-dominated phenomena where traveling solution features do not decay over time.
1.1. Related Work
Desire to break the accuracy barrier of linear methods, along with the rapid advancement of neural network technology, has reignited interest in machine learning methods for model order reduction. In particular, it was first noticed by Milano and Koumoutsakos [milano2002] that the reduced basis built with a linear multilayer perceptron (MLP) model is equivalent to POD, and that a nonlinear MLP offers improved prediction and reconstruction abilities at additional computational cost. Later, this idea was extended in works such as [kashima2016, hartman2017], which established that a fully connected autoencoder architecture can be trained to provide a nonlinear analogue of projection which often realizes significant performance gains over linear methods such as POD.

Following the mainstream success of large-scale parameter-sharing network architectures such as convolutional neural networks, the focus of nonlinear ROMs began to shift in this direction. Since many high-fidelity PDE systems such as large-scale fluid simulations involve a huge number of degrees of freedom, the ability to train an effective network at a fraction of the memory cost of a fully connected network is very appealing, and has led to several important breakthroughs. The work of Lee and Carlberg in [lee2020] demonstrated that deep convolutional autoencoders (CAEs) can overcome the Kolmogorov width barrier which limits linear ROM methods on advection-dominated systems, leading to a variety of similar ROMs based on this architecture, seen in e.g. [fukami2020, eivazi2020, fresca2021, maulik2021]. Moreover, works such as [eivazi2020, fresca2021] have experimented with entirely data-driven ROMs based on deep CAEs, and have seen success using either fully connected or recurrent long short-term memory networks to simulate the reduced low-dimensional dynamics. At present, there is a large variety of nonlinear ROMs based on fully connected and convolutional autoencoder networks, and this remains an active area of research.

Despite the now widespread use of autoencoder networks for reduced-order modeling, there have not yet been studies which compare the performance of fully connected and convolutional networks in a ROM setting. In fact, many high-fidelity simulations take place on domains which are unstructured and irregular, so it is somewhat surprising that CAEs still perform well for this type of problem (see e.g. [fresca2021, Remark 5]). On the other hand, there is clearly a conceptual issue with applying the standard convolutional neural network (c.f. Section 2) to input data with variable connectivity; since the convolutional layers in this architecture accept only rectangular data, the input must be reshaped in a way which can change its neighborhood relationships, meaning the convolution operation may no longer be localized in space. This is a known issue which has been thoroughly considered in the literature on graph convolutional neural networks (see e.g. [wu2020] and references therein), which have been successful in a wide range of classification tasks.
1.2. Contributions
The goal of this work is to augment the ROM literature with a comparison between autoencoder-ROM architectures based on fully connected and convolutional neural networks. In particular, we compare the most popular deep convolutional autoencoder architecture from e.g. [lee2020, fresca2021] to two alternatives: a basic fully connected autoencoder with batch normalization [ioffe2015], and a novel autoencoder based on the GCNII graph convolution operation of [chen2020]. Through benchmark experiments on regular and irregular data, we study the effectiveness of each architecture in compressing/decompressing high-fidelity PDE data, and evaluate its performance as part of a data-driven ROM. Results show that the superior architecture is highly dependent on the size of the reduced latent space as well as the modeling task involved. Moreover, some of the assumed advantages of the CAE (such as reduced memory consumption and easier training) are not realized in every case, and for some applications it is advantageous to use one of the other available architectures. It is our hope that the experiments provided here will enable modelers to make a more informed choice regarding which network architecture they employ for their autoencoder-based ROM.

2. Preliminaries
Consider a full-order model which is a dynamical system of parameterized ODEs,

(1) $\dot{\mathbf{x}}(t;\boldsymbol{\mu}) = \mathbf{f}\left(t,\mathbf{x}(t;\boldsymbol{\mu});\boldsymbol{\mu}\right), \qquad \mathbf{x}(0;\boldsymbol{\mu}) = \mathbf{x}_0(\boldsymbol{\mu}).$

Here $t \in [0,T]$ is time ($T > 0$), $\boldsymbol{\mu} \in \mathcal{D} \subset \mathbb{R}^{n_\mu}$ is the vector of model parameters, $\mathbf{x}: [0,T] \times \mathcal{D} \to \mathbb{R}^N$ is the parameterized state function with initial value $\mathbf{x}_0(\boldsymbol{\mu})$, and $\mathbf{f}$ is the state velocity. In practice, models (1) arise naturally from the semidiscretization of a parameterized system of PDEs, where the discretization of the domain creates an $N$-length vector (or array) representing a quantity defined at the nodal points. Clearly, a semidiscretization of size $N$ has the potential to create a large amount of redundant dimensionality, as a unique dimension is created for every nodal point while the solution remains characterized by the same number of parameters.

To effectively handle the high-dimensional ODE (1), the overarching goal of reduced-order models is to predict approximate solutions without solving the governing equations in the high-dimensional space $\mathbb{R}^N$. In principle, this is possible as long as the assignment $(t,\boldsymbol{\mu}) \mapsto \mathbf{x}(t;\boldsymbol{\mu})$ is unique for each pair $(t,\boldsymbol{\mu})$, meaning that the space of solutions spans a low-dimensional manifold in the ambient $\mathbb{R}^N$ (c.f. Figure 1). Therefore, the ROM should provide fast access to the solution manifold

$\mathcal{S} = \left\{\mathbf{x}(t;\boldsymbol{\mu}) \,:\, t \in [0,T],\ \boldsymbol{\mu} \in \mathcal{D}\right\} \subset \mathbb{R}^N,$

which contains integral curves of the high-dimensional FOM. Note that the uniqueness assumption on $\mathbf{x}$ implies that the coordinates $(t,\boldsymbol{\mu})$ form a global chart for $\mathcal{S}$, and hence $\mathcal{S}$ is intrinsically $(n_\mu+1)$-dimensional (at most), which is typically much less than the ambient dimension $N$.
Traditionally, many ROMs are generated through some form of Proper Orthogonal Decomposition (POD). To apply POD, some number of solutions to (1) are first collected and stored in a snapshot matrix $\mathbf{X} \in \mathbb{R}^{N \times N_s}$, where $N_s$ is the number of training samples, $X_{ij}$ represents the $i$-th component of the solution $\mathbf{x}(t_j;\boldsymbol{\mu}_j)$, and nonuniqueness is allowed in the parameters $\boldsymbol{\mu}_j$. A singular value decomposition of the snapshot matrix

$\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\intercal$

is then performed, and the first $n$ columns of $\mathbf{U}$ (corresponding to the $n$ largest eigenvalues of $\mathbf{X}\mathbf{X}^\intercal$) are chosen to serve as a reduced basis for the solution space. Geometrically, this procedure amounts to choosing as basis an $n$-plane of POD modes in $\mathbb{R}^N$ which minimizes the projection error over the snapshot data. A ROM can then be generated from this by projecting (1) to this plane. More precisely, writing the approximate solution $\tilde{\mathbf{x}} = \mathbf{U}_n\hat{\mathbf{x}}$ for some $\hat{\mathbf{x}}(t;\boldsymbol{\mu}) \in \mathbb{R}^n$ and using the FOM along with the fact that $\mathbf{U}_n^\intercal\mathbf{U}_n = \mathbf{I}$ leads to the system

$\dot{\hat{\mathbf{x}}} = \mathbf{U}_n^\intercal\,\mathbf{f}\left(t,\mathbf{U}_n\hat{\mathbf{x}};\boldsymbol{\mu}\right),$

which requires the solution of only $n$ equations.
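To make the offline stage concrete, the following sketch (NumPy assumed; the array sizes are illustrative stand-ins rather than values from any experiment below) computes a POD basis of size $n$ from a snapshot matrix via the truncated SVD and applies the resulting linear projection and reconstruction.

```python
import numpy as np

def pod_basis(X, n):
    """Return the first n POD modes of the snapshot matrix X (N x N_s)."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)  # X = U S V^T
    return U[:, :n]

# Illustrative usage on random data standing in for FOM snapshots.
X = np.random.rand(1000, 200)          # N = 1000 dofs, N_s = 200 snapshots
U_n = pod_basis(X, n=10)
x_hat = U_n.T @ X[:, 0]                # reduced representation of one state
x_rec = U_n @ x_hat                    # linear reconstruction in R^N
```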
POD-based techniques are known to be highly effective in many cases, especially when the problem dynamics are diffusion-dominated. On the other hand, advection-dominated problems exhibit a slowly decaying Kolmogorov $n$-width

$d_n(\mathcal{S}) = \inf_{\dim V = n}\ \sup_{\mathbf{x} \in \mathcal{S}}\ \inf_{\mathbf{v} \in V} \left\|\mathbf{x} - \mathbf{v}\right\|,$

meaning a very large number of POD modes is required for accurate approximation of their solutions. This reveals a fundamental issue with linear approximation spaces which has motivated recent study into neural network based ROMs. By now, it is well known that the barrier posed by $d_n$ can be surmounted using deep convolutional autoencoders, which are now reviewed.
2.1. Convolutional Autoencoder Networks
Recall that a fully connected neural network (FCNN) is a function of the input vector $\mathbf{x} = \mathbf{x}^0$ such that at layer $\ell$,

$\mathbf{x}^{\ell+1} = \sigma\left(\mathbf{W}^\ell\mathbf{x}^\ell + \mathbf{b}^\ell\right),$

where $\sigma$ is a nonlinear activation function acting row-wise on its input, $\mathbf{W}^\ell$ is the weight matrix of the $\ell$-th layer, and $\mathbf{b}^\ell$ is its bias. FCNNs are extremely expressive due to their large number of parameters, but for the same reason they are typically difficult to train and incur a high memory cost. On the other hand, a convolutional neural network (CNN) can be considered a special case of FCNN where weights are shared and determined by a translation-invariant kernel function. More precisely, a convolutional neural network is a function of a multidimensional input $x$ such that at layer $\ell$,

(2) $x_j^{\ell+1} = \sigma\left(\sum_{i=1}^{c_\ell} K_{ij}^\ell * x_i^\ell + b_j^\ell\right), \qquad 1 \leq j \leq c_{\ell+1},$

where $c_\ell$ denotes the spatially-independent "channel dimension" of $x^\ell$, $K^\ell$ is the convolution kernel at layer $\ell$, and $b^\ell$ is a bias object which is constant for each fixed $j$. Note that the kernel has dimension equal to that of the input plus one, since it depends on the shape of the spatial filter as well as the number of channels at layers $\ell, \ell+1$. Here $*$ plays the role of the discrete convolution operator, defined for two-dimensional, one-channel arrays as

$(K * x)_{ij} = \sum_{\alpha,\beta} K_{\alpha\beta}\, x_{si+\alpha,\, sj+\beta},$

where $s$ is the stride length, $K = (K_{\alpha\beta})$, and all Greek indices are compactly supported and begin at zero. Note from this definition of the discrete convolution that the ranges of $i,j$ in the output depend on the constant $s$ and the size of the kernel, as well as the shape of the input. To determine this more precisely, consider projecting the tensor $K * x$ along some coordinate direction and computing the length of this projection in terms of the lengths of the corresponding projections of $K$ and $x$. If the projection of $x$ along this direction has length $p$ and the projection of $K$ has length $k$, then it is straightforward to show [dumoulin2016, Relationship 5] that the output projection has length $\lfloor (p-k)/s \rfloor + 1$. Therefore, convolution will naturally decrease the dimensionality of the input data when $k > 1$ or $s > 1$, creating a downsampling effect which is highly useful for techniques such as image compression. Note that this downsampling can be further controlled by padding the input with an array of zeros, in which case the size of the output changes based on other calculable rules (see e.g. [dumoulin2016]).

Since the discrete convolution (2) decreases the dimensionality of the input data in a controlled way, it is reasonable to make use of it when constructing convolutional autoencoder (CAE) networks. In particular, the main idea behind CAE networks is to stack convolutional layers in such a way that the input $\mathbf{x}$ is successively downsampled (or encoded) to a low-dimensional representation $\hat{\mathbf{x}} = e(\mathbf{x})$, and then upsampled (or decoded) back to its original form. Combined with an appropriate notion of reconstruction error to be minimized, this encourages the neural network to learn an approximation $g \circ e$ to the identity mapping, which is useful for tasks such as compressed sensing and PDE approximation provided the decoder $g$ can be decoupled from the encoder $e$. In the context of ROM, the notion of data downsampling serves as a nonlinear form of projection analogous to the truncated eigenvalue decomposition in POD. In principle, as more characteristic features are extracted by the convolutional layers in the network, less dimensionality is required to encode the original solution $\mathbf{x}$. Therefore, repeated convolution enables the most basic features of $\mathbf{x}$ to be represented with a latent space of small dimension $n$, which can be exploited by the network for accurate reconstruction of the solution manifold. Moreover, the assumption of uniqueness in the solution mapping guarantees that this idea is lossless whenever the latent dimension is at least as large as $n_\mu + 1$.
On the other hand, if convolution is to be useful for ROM, it is necessary to have a way to upsample (or decode) the latent representation $\hat{\mathbf{x}} \in \mathbb{R}^n$ for $n \ll N$. Although the convolution operation is clearly not invertible, it can be transposed in a way which is useful for this purpose. In particular, note that discrete convolution can be considered a matrix operation by lexicographically "unrolling" the inputs into a vector and similarly unrolling the kernel into a sparse matrix which acts on that vector. The operator then has a well-defined transpose, which effectively broadcasts the (unrolled) low-dimensional input to a high-dimensional output which can be rolled back into its original shape. In practice, this operation is accomplished by simply exchanging the forward and backward passes of the relevant convolutional layer, as it can be shown (see [dumoulin2016, Section 4]) that this is operationally equivalent to computing the transposed convolution. Doing this several times in succession produces the required upsampling effect, and traditionally makes up the decoder part of the CAE structure.
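The shape arithmetic above is easy to verify numerically. The following sketch (PyTorch assumed; the layer hyperparameters are illustrative, not those of Section 4) shows a strided convolution halving each spatial dimension and a transposed convolution restoring it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 16, 16)      # batch of one single-channel 16x16 array

# Strided convolution: output length = floor((16 + 2*2 - 5)/2) + 1 = 8.
conv = nn.Conv2d(1, 8, kernel_size=5, stride=2, padding=2)
y = conv(x)
print(y.shape)                      # torch.Size([1, 8, 8, 8])

# Transposed convolution exchanges the forward/backward passes of conv,
# broadcasting the coarse array back up: (8-1)*2 - 2*2 + 5 + 1 = 16.
tconv = nn.ConvTranspose2d(8, 1, kernel_size=5, stride=2,
                           padding=2, output_padding=1)
print(tconv(y).shape)               # torch.Size([1, 1, 16, 16])
```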
2.2. Autoencoder-based ROMs
As mentioned before, using a well-trained CAE in place of the linear reconstruction $\tilde{\mathbf{x}} = \mathbf{U}_n\hat{\mathbf{x}}$ when designing a ROM can provide a huge benefit to accuracy. In particular, examples from [lee2020, fresca2021] show that deep CAEs can approach the theoretical minimum error with only a few latent variables, and enable the same (or better) accuracy as POD with a much smaller latent space. Moreover, autoencoder-based ROMs do not require any more information than standard POD. Indeed, developing a ROM using the decoder mapping means seeking $\tilde{\mathbf{x}} \approx \mathbf{x}$ which satisfies

$\tilde{\mathbf{x}}(t;\boldsymbol{\mu}) = g\left(\hat{\mathbf{x}}(t;\boldsymbol{\mu})\right),$

where $\hat{\mathbf{x}} \in \mathbb{R}^n$ with $n \ll N$ is a latent representation of the state $\mathbf{x}$, and $g: \mathbb{R}^n \to \mathbb{R}^N$ is a nonlinear mapping from the latent space to the ambient $\mathbb{R}^N$. In particular, if $g$ can be constructed such that its image is the solution submanifold $\mathcal{S}$, then the comparatively small latent representation $\hat{\mathbf{x}}$ is sufficient for complete characterization of the state $\mathbf{x}$. As $g$ is typically trained using only the snapshots in $\mathbf{X}$, when $n \geq n_\mu + 1$ this provides the potential for an accurate nonlinear mapping using the same information necessary for POD.
Using the approximation $\mathbf{x} \approx g(\hat{\mathbf{x}})$ in the FOM (1), it follows that $g'(\hat{\mathbf{x}})\dot{\hat{\mathbf{x}}} = \mathbf{f}\left(t, g(\hat{\mathbf{x}}); \boldsymbol{\mu}\right)$, where $g'$ denotes the derivative operator (i.e. Jacobian matrix) of $g$. Therefore, a reduced-order model involving only $\hat{\mathbf{x}}$ can be generated by assuming that the approximate state $\tilde{\mathbf{x}}$ obeys the same dynamics as the original state $\mathbf{x}$. In particular, if $g$ is differentiable and $g'$ has full rank at every $\hat{\mathbf{x}}$, it follows quickly that

(3) $\dot{\hat{\mathbf{x}}} = g'(\hat{\mathbf{x}})^+\,\mathbf{f}\left(t, g(\hat{\mathbf{x}});\boldsymbol{\mu}\right),$

where $g'(\hat{\mathbf{x}})^+$ denotes the Moore-Penrose pseudoinverse of $g'(\hat{\mathbf{x}})$. This is a well-posed low-dimensional dynamical system that is feasible to solve when $g$ and its derivatives are known. Note that the ROM (3) takes place on the tangent bundle to the solution manifold $\mathcal{S}$, so that $\hat{\mathbf{x}}(t)$ is a low-dimensional integral curve in this space. Moreover, it is clear that choosing the mapping $g(\hat{\mathbf{x}}) = \mathbf{U}_n\hat{\mathbf{x}}$ immediately recovers the standard POD ROM from before, since $g$ is then a linear mapping with pseudoinverse $\mathbf{U}_n^\intercal$.
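For concreteness, a minimal sketch of the right-hand side of (3) is given below, assuming a trained decoder g implemented as a differentiable PyTorch module and a FOM velocity f; both names are placeholders rather than part of any particular library.

```python
import torch
from torch.autograd.functional import jacobian

def rom_rhs(t, x_hat, g, f, mu):
    """Right-hand side of the ROM (3): g'(x_hat)^+ f(t, g(x_hat); mu)."""
    J = jacobian(g, x_hat)              # (N, n) Jacobian matrix g'(x_hat)
    v = f(t, g(x_hat), mu)              # FOM velocity at the decoded state
    return torch.linalg.pinv(J) @ v     # apply the Moore-Penrose pseudoinverse

# x_hat(t) can then be advanced with any standard ODE integrator, e.g.
# forward Euler: x_hat = x_hat + dt * rom_rhs(t, x_hat, g, f, mu).
```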
Solving the ROM (3) is often accomplished using Newton or quasi-Newton methods based on minimizing the residual (e.g. [lee2020, fu2018]), though recent work in [eivazi2020, fresca2021] has demonstrated that a purely data-driven approach can also be effective for end-to-end reduced-order modeling of PDEs without appealing to conventional numerical solvers. More precisely, instead of relying on a low-dimensional system of differential equations to compute the latent representation $\hat{\mathbf{x}}$, it is reasonable to consider computing this quantity with another neural network $\phi(t,\boldsymbol{\mu})$ which is fully connected and relatively simple. In this case, the autoencoder can be trained alongside the mapping $\phi$ through the joint loss

(4) $\mathcal{L} = \left\|\mathbf{x} - (g \circ e)(\mathbf{x})\right\|^2 + \left\|e(\mathbf{x}) - \phi(t,\boldsymbol{\mu})\right\|^2.$

The first term in $\mathcal{L}$ is simply the reconstruction error coming from the autoencoder, while the second term encourages the latent representation $\phi(t,\boldsymbol{\mu})$ to agree with the encoded state $e(\mathbf{x})$. This approach has the advantage of requiring only function evaluations in its online stage, at the potential cost of model generalizability. On the other hand, experiments in [fresca2021] as well as Section 4 of this work show that data-driven ROMs still produce competitive results in practice while maintaining a lower online cost.
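A sketch of the joint loss (4) in PyTorch follows; the encoder e, decoder g, and prediction network phi are assumed to be torch modules, and any relative weighting between the two terms is omitted.

```python
import torch

def joint_loss(x, t_mu, e, g, phi):
    """Joint loss (4): reconstruction error plus latent-space agreement."""
    z = e(x)                                      # encoded state
    recon = torch.mean((x - g(z)) ** 2)           # autoencoder reconstruction
    latent = torch.mean((z - phi(t_mu)) ** 2)     # match phi(t, mu) to e(x)
    return recon + latent

# During the online stage only g and phi are needed:
# x_tilde = g(phi(t_mu)) approximates the FOM state at (t, mu).
```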
It is important to keep in mind that the existence of the decoder mapping is independent of the numerical technique used to generate it, and therefore the use of the convolutional propagation rule (2) is a design decision rather than a necessity. In fact, at present there are an enormous number of different neural network architectures available for accomplishing both predictive and generative tasks, each with their own unique properties. Therefore, it is reasonable to question whether or not deep CAEs based on (2) provide the best choice of network architecture for ROM applications. The goal of the sequel is to introduce an alternative to the conventional CAE, along with some experiments which compare the performance of this CAEROM to other autoencoderbased ROMs involving fully connected and graph convolutional neural networks.
3. A Graph Convolutional Autoencoder for ROM
The convolutional propagation rule (2) comes with inherent advantages and disadvantages that become readily apparent in a ROM setting. One very positive quality of this rule is that if the convolutional filter is of size $k$, then its associated discrete convolution is precisely localized in space. This locality ensures that only the features of nearby nodes are convolved together, increasing the interpretability and expressiveness of the neural network. On the other hand, it is clear that (2) does not translate immediately to irregular domains, where each node has a variable number of neighbors in its one-hop neighborhood. Because of this, bothersome tricks are needed to apply the CAE based on (2) to PDE data defined on irregular or unstructured domains. In particular, one popular approach (c.f. [fresca2021]) is to simply pad and reshape the solution. More precisely, each feature of $\mathbf{x}$ can be flattened into a vector, zero-padded until it has length $m^2$ for some $m \in \mathbb{N}$, and then reshaped to a square of size $m \times m$. This square is then fed into the CAE, generating an output of the same shape which can be flattened and truncated to produce the required feature of $\tilde{\mathbf{x}}$. While this procedure works surprisingly well in many cases, it has the potential to create a very high overhead cost due to artificial padding of the solution. Moreover, it fails to preserve the beneficial locality property of convolution, since nodes which were originally far apart may end up close together after reshaping. Remedying these issues requires a new autoencoder architecture based on graph convolutional neural networks, which will now be discussed.
3.1. Graph Convolutional Neural Networks
While the discrete convolution in (2) is in some sense a direct translation of the continuous version, it is only meaningful for regular, structured data. However, PDE simulations based on highfidelity numerical techniques such as finite element methods frequently involve domains which are irregular and unstructured. In this case, developing an autoencoderbased ROM requires a notion of convolution which is appropriate for data with arbitrary connectivity. On the other hand, it is not immediately obvious how to define a useful notion of convolution when the domain in question is unstructured. Because discrete procedures typically obey a “no free lunch” scenario with respect to their continuous counterparts, it is an active area of research to determine which properties of the continuous convolution are most beneficial when preserved in the discrete setting. Since unstructured domains can be effectively modeled as mathematical graphs, this has motivated a large amount of activity in the area of graph convolutional neural networks (GCNNs). The next goal is to recall the basics of graph convolution and propose an autoencoder architecture which makes use of this technology.
To that end, consider an undirected graph $G = (V, E)$ with $|V| = N$ nodes and (weighted or unweighted) adjacency matrix $\mathbf{A}$. Recall that $\mathbf{A}$ is symmetric with entries defined by

$A_{ij} = \begin{cases} w_{ij}, & (i,j) \in E, \\ 0, & \text{otherwise}, \end{cases}$

where $w_{ij}$ are the weight values associated to each edge. Given this information, there are a wide variety of approaches to defining a meaningful convolution between two signals defined at the nodes of $G$ (see e.g. [wu2020]). One highly successful approach is to make use of the fact that convolution is just ordinary multiplication in Fourier space. In particular, recall the combinatorial Laplacian $\mathbf{L}$, which can be computed as $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is the diagonal degree matrix with entries $D_{ii} = \sum_j A_{ij}$. It is straightforward to check that $\mathbf{L}$ is symmetric and positive semidefinite (see e.g. [bollobas2013]), therefore $\mathbf{L}$ has a complete set of real eigenvalues $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_N$ with associated orthonormal eigenvectors $\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_N$, known as the Fourier modes of $G$. Since any signal $x: V \to \mathbb{R}^c$ can be considered a $c$-length array of vectors denoted $x_i \in \mathbb{R}^N$, this yields the discrete Fourier transform

$\hat{x}_i = \boldsymbol{\Phi}^\intercal x_i,$

where $x_i$ is the vectorized $i$-th component function of $x$, and each entry of $\hat{x}_i$ is a coefficient in the basis of Fourier modes $\boldsymbol{\Phi} = (\boldsymbol{\varphi}_1 \,\cdots\, \boldsymbol{\varphi}_N)$. The advantage of this notion is a straightforward extension of convolution to graphical signals through multiplication in frequency space. In particular, it follows that for signals $K, x$ on $G$,

$K * x = \boldsymbol{\Phi}\left(\boldsymbol{\Phi}^\intercal K \odot \boldsymbol{\Phi}^\intercal x\right),$

where $\odot$ denotes the elementwise Hadamard product and left-multiplication by $\boldsymbol{\Phi}$ is the inverse Fourier transform on $G$. Note the usual convention that matrix products occur before the Hadamard product.
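The following small sketch (NumPy assumed) makes the construction explicit on a three-node path graph, computing the Fourier modes from the combinatorial Laplacian and convolving two nodal signals in frequency space.

```python
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])          # path graph on three nodes
D = np.diag(A.sum(axis=1))             # diagonal degree matrix
L = D - A                              # combinatorial Laplacian

lam, Phi = np.linalg.eigh(L)           # eigenvalues and Fourier modes of G

x = np.array([1.0, 2.0, 3.0])          # signal on the nodes
K = np.array([0.5, 1.0, 0.0])          # kernel, also a nodal signal

# Convolution as multiplication in frequency space: Phi (Phi^T K . Phi^T x).
conv = Phi @ ((Phi.T @ K) * (Phi.T @ x))
```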
This expression provides a mathematically precise (though indirect) definition of spatial graph convolution through the graph Fourier transform, but carries the significant drawback of requiring matrix multiplication with the Fourier basis $\boldsymbol{\Phi}$, which is generically nonsparse. To remedy this, Defferrard et al. [defferrard2016] proposed considering matrix-valued filters which can be recursively computed from the Laplacian $\mathbf{L}$. In particular, consider a convolution kernel $\hat{K} = p_{\boldsymbol{\theta}}(\boldsymbol{\Lambda})$, where $p_{\boldsymbol{\theta}}$ is an order-$k$ polynomial in the eigenvalues $\boldsymbol{\Lambda}$ of the normalized Laplacian and $\boldsymbol{\theta}$ is a vector of learnable coefficient parameters. Such kernels are easily computed if the eigenvalues are known, but do not obviously eliminate the need for multiplication by the Fourier basis $\boldsymbol{\Phi}$. On the other hand, if $p_{\boldsymbol{\theta}}(\mathbf{L})$ can be computed recursively from $\mathbf{L}$, then for any signal $x$ the convolution $K * x = p_{\boldsymbol{\theta}}(\mathbf{L})\,x$ can be computed in $\mathcal{O}(k|E|)$ operations, making the spectral approach more feasible for practical neural network applications. As an added benefit of the work in [defferrard2016], note that order-$k$ polynomial kernels are also precisely localized in space. In particular, the component at vertex $i$ of the kernel localized at vertex $j$ is given by

$\left(p_{\boldsymbol{\theta}}(\mathbf{L})\,\boldsymbol{\delta}_j\right)_i = \left(p_{\boldsymbol{\theta}}(\mathbf{L})\right)_{ij},$

where $\boldsymbol{\delta}_j$ is the nodal basis vector at $j$ and $\left(p_{\boldsymbol{\theta}}(\mathbf{L})\right)_{ij} = 0$ whenever $i$ lies outside the $k$-hop neighborhood of $j$ (c.f. [hammond2011, Lemma 5.2]). This shows that the frequency-based convolution maintains the desirable property of spatial locality when the filter is a polynomial in $\mathbf{L}$.
Further simplification of this idea was introduced by Kipf and Welling [kipf2016], who chose $k = 1$ and $\theta = \theta_0 = -\theta_1$ with $\theta$ a learnable parameter to obtain the first-order expression $K * x \approx \theta\left(\mathbf{I} + \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right)x$. By renormalizing the initial graph to include self-loops (i.e. adding identity to the adjacency and degree matrices, so that $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ and $\tilde{\mathbf{D}} = \mathbf{D} + \mathbf{I}$), they proposed the graph convolutional network (GCN), which obeys the propagation rule

(5) $\mathbf{X}^{\ell+1} = \sigma\left(\tilde{\mathbf{P}}\,\mathbf{X}^\ell\,\mathbf{W}^\ell\right), \qquad \tilde{\mathbf{P}} = \tilde{\mathbf{D}}^{-1/2}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-1/2},$

where $\mathbf{X}^\ell \in \mathbb{R}^{N \times c_\ell}$ is a signal on the graph with $c_\ell$ channels and $\mathbf{W}^\ell$ is a (potentially nonsquare) weight matrix containing the learnable parameters. Networks of this form are simple and fast to compute, but were later shown [wu2019] to be only as expressive as a fixed polynomial filter on the spectral domain of the self-looped graph, with order equal to the depth of the network. While the propagation rule (5) displayed good performance on small-scale graphs, it is now known that simple GCNs are notorious for overfitting and oversmoothing the data, as their atomic operation is related to heat diffusion on the domain. This severely limits their expressive power, as GCNNs based on (5) are depth-limited in practice and cannot effectively reconstruct complex signals when used in a ROM setting.
On the other hand, many other graph convolutional operations are available at present, some of which were developed specifically to remedy this aspect of the original GCN. One such operation was introduced in [chen2020] and directly extends the approach of [kipf2016]. In particular, the authors of [chen2020] define the "GCNII" or GCN2 network, whose propagation rule is

(6) $\mathbf{X}^{\ell+1} = \sigma\left(\left((1-\alpha_\ell)\,\tilde{\mathbf{P}}\mathbf{X}^\ell + \alpha_\ell\,\mathbf{X}^0\right)\left((1-\beta_\ell)\,\mathbf{I} + \beta_\ell\,\mathbf{W}^\ell\right)\right),$

where $\alpha_\ell, \beta_\ell \in [0,1]$ are hyperparameters and $\sigma$ is the componentwise ReLU function $\sigma(x) = \max(0, x)$. The terms in this operation are motivated by the ideas of residual connection and identity mapping popularized in the well-known ResNet architecture (c.f. [he2016]), and the authors prove that $\ell$-layer networks of this form can express an arbitrary order-$\ell$ polynomial in the self-looped graph Laplacian. While the hyperparameters $\alpha_\ell$ are typically constant in practice, it is noted in [chen2020] that improved performance is attained when $\beta_\ell$ decreases with increasing network depth. Therefore, the experiments in Section 4 use the recommended values of $\alpha_\ell = \alpha$ and $\beta_\ell = \log(\lambda/\ell + 1)$, where $\lambda > 0$ is a constant hyperparameter.

With this, it is now possible to use the propagation rule (6) as the basis for a convolutional autoencoder. Although spectral graph convolution does not produce any natural downsampling analogous to the CNN case, it is still reasonable to expect that a succession of convolutional layers will learn to extract important features of the data it accepts as input. Therefore, it should be possible to recover the decoder mapping $g$ necessary for ROM from this approach as well. Moreover, in contrast to the discrete convolution (2), the GCN2 operation is invariant to label permutation and remains localized on unstructured data. Therefore, it is reasonable to suspect that a CAE-ROM based on (6) will exhibit superior performance when the PDE to be simulated takes place on an unstructured domain. The remainder of the work will investigate whether or not this is the case, beginning with a description of the GCNN autoencoder that will be used.
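A minimal single-layer sketch of the GCN2 rule (6) is given below (PyTorch assumed); it is written for clarity rather than performance, and the tuned implementation from PyTorch Geometric [fey2019] is what the experiments actually use. The defaults $\alpha = 0.2$, $\lambda = 1.5$ follow Table 1.

```python
import math
import torch
import torch.nn as nn

class GCN2Layer(nn.Module):
    """One application of the GCN2 propagation rule (6)."""
    def __init__(self, channels, layer_idx, alpha=0.2, lam=1.5):
        super().__init__()
        self.W = nn.Linear(channels, channels, bias=False)
        self.alpha = alpha
        self.beta = math.log(lam / layer_idx + 1)   # beta_l = log(lambda/l + 1)

    def forward(self, X, X0, P):
        # P: renormalized propagation matrix D~^{-1/2} A~ D~^{-1/2} (N x N).
        H = (1 - self.alpha) * (P @ X) + self.alpha * X0    # residual to X^0
        return torch.relu((1 - self.beta) * H + self.beta * self.W(H))
```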
Remark 3.1.
Note that there are a wide variety of graph convolutional operations which can be used as the basis for a convolutional autoencoder on unstructured data. During the course of this work, many such operations were investigated, including the ones proposed in [bresson2017, gilmer2017, qi2017, ma2019, wang2019]. Through experimentation, it was determined that the graph convolution proposed in [chen2020] is the most effective for the present purpose.
3.2. Graph Convolutional Autoencoder Architecture
Recall that the standard CAE is structured so that each layer downsamples the input in space at the same rate that it extrudes the input in the channel dimension. In particular, it is common to use convolution kernels with a stride of two and "same" zero padding (c.f. [dumoulin2016, Section 4.4]), so that the spatial dimension of the input decreases by a factor of four at each layer. While this occurs, the channel dimension is simultaneously increased by a factor of four, so that the total dimensionality of the signal remains constant until the last layer(s) of the encoder. The final part of the encoder is often fully connected, taking the simplified output of the convolutional network and mapping it directly to the encoded representation $\hat{\mathbf{x}}$ in some small latent space $\mathbb{R}^n$. To recover $\mathbf{x}$ from its encoding, this entire block of layers is simply "transposed" to form the decoder mapping $g$, so that the autoencoder comprises the mapping $g \circ e$ as desired. The learnable parameters of the mappings $e, g$ are occasionally shared (resulting in a "tied autoencoder"), though most frequently they are not.
To study whether autoencoder-based ROMs benefit from the core ideas behind GCNNs, we propose the ROM architecture displayed in Figure 2. Similar to [fresca2021], this ROM is entirely data-driven and is comprised of two neural networks: a specially designed CAE based on the operation (6) whose decoder is the required mapping $g$, and a small-scale FCNN $\phi$ which simulates the low-dimensional mapping $(t,\boldsymbol{\mu}) \mapsto \hat{\mathbf{x}}(t;\boldsymbol{\mu})$. In this way, the component networks are trained directly from snapshot data during the offline stage, so that the online stage requires only function evaluations. Recall that the input is not downsampled by the convolutional procedure (6), so the feature dimension is similarly held constant. Moreover, use of the GCN2 operation ensures that each convolutional layer is skip-connected to the first, i.e. each convolutional layer retains a fraction of the pre-convolutional input. Therefore, the proposed autoencoder is defended against the troublesome oversmoothing known to occur in deep GCNs. To evaluate the performance of our GCNN-ROM, it is compared directly to two analogous autoencoder-based data-driven ROMs which use the same prediction network $\phi$ and loss function (4) but different autoencoder architectures. The first is the standard deep CAE based on (2), designed according to the common conventions of CAEs on regular data, and the second is a simple feedforward autoencoder which uses batch-normalized fully connected layers throughout. The results of a comparison between these ROMs are given in the next Section.

4. Numerical Examples
This section details a performance comparison between ROMs using autoencoder architectures based on fully connected (FCNN), convolutional (CNN), and graph convolutional (GCNN) networks. Through examples of PDEs on structured and unstructured domains, ROM performance is evaluated on two distinct tasks. The first task measures how well each architecture can approximate PDE solutions at new parameter values (referred to as the prediction problem), and the second measures how well they can compress and decompress PDE data (referred to as the compression problem).
4.1. Data Preprocessing and Evaluation
Begin with a snapshot array $\mathbf{X} \in \mathbb{R}^{N_s \times N}$, each row of which is a solution at a particular parameter configuration found in the (consistently ordered) parameter matrix $\mathbf{M}$. We first choose integers $N_{\mathrm{train}}, N_{\mathrm{val}}$ such that $N_{\mathrm{train}} + N_{\mathrm{val}} = N_s$ and separate $\mathbf{X}$ and $\mathbf{M}$ row-wise into training and validation sets, taking care that no value of $\boldsymbol{\mu}$ appears in both. After separation, the training data is normalized "columnwise" to the unit interval. Momentarily dropping the subscript, this means computing the transformation

$\mathbf{X} \mapsto \frac{\mathbf{X} - \mathbf{X}_{\min}}{\mathbf{X}_{\max} - \mathbf{X}_{\min}},$

where $\mathbf{X}_{\max}$ (resp. $\mathbf{X}_{\min}$) are arrays containing the elementwise maxima (resp. minima) of $\mathbf{X}$ over the sampling dimension (e.g. $(\mathbf{X}_{\max})_j = \max_i X_{ij}$ for every relevant $j$), and the division is performed elementwise. Once the training data has been normalized, the validation data is normalized in a similar fashion based on the arrays $\mathbf{X}_{\max}, \mathbf{X}_{\min}$ computed from the training set.
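A sketch of this preprocessing step follows (NumPy assumed; the array shapes are illustrative stand-ins). The key point is that the validation set is scaled with statistics fit on the training set only.

```python
import numpy as np

X_train = np.random.rand(180, 256)      # (N_train, N) stand-in snapshots
X_val = np.random.rand(20, 256)         # (N_val, N) stand-in snapshots

# Elementwise extrema over the sampling (row) dimension.
X_min, X_max = X_train.min(axis=0), X_train.max(axis=0)

def normalize(X, lo, hi, eps=1e-12):
    return (X - lo) / (hi - lo + eps)   # elementwise division

X_train_n = normalize(X_train, X_min, X_max)
X_val_n = normalize(X_val, X_min, X_max)   # training statistics reused
```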
Once the relevant neural networks have been trained, performance is measured through the empirical relative reconstruction error of the FOM solution over the validation set. In particular, denoting the reconstructed solution by $\tilde{\mathbf{x}}$, the empirical relative reconstruction error is computed as

$E_R = \frac{1}{N_{\mathrm{val}}}\sum_{i=1}^{N_{\mathrm{val}}} \frac{\left\|\mathbf{x}_i - \tilde{\mathbf{x}}_i\right\|}{\left\|\mathbf{x}_i\right\|},$

where $\|\cdot\|$ denotes the Euclidean norm. Note that the reconstructed solution is $\tilde{\mathbf{x}} = (g \circ e)(\mathbf{x})$ for the compression problem, while $\tilde{\mathbf{x}} = g\left(\phi(t,\boldsymbol{\mu})\right)$ for the prediction problem. In addition to the reconstruction error, it is helpful to report the memory required to store the trained models as well as the average wall-clock time per training epoch. While the memory cost is a property of the network, note that the wall-clock time is hardware- and implementation-dependent, with significant fluctuation throughout the course of training. On the other hand, it provides a useful comparison (along with the plots of the loss) to estimate the difference in computational cost across autoencoder architectures. Pseudocode for the training procedure is given in Algorithm 1. Note that an early stopping criterion is employed, so that training is stopped when the validation loss does not decrease over a predefined number of consecutive epochs. All experiments are performed using Python 3.8.10 running PyTorch 1.8.1 [torch] on an early 2015 model MacBook Pro with 2.7 GHz Dual-Core Intel Core i5 processor and 8GB of RAM. The implementation of GCN2 convolution is modified from source code found in the PyTorch Geometric package [fey2019].
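A condensed sketch of this training procedure with early stopping is shown below (PyTorch assumed); the loaders and loss follow the joint objective (4), and all names are placeholders for the per-architecture details in Algorithm 1.

```python
import copy
import torch

def train(model, loss_fn, train_loader, val_loader, lr, patience):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best, best_state, stall = float("inf"), None, 0
    while stall < patience:                    # early stopping criterion
        model.train()
        for x, t_mu in train_loader:           # minibatch SGD on the loss (4)
            opt.zero_grad()
            loss_fn(model, x, t_mu).backward()
            opt.step()
        model.eval()
        with torch.no_grad():                  # epoch-end validation loss
            val = sum(loss_fn(model, x, t_mu).item() for x, t_mu in val_loader)
        if val < best:
            best, best_state, stall = val, copy.deepcopy(model.state_dict()), 0
        else:
            stall += 1
    model.load_state_dict(best_state)          # restore best validation model
    return model
```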
layer | input size | $(\alpha, \lambda)$ | output size | activation
1C | $(N, F)$ | (0.2, 1.5) | $(N, F)$ | ReLU
... | ... | ... | ... | ...
$k$C | $(N, F)$ | (0.2, 1.5) | $(N, F)$ | ReLU
Samples of size $(N, F)$ are flattened to size $NF$.
1FC | $NF$ | | $n$ | Identity
End of encoding layers. Beginning of decoding layers.
2FC | $n$ | | $NF$ | Identity
Samples of size $NF$ are reshaped to size $(N, F)$.
1TC | $(N, F)$ | (0.2, 1.5) | $(N, F)$ | ReLU
... | ... | ... | ... | ...
$k$TC | $(N, F)$ | (0.2, 1.5) | $(N, F)$ | ReLU
End of decoding layers. Prediction layers listed below.
3FC | $n_\mu + 1$ | | 50 | ELU
4FC | 50 | | 50 | ELU
... | ... | ... | ... | ...
10FC | 50 | | $n$ | Identity
4.2. Parameterized 1D Inviscid Burgers' Equation
The first experiment considers the parameterized one-dimensional inviscid Burgers' equation, which is a common model for fluids whose motion can develop discontinuities. In particular, let $\boldsymbol{\mu} = (\mu_1, \mu_2)$ be a vector of parameters and consider the initial value problem

(7) $\begin{cases} u_t + \left(\tfrac{1}{2}u^2\right)_x = 0.02\,e^{\mu_2 x}, \\ u(0, t; \boldsymbol{\mu}) = \mu_1, \\ u(x, 0; \boldsymbol{\mu}) = 1, \end{cases}$

where subscripts denote partial differentiation and $u(x, t; \boldsymbol{\mu})$ represents the conserved quantity of the fluid for $x \in (0, 100)$ and $t > 0$. As a first test, it is interesting to examine how the CNN, GCNN, and FCNN autoencoder-based ROMs perform on this relatively simple hyperbolic equation. First, note that this problem is easily converted to the form (1) by discretizing the interval $(0, 100)$, in which case the solution at each nodal point $x_j$ is $u(x_j, t; \boldsymbol{\mu})$, and $\mathbf{x}(t;\boldsymbol{\mu})$ is a vector for each $t$ of length equal to the number of nodes $N = 256$. With this, the equations (7) can be discretized and solved for various parameter values using a simple upwind forward Euler scheme. In particular, given stepsizes $\Delta t, \Delta x$, let $u_j^n$ denote the solution at time step $n$ and spatial node $j$. Then, simple forward differencing gives

$u_j^{n+1} = u_j^n - \frac{\Delta t}{2\Delta x}\left[\left(u_j^n\right)^2 - \left(u_{j-1}^n\right)^2\right] + 0.02\,\Delta t\, e^{\mu_2 x_j},$

which expresses the value at time step $n+1$ in terms of its value at time step $n$. The training parameters $(\mu_1, \mu_2)$ are chosen on a rectangular lattice in parameter space, while the validation parameters are the midpoints of the lattice cells. A precise description of the autoencoder architectures employed can be found in Tables 1, 2, and 3. The initial learning rates for the ADAM optimizer are chosen as 0.0025 in the GCNN case, 0.001 in the CNN case, and 0.0005 in the FCNN case. The early stopping criterion for this example is 200 epochs and the minibatch size is 20.
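The FOM solver just described is simple enough to sketch in full (NumPy assumed); the temporal step size and final time below are illustrative placeholders, since only the spatial resolution $N = 256$ is fixed by the architecture tables.

```python
import numpy as np

def burgers_fom(mu1, mu2, N=256, L=100.0, T=10.0, dt=0.02):
    """Upwind forward Euler for (7); returns snapshots of u over time."""
    dx = L / N
    x = np.linspace(0.0, L, N)
    u = np.ones(N)                        # initial condition u(x, 0) = 1
    snaps = [u.copy()]
    for _ in range(int(T / dt)):
        flux = 0.5 * u ** 2
        u[1:] -= (dt / dx) * (flux[1:] - flux[:-1])   # upwind difference
        u[1:] += 0.02 * dt * np.exp(mu2 * x[1:])      # source term
        u[0] = mu1                        # inflow boundary condition
        snaps.append(u.copy())
    return np.array(snaps)                # rows are FOM states x(t; mu)
```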
As seen in Table 4, the results for both the prediction and compression problems are significantly different across architectures and latent space sizes. Indeed, the CNN autoencoder is consistently the most accurate on the prediction problem while also being the most time- and memory-consumptive, and its performance does not improve with increased latent dimension. On the other hand, the performance of the GCNN-ROM increases quite a bit when the latent dimension is increased from 3 to 10, but remains significantly less accurate than the ROM based on CNN or FCNN. Interestingly, on the prediction problem the performance of all networks appears to be bottlenecked by the fully connected network which simulates the low-dimensional dynamics, as their accuracy does not increase when the latent dimension increases from $n = 10$ to $n = 32$. This suggests that a ROM based on Newton or quasi-Newton methods may produce better results in this case.
Conversely, the compression data shows that all network architectures benefit from an increased latent space dimension when the goal is to encode and decode, with the GCNN autoencoder benefiting the most and the CNN autoencoder benefiting the least. Again, the GCNN architecture struggles when the latent space dimension is set to the theoretical minimum $n = n_\mu + 1 = 3$; however, its performance exceeds CNN and becomes competitive with that of FCNN once the latent dimension is increased to 32, at half of the memory cost.
Remark 4.1.
Instead of increasing the latent dimension $n$, it is reasonable to consider increasing the depth of the autoencoder and prediction networks. However, we find that increasing the depth of the GCNN autoencoder beyond the depth reported in Table 1 does not significantly improve its performance on either the prediction or compression problems. On the other hand, the GCNN ROM benefits modestly from an increase in depth of the prediction network from 4 to 8 layers, although the CNN and FCNN based ROMs do not. Therefore, all prediction experiments use 8 fully connected layers in the GCNN case and 4 in the CNN and FCNN cases.
Figure 3 illustrates the results at various times $t$ for a representative validation parameter $\boldsymbol{\mu}$. Clearly, the architecture based on CNN is best equipped for solving the prediction problem in this case. On the other hand, it is evident that all architectures can compress and reconstruct solution data well at the larger latent dimensions, although the FCNN and GCNN yield the best results. Figure 4 shows the corresponding plots of the losses incurred during network training. Note that all architectures have comparable training and validation losses for the compression problem, while all but the GCNN overfit the data slightly on the prediction problem.
layer | input size | output size | activation
Samples of size (256, 1) are flattened to size (256).
1FC | 256 | 64 | ELU
1BN | 64 | 64 | Identity
2FC | 64 | $n$ | ELU
End of encoding layers. Beginning of decoding layers.
3FC | $n$ | 64 | ELU
2BN | 64 | 64 | Identity
4FC | 64 | 256 | ELU
Samples of size (256) are reshaped to size (256, 1).
End of decoding layers. Prediction layers listed below.
5FC | $n_\mu + 1$ | 50 | ELU
6FC | 50 | 50 | ELU
7FC | 50 | 50 | ELU
8FC | 50 | $n$ | ELU
layer | input size | kernel size | stride | padding | output size | activation
Samples of size (256, 1) are reshaped to size (1, 16, 16).
1C | (1, 16, 16) | 5x5 | 1 | SAME | (8, 16, 16) | ELU
2C | (8, 16, 16) | 5x5 | 2 | SAME | (16, 8, 8) | ELU
3C | (16, 8, 8) | 5x5 | 2 | SAME | (32, 4, 4) | ELU
4C | (32, 4, 4) | 5x5 | 2 | SAME | (64, 2, 2) | ELU
Samples of size (64, 2, 2) are flattened to size (256).
1FC | 256 | | | | $n$ | ELU
End of encoding layers. Beginning of decoding layers.
2FC | $n$ | | | | 256 | ELU
Samples of size (256) are reshaped to size (64, 2, 2).
1TC | (64, 2, 2) | 5x5 | 2 | SAME | (64, 4, 4) | ELU
2TC | (64, 4, 4) | 5x5 | 2 | SAME | (32, 8, 8) | ELU
3TC | (32, 8, 8) | 5x5 | 2 | SAME | (16, 16, 16) | ELU
4TC | (16, 16, 16) | 5x5 | 1 | SAME | (1, 16, 16) | ELU
Samples of size (1, 16, 16) are reshaped to size (256, 1).
| Encoder/Decoder + Prediction | Encoder/Decoder only |
Network | $n$ | avg. $E_R$ (%) | max $E_R$ (%) | Size (kB) | s/epoch | $n$ | avg. $E_R$ (%) | max $E_R$ (%) | Size (kB) | s/epoch
GCNN | 3 | 4.41 | 8.49 | 79.0 | 0.55 | 3 | 2.54 | 5.31 | 13.1 | 0.46
CNN | 3 | 0.304 | 0.605 | 965 | 2.2 | 3 | 0.290 | 0.563 | 953 | 2.2
FCNN | 3 | 1.62 | 3.29 | 167 | 0.28 | 3 | 0.658 | 1.66 | 144 | 0.22
GCNN | 10 | 2.08 | 3.73 | 94.8 | 0.58 | 10 | 0.706 | 1.99 | 27.4 | 0.47
CNN | 10 | 0.301 | 0.630 | 992 | 2.4 | 10 | 0.215 | 0.409 | 967 | 2.2
FCNN | 10 | 0.449 | 1.15 | 172 | 0.27 | 10 | 0.171 | 0.361 | 148 | 0.23
GCNN | 32 | 2.59 | 4.17 | 145 | 0.59 | 32 | 0.087 | 0.278 | 72.6 | 0.46
CNN | 32 | 0.350 | 0.675 | 1040 | 2.3 | 32 | 0.216 | 0.384 | 1010 | 2.2
FCNN | 32 | 0.530 | 1.303 | 188 | 0.27 | 32 | 0.098 | 0.216 | 159 | 0.23
4.3. Parameterized 2D Heat Equation
The next experiment involves the parameterized two-dimensional heat equation on a uniform grid. In particular, let $\Omega \subset \mathbb{R}^2$ denote the spatial domain, let $\boldsymbol{\mu} = (\mu_1, \mu_2)$ be the vector of parameters, and consider the problem

(8) $\dot{u} - \Delta u = 0 \ \text{in } \Omega \times (0, T], \qquad u = b_{\boldsymbol{\mu}} \ \text{on } \partial\Omega \times (0, T], \qquad u(\cdot, 0) = u_0,$

where $u(\mathbf{x}, t; \boldsymbol{\mu})$ is the temperature profile and the parameters enter through the boundary data $b_{\boldsymbol{\mu}}$. Multiplying the first equation by a test function $v$ and integrating by parts yields the standard weak equation

$(\dot{u}, v) + (\nabla u, \nabla v) = 0 \quad \text{for all admissible } v,$

which can be discretized and solved using standard finite element techniques. In particular, choosing a finite element basis $\{\phi_i\}_{i=1}^N$ so that the discretized solution $u_h = \sum_i u_i\phi_i$ has coefficient vector $\mathbf{u} = (u_1, \dots, u_N)$ yields the linear system

$\mathbf{M}\dot{\mathbf{u}} + \mathbf{K}\mathbf{u} = \mathbf{f},$

where $\mathbf{f}$ collects the boundary contributions and $\mathbf{M}, \mathbf{K}$ are the symmetric mass and stiffness matrices

(9) $M_{ij} = \int_\Omega \phi_i\,\phi_j, \qquad K_{ij} = \int_\Omega \nabla\phi_i \cdot \nabla\phi_j.$

For the present experiment, (8) is semidiscretized using bilinear elements over a regular grid to yield $\mathbf{u}$ of dimension $N = 2145$, and the IFISS software package [elman2007] is used to assemble and solve the boundary value problem (8) using a backward Euler method with uniform stepsize. The system is integrated for a time period of 4 seconds at various parameters $\boldsymbol{\mu}$. Since no classical solution to (8) exists at $t = 0$, solution snapshots are stored starting from the first positive sample time. Some solutions along this heat flow contained in the validation set are displayed in Figure 6.
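The time-stepping loop is straightforward given the assembled matrices; the sketch below (SciPy assumed) advances the semidiscrete system with backward Euler, factoring the system matrix once. The matrices M, K and load f stand in for the IFISS output and are not reproduced here.

```python
import numpy as np
import scipy.sparse.linalg as spla

def backward_euler(M, K, f, u0, dt, n_steps):
    """Solve M u' + K u = f(t): (M + dt K) u^{k+1} = M u^k + dt f(t_{k+1})."""
    solve = spla.factorized((M + dt * K).tocsc())   # factor once, reuse
    u, traj = u0.copy(), [u0.copy()]
    for k in range(1, n_steps + 1):
        u = solve(M @ u + dt * f(k * dt))
        traj.append(u.copy())
    return np.array(traj)                            # snapshot array
```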
The parameters $(\mu_1, \mu_2)$ appearing in the boundary conditions are again chosen to lie in a lattice. In particular, the 9 values used for training form a $3 \times 3$ lattice, and the 4 validation values are the lattice midpoints. A sample from the validation set along with its ROM reconstructions is shown in Figure 6. The precise architectures used are displayed in Tables 1, 5, and 6. The initial learning rates are again tuned separately for the GCNN, CNN, and FCNN networks, and the early stopping criterion for network training is 100 epochs.
layer | input size | output size | activation
Samples of size (2145, 1) are flattened to size (2145).
1FC | 2145 | 215 | ELU
1BN | 215 | 215 | Identity
2FC | 215 | $n$ | ELU
End of encoding layers. Beginning of decoding layers.
3FC | $n$ | 215 | ELU
2BN | 215 | 215 | Identity
4FC | 215 | 2145 | ELU
Samples of size (2145) are reshaped to size (2145, 1).
End of decoding layers. Prediction layers listed below.
5FC | $n_\mu + 1$ | 50 | ELU
6FC | 50 | 50 | ELU
7FC | 50 | 50 | ELU
8FC | 50 | $n$ | ELU
layer | input size | kernel size | stride | padding | output size | activation
Samples of size (2145, 1) are zero-padded to size (4096, 1) and reshaped to size (1, 64, 64).
1C | (1, 64, 64) | 5x5 | 2 | SAME | (4, 32, 32) | ELU
2C | (4, 32, 32) | 5x5 | 2 | SAME | (16, 16, 16) | ELU
3C | (16, 16, 16) | 5x5 | 2 | SAME | (64, 8, 8) | ELU
4C | (64, 8, 8) | 5x5 | 2 | SAME | (256, 4, 4) | ELU
Samples of size (256, 4, 4) are flattened to size (4096).
1FC | 4096 | | | | $n$ | ELU
End of encoding layers. Beginning of decoding layers.
2FC | $n$ | | | | 4096 | ELU
Samples of size (4096) are reshaped to size (256, 4, 4).
1TC | (256, 4, 4) | 5x5 | 2 | SAME | (64, 8, 8) | ELU
2TC | (64, 8, 8) | 5x5 | 2 | SAME | (16, 16, 16) | ELU
3TC | (16, 16, 16) | 5x5 | 2 | SAME | (4, 32, 32) | ELU
4TC | (4, 32, 32) | 5x5 | 2 | SAME | (1, 64, 64) | ELU
Samples of size (1, 64, 64) are reshaped to size (4096, 1) and truncated to size (2145, 1).
| Encoder/Decoder + Prediction | Encoder/Decoder only |
Network | $n$ | avg. $E_R$ (%) | max $E_R$ (%) | Size (MB) | s/epoch | $n$ | avg. $E_R$ (%) | max $E_R$ (%) | Size (MB) | s/epoch
GCNN | 3 | 7.19 | 9.21 | 0.132 | 9.5 | 3 | 6.96 | 9.21 | 0.0659 | 9.2
CNN | 3 | 3.26 | 4.58 | 3.64 | 18 | 3 | 3.36 | 3.81 | 3.62 | 18
FCNN | 3 | 4.75 | 6.19 | 3.74 | 3.3 | 3 | 4.22 | 5.69 | 3.72 | 3.1
GCNN | 10 | 2.87 | 3.82 | 0.253 | 9.6 | 10 | 2.06 | 2.85 | 0.186 | 9.4
CNN | 10 | 3.07 | 4.38 | 3.87 | 18 | 10 | 2.45 | 2.90 | 3.85 | 18
FCNN | 10 | 2.96 | 3.97 | 3.76 | 3.3 | 10 | 2.32 | 2.92 | 3.73 | 3.1
GCNN | 32 | 2.55 | 3.48 | 0.636 | 9.6 | 32 | 1.05 | 1.91 | 0.564 | 9.2
CNN | 32 | 2.30 | 3.73 | 4.60 | 19 | 32 | 2.34 | 2.91 | 4.57 | 18
FCNN | 32 | 2.65 | 4.25 | 3.80 | 3.2 | 32 | 1.61 | 2.31 | 3.77 | 3.2
The results for this experiment are displayed in Table 7, which clearly illustrates the drawbacks of using the standard CNN architecture along with solution reshaping. Not only is the computation time increased due to the introduction of almost 2000 fake nodes, but the accuracy of the CNN-ROM is significantly lower than the FCNN-ROM despite its comparable or larger memory cost. On the other hand, the architecture based on GCNN is both the least memory-consumptive and the most accurate on each problem when $n = 10$. Again, results show that the GCNN-ROM is most effective on the compression problem, demonstrating significantly improved accuracy when the latent dimension is not too small. Conversely, if a latent dimension equal to the theoretical minimum $n_\mu + 1 = 3$ is desired, then the ROM based on deep CNN is still superior, potentially due to its multiple levels of filtration with relatively large kernels. An example ROM reconstruction along with pointwise error plots is displayed in Figure 7. Interestingly, Figure 5 shows that all autoencoder networks exhibit significant overfitting on this dataset, suggesting that none of the models has sufficient generalizability to produce optimal results on new data.
4.4. Unsteady 2D Navier-Stokes Equations
The final experiment considers the unsteady Navier-Stokes equations on a rectangular domain with a slightly off-center cylindrical hole, which is a common benchmark model for computational fluid dynamics (c.f. [schafer1996]). The domain is pictured in Figure 8, and the governing system is

$\dot{\mathbf{u}} + (\mathbf{u} \cdot \nabla)\mathbf{u} - \nu\Delta\mathbf{u} + \nabla p = \mathbf{0}, \qquad \nabla \cdot \mathbf{u} = 0,$

subject to no-slip boundary conditions on the channel walls and the cylinder, a prescribed inflow profile at the inlet, and a natural outflow condition at the outlet, where $\mathbf{u}, p$ are the velocity and pressure functions and $\nu$ is the viscosity parameter. Recall the Reynolds number

$\mathrm{Re} = \frac{\bar{U}L}{\nu},$

where $\bar{U}$ is the average speed along the flow and $L$ is the characteristic length of the flow configuration. The goal of this experiment is to see how well the ROM-autoencoder architectures can compress and predict solutions to these equations for a given parameter pair $(t, \mathrm{Re})$.

To generate the training and validation data in this case, three values of $\nu$ are selected, corresponding to three distinct Reynolds numbers. The flow is simulated starting from rest, using Netgen/NGSolve with Taylor-Hood finite elements for spatial discretization and an implicit-explicit Euler scheme for time stepping. Solution snapshots are stored at regular intervals of time steps, yielding the same number of snapshots for each example. Note that the domain is represented by an unstructured irregular mesh whose element density is higher near the cylindrical hole, yielding two solution features (the components of velocity) with dimension equal to the number of nodes $N = 10104$. For the present experiments, the data corresponding to two of the Reynolds numbers are used to train the ROM networks and the data corresponding to the remaining value are used for validation. The architectures used in this case are given in Tables 1, 8, and 9, and the early stopping criterion is 50 epochs. The same initial learning rate is used to train each network.
layer | input size | output size | activation
Samples of size (10104, 2) are flattened to size (20208).
1FC | 20208 | 2021 | ELU
1BN | 2021 | 2021 | Identity
2FC | 2021 | 202 | ELU
2BN | 202 | 202 | Identity
3FC | 202 | $n$ | ELU
End of encoding layers. Beginning of decoding layers.
4FC | $n$ | 202 | ELU
3BN | 202 | 202 | Identity
5FC | 202 | 2021 | ELU
4BN | 2021 | 2021 | Identity
6FC | 2021 | 20208 | ELU
Samples of size (20208) are reshaped to size (10104, 2).
End of decoding layers. Prediction layers listed below.
7FC | $n_\mu + 1$ | 50 | ELU
8FC | 50 | 50 | ELU
9FC | 50 | 50 | ELU
10FC | 50 | $n$ | ELU
layer | input size | kernel size | stride | padding | output size | activation
Samples of size (10104, 2) are zero-padded to size (16384, 2) and reshaped to size (2, 128, 128).
1C | (2, 128, 128) | 5x5 | 2 | SAME | (8, 64, 64) | ELU
2C | (8, 64, 64) | 5x5 | 2 | SAME | (32, 32, 32) | ELU
3C | (32, 32, 32) | 5x5 | 2 | SAME | (128, 16, 16) | ELU
4C | (128, 16, 16) | 5x5 | 2 | SAME | (512, 8, 8) | ELU
5C | (512, 8, 8) | 5x5 | 2 | SAME | (2048, 4, 4) | ELU
Samples of size (2048, 4, 4) are flattened to size (32768).
1FC | 32768 | | | | $n$ | ELU
End of encoding layers. Beginning of decoding layers.
2FC | $n$ | | | | 32768 | ELU
Samples of size (32768) are reshaped to size (2048, 4, 4).
1TC | (2048, 4, 4) | 5x5 | 2 | SAME | (512, 8, 8) | ELU
2TC | (512, 8, 8) | 5x5 | 2 | SAME | (128, 16, 16) | ELU
3TC | (128, 16, 16) | 5x5 | 2 | SAME | (32, 32, 32) | ELU
4TC | (32, 32, 32) | 5x5 | 2 | SAME | (8, 64, 64) | ELU
5TC | (8, 64, 64) | 5x5 | 2 | SAME | (2, 128, 128) | ELU
Samples of size (2, 128, 128) are reshaped to size (16384, 2) and truncated to size (10104, 2).
| Encoder/Decoder + Prediction | Encoder/Decoder only |
Network | $n$ | avg. $E_R$ (%) | max $E_R$ (%) | Size (MB) | s/epoch | $n$ | avg. $E_R$ (%) | max $E_R$ (%) | Size (MB) | s/epoch
GCNN | 2 | 9.07 | 14.6 | 0.476 | 33 | 2 | 7.67 | 12.2 | 0.410 | 32
CNN | 2 | 7.12 | 11.1 | 224 | 210 | 2 | 11.2 | 17.6 | 224 | 190
FCNN | 2 | 1.62 | 2.87 | 330 | 38 | 2 | 1.62 | 2.70 | 330 | 38
GCNN | 32 | 2.97 | 5.14 | 5.33 | 32 | 32 | 0.825 | 1.49 | 5.26 | 32
CNN | 32 | 4.57 | 7.09 | 232 | 230 | 32 | 4.61 | 7.24 | 232 | 220
FCNN | 32 | 1.39 | 2.64 | 330 | 38 | 32 | 0.680 | 1.12 | 330 | 38
GCNN | 64 | 2.88 | 4.96 | 10.5 | 33 | 64 | 0.450 | 0.791 | 10.4 | 33
CNN | 64 | 3.42 | 5.33 | 241 | 270 | 64 | 2.42 | 3.57 | 241 | 260
FCNN | 64 | 1.45 | 2.64 | 330 | 38 | 64 | 0.704 | 1.19 | 330 | 37
The results of this experiment are given in Table 10, making it clear that the trick of reshaping the nodal features in order to apply CNN carries an associated accuracy cost. Indeed, the performance of the CNN-ROM on the prediction problem is inferior to both the GCNN- and FCNN-ROMs (provided the latent space is not too small) despite its much higher computational cost. Moreover, on this highly irregular domain the CNN-based architecture does not even offer benefits in the low-dimensional case $n = 2$, where the architecture based on FCNN is roughly 5 times faster while maintaining superior accuracy. As before, the GCNN-ROM struggles when $n$ is small but gains in performance as the latent dimension increases. In fact, results indicate that the GCNN is superior to the FCNN on the compression problem when $n = 64$, while also being faster to train and an order of magnitude less memory-consumptive. Although $n = 64$ is far from the "best case scenario" of $n = 2$, it is still much smaller than the original dimension of $2N = 20208$, and may be useful in a real-time or many-query setting. Interestingly, Figure 9 shows that the CNN-ROM does not overfit on the prediction or compression problems, indicating that the standard CNN may not be expressive enough to handle complicated irregular data that has been aggressively reshaped. Similarly, the GCNN-ROM does not overfit during the compression examples, although its performance scales with the dimension of the latent space. Figures 10, 11, and 12 provide a visual comparison of the ROM reconstructions and their pointwise error at a representative validation instance. Finally, note that for this problem it may be infeasible in the case $n = n_\mu + 1 = 2$ to generate an accurate prediction ROM using any method, since it is still an open question whether the mapping $(t, \mathrm{Re}) \mapsto \mathbf{x}(t; \mathrm{Re})$ is unique in this case.
5. Conclusion
Neural network-based strategies for the reduced-order modeling of PDEs have been discussed, and a novel autoencoder architecture based on graph convolution has been proposed for this purpose. Through interesting benchmark problems, the performance of three fundamentally different autoencoder-based ROM architectures has been compared, demonstrating that each architecture has advantages and disadvantages depending on the task involved and the connectivity pattern of the input data. In general, results indicate that the proposed graph convolutional autoencoder is superior for compression and decompression tasks provided the dimension of the latent space is sufficiently large. Moreover, it seems that the standard CAE performs well at low values of $n$ but should be avoided when the model domain is irregular. Finally, the fully connected autoencoder exhibits strong overall performance at a much higher memory cost than the alternatives. Future work will examine how the proposed GCNN autoencoder can be combined with traditional PDE-based numerical solvers to generate solutions at unknown parameter configurations without the need for a fully connected prediction network, potentially eliminating the noted accuracy bottleneck arising from this part of the ROM.
Acknowledgments
This work is partially supported by U.S. Department of Energy Scientific Discovery through Advanced Computing under grants DE-SC0020270 and DE-SC0020418.