The solution of a parametrized system of partial differential equations (PDEs) by means of a full-order model (FOM), whenever dealing with real-time or multi-query scenarios, entails prohibitive computational costs if the FOM is high-dimensional. In the former case, the FOM solution must be computed in a very limited amount of time; in the latter, the FOM must be solved for a huge number of parameter instances sampled from the parameter space. Reduced order modeling techniques aim at replacing the FOM by a reduced order model (ROM) featuring a much lower dimension, yet still able to express the physical features of the problem described by the FOM. The basic assumption underlying the construction of such a ROM is that the solution of a parametrized PDE, belonging a priori to a high-dimensional (discrete) space, lies on a low-dimensional manifold embedded in this space. The goal of a ROM is then to approximate the solution manifold – that is, the set of all PDE solutions when the parameters vary in the parameter space – through a suitable, approximated trial manifold.
A widespread family of reduced order modeling techniques relies on the assumption that the reduced-order approximation can be expressed by a linear combination of basis functions, built starting from a set of FOM solutions, called snapshots. Among these techniques, proper orthogonal decomposition (POD) – equivalent to principal component analysis in statistics hastie2001theelements, or Karhunen-Loève expansion in stochastic applications – exploits the singular value decomposition of a suitable snapshot matrix (or the eigen-decomposition of the corresponding snapshot correlation matrix), thus yielding linear ROMs, in which the ROM approximation is given by the linear superimposition of POD modes. In this case, the solution manifold is approximated through a linear trial manifold, that is, the ROM approximation is sought in a low-dimensional linear trial subspace.
Projection-based methods are linear ROMs in which the ROM approximation of the PDE solution, for any new parameter value, results from the solution of a low-dimensional (nonlinear, dynamical) system, whose unknowns are the ROM degrees of freedom (or generalized coordinates). Regardless of whether the PDE (and thus the FOM) is linear or not, the operators appearing in the ROM are obtained by imposing that the projection of the FOM residual evaluated on the ROM trial solution is orthogonal to a low-dimensional, linear test subspace, which might coincide with the trial subspace. Hence, the resulting ROM is linear in the sense that the reduced dynamics is obtained through a projection onto a linear subspace benner2017model; benner2015asurvey; quarteroni2016reduced. However, linear ROMs show severe computational bottlenecks when dealing with problems featuring coherent structures (possibly dependent on parameters) that propagate over time, namely in transport and wave-type phenomena, or convection-dominated flows. In these cases, for the sake of accuracy, the dimension of the linear trial manifold can easily become extremely large compared to the intrinsic dimension of the solution manifold, thus compromising the ROM efficiency. To overcome this bottleneck, ad-hoc extensions of the POD strategy have been considered, towards nonlinear approaches to build a ROM ohlberger2016reduced; pagani2018numerical.
In this paper we propose a computational, non-intrusive approach based on deep learning (DL) algorithms to deal with the construction of efficient ROMs (which we refer to as DL-ROMs) in order to tackle parameter-dependent PDEs; in particular, we consider PDEs that feature wave-type phenomena. A comprehensive framework is presented for the global approximation of the map , where denotes time, a vector of input parameters and the solution of a large-scale dynamical system arising from the space discretization of a parametrized, time-dependent (non)linear PDE. Several recent works have shown possible applications of DL techniques to parametrized PDEs – thanks to their approximation capabilities, their extremely favorable computational performance during online testing phases, and their relative ease of implementation – both from a theoretical kutyniok2019atheoretical and a computational standpoint. Regarding this latter aspect, artificial neural networks (ANNs), such as feedforward neural networks, have been employed to model the reduced dynamics in a data-driven regazzoni2019machinelearningfor and less intrusive way (avoiding, e.g., the costs entailed by projection-based ROMs), but still relying on a linear trial manifold built, e.g., through POD. For instance, in guo2018reduced; guo2019data; hestaven2018non-intrusive; san2018neural the solution of a (nonlinear, time-dependent) ROM for any new parameter value has been replaced by the evaluation of ANN-based regression models; similar ideas can be found, e.g., in kani2017dr-rnn; mohan2018adeep; wan2018data. Few attempts have been made to describe, through ANNs, the reduced trial manifold where the approximation is sought (avoiding, e.g., the linear superimposition of POD modes); see, e.g., gonzalez2018deep; carlberg2018model.
For instance, a projection-based ROM technique has been introduced in (carlberg2018model), in which the FOM system is projected onto a nonlinear trial manifold identified by means of the decoder function of a convolutional autoencoder neural network. However, the ROM is derived by minimizing a residual formulation, for which the quasi-Newton method employed therein requires the computation of an approximated Jacobian of the residual at each time step. A ROM technique based on a deep convolutional recurrent autoencoder has been proposed in gonzalez2018deep, where a reduced trial manifold is obtained by means of a convolutional autoencoder; the latter is then used to train a Long Short-Term Memory (LSTM) neural network modeling the reduced dynamics. However, no explicit parameter dependence in the PDE problem is considered, apart from parameter-dependent initial data, and the LSTM is trained on reduced approximations obtained through the encoder function of the autoencoder. Another promising application of machine and deep learning techniques within a ROM framework deals with the efficient evaluation of reduced error models, see, e.g., freno2018machine; paganicarlbergmanzoni2019; parish2019time; trehan2017error.
Our goal is to set up nonlinear ROMs whose dimension is nearly equal (if not equal) to the intrinsic dimension of the solution manifold that we aim at approximating. Our DL-ROM approach combines and improves the techniques introduced in gonzalez2018deep; carlberg2018model by shaping an all-inclusive DL-based ROM technique, where we both (i) construct the reduced trial manifold and (ii) model the reduced dynamics on it employing ANNs. The former task is achieved by using the decoder function of a convolutional autoencoder; the latter task is instead carried out by considering a feedforward neural network and the encoder function of a convolutional autoencoder. Moreover, we set up a computational procedure performing the training of both network architectures simultaneously, by minimizing a loss function that weights two terms, one dedicated to each single task. In this respect, we are able to design a flexible framework capable of handling parameters affecting both PDE operators and data, which avoids both the expensive projection stage of carlberg2018model and the training of a more expensive LSTM network. In our technique, the intrusive construction of a ROM is replaced by the evaluation of the ROM generalized coordinates through a deep feedforward neural network taking only time and the parameters as inputs. The proposed technique is purely data-driven, that is, it only relies on the computation of a set of FOM snapshots – in this respect, DL does not replace the high-fidelity FOM as, e.g., in the works by Karniadakis and coauthors raissi2018hidden; raissi2017physics1; raissi2017physics2; raissi2019physics; raissi2018deep; rather, DL techniques are built upon it, to enhance the repeated evaluation of the FOM for different values of the parameters.
The structure of the paper is as follows. In section 2 we show how to generate nonlinear ROMs by reinterpreting the classical ideas behind linear ROMs for parametrized PDEs. In section 3 we detail the construction of the proposed DL-ROM, whose accuracy is numerically assessed in section 4 by considering three different test cases of increasing complexity (both with respect to the parametric dependence and the nature of the PDE). Finally, some conclusions are drawn in section 5. A quick overview of useful facts about deep feedforward, convolutional and autoencoder neural networks is reported in Appendix A to make the paper self-contained.
2 From linear to nonlinear dimensionality reduction
Starting from the well-known setting of linear (projection-based) ROMs, in this section we generalize this task to the case of nonlinear ROMs.
2.1 Problem formulation
We formulate the construction of ROMs in algebraic terms, starting from the high-fidelity (spatial) approximation of nonlinear, time-dependent, parametrized PDEs. By introducing suitable space discretization techniques (such as, e.g., the Finite Element Method, Isogeometric Analysis or the Spectral Element Method), the high-fidelity, full order model (FOM) can be expressed as a nonlinear parametrized dynamical system. Given , we aim at solving the initial value problem
where the parameter space is a bounded and closed set, is the parametrized solution of (1), is the initial datum and is a (nonlinear) function, encoding the system dynamics. The FOM dimension is related to the finite dimensional subspaces introduced for the space discretization of the PDE – here usually denotes a discretization parameter, such as the maximum diameter of elements in a computational mesh – and can be extremely large whenever the PDE problem shows complex physical behaviors and/or high degrees of accuracy are required of its solution. The parameter may represent physical or geometrical properties of the system, like, e.g., material properties, initial and boundary conditions, or the shape of the domain. In order to solve problem (1), suitable time discretizations are employed, such as backward differentiation formulas quarteroni2008matematica.
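For orientation, the initial value problem (1) described above presumably has the generic form below. This is only a sketch in assumed notation: the symbols $\mathbf{u}_h$ (parametrized solution), $\mathbf{f}$ (system dynamics), $\mathbf{u}_0$ (initial datum), $\boldsymbol{\mu}$ (parameter vector) and $T$ (final time) are chosen here for illustration and are not taken from the text.

```latex
% Generic parametrized initial value problem (assumed notation)
\begin{equation*}
\begin{cases}
\dot{\mathbf{u}}_h(t; \boldsymbol{\mu}) = \mathbf{f}\big(t, \mathbf{u}_h(t; \boldsymbol{\mu}); \boldsymbol{\mu}\big), & t \in (0, T), \\
\mathbf{u}_h(0; \boldsymbol{\mu}) = \mathbf{u}_0(\boldsymbol{\mu}).
\end{cases}
\end{equation*}
```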
Our goal is the efficient numerical approximation of the whole set
Assuming that, for any given parameter , problem (1) admits a unique solution, for each , the intrinsic dimension of the solution manifold is at most , where is the number of parameters (time plays the role of an additional coordinate). This means that each point belonging to is completely defined in terms of at most intrinsic coordinates, or equivalently, the tangent space to the manifold at any given is spanned by basis vectors.
2.2 Linear dimensionality reduction: projection-based ROMs
The most common way to build a ROM for the efficient approximation of problem (1) relies on the introduction of a reduced linear trial manifold, that is of a subspace of dimension , spanned by the columns of a matrix . Hence, a linear ROM looks for an approximation in the form
where . Here for each , denotes the vector of intrinsic coordinates (or degrees of freedom) of the ROM approximation; note that the map
that, given the (low-dimensional) intrinsic coordinates, returns the (high-dimensional) approximation of the FOM solution , is linear.
Proper Orthogonal Decomposition (POD) is one of the most widely employed techniques to generate the linear trial manifold quarteroni2016reduced. Considering a set of instances of the parameter , we introduce the snapshot matrix defined as
where we have introduced a partition of the time interval in time steps , , of size and . Moreover, let us introduce a symmetric and positive definite matrix encoding a suitable norm (e.g., the energy norm) on the high-dimensional space and admitting a Cholesky factorization . POD computes the Singular Value Decomposition (SVD) of ,
where , and with , and , and sets the columns of in terms of the first left singular vectors of , that is, . By construction, the columns of are orthonormal (with respect to the scalar product ) and, among all possible -dimensional subspaces spanned by the columns of a matrix , provides the best reconstruction of the snapshots, that is,
where . For this reason, we refer to as the optimal-POD reconstruction of onto a reduced subspace of dimension .
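As a minimal illustration of the POD construction above, the following sketch extracts only the dominant POD mode of a small snapshot matrix and computes the corresponding rank-one reconstruction. It is a toy under stated assumptions: the norm matrix is taken to be the identity, the dominant left singular vector is obtained by power iteration on the snapshot correlation matrix rather than by a full SVD, and the names `dominant_pod_mode` and `reconstruct` are hypothetical.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dominant_pod_mode(S, iters=200):
    """Dominant left singular vector of S via power iteration on S S^T.
    Assumption: the initial guess is not orthogonal to the dominant mode."""
    St = transpose(S)
    v = [1.0] * len(S)
    for _ in range(iters):
        w = matvec(S, matvec(St, v))            # w = S S^T v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def reconstruct(S, v):
    """Project every snapshot (column of S) onto span{v}, i.e. v v^T S."""
    coeffs = matvec(transpose(S), v)            # v^T s_j for each column s_j
    return [[vi * c for c in coeffs] for vi in v]

# Toy snapshot matrix: 3 degrees of freedom, 4 snapshots, rank one by construction
S = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [-1.0, -2.0, -3.0, -4.0]]
v = dominant_pod_mode(S)
S_rec = reconstruct(S, v)
err = math.sqrt(sum((a - b) ** 2
                    for ra, rb in zip(S, S_rec) for a, b in zip(ra, rb)))
print(err)  # close to zero: a single mode spans a rank-one snapshot set
```

For snapshot sets generated by transport-dominated problems, many more modes would be needed to reach a comparable reconstruction error, which is the limitation discussed in the remainder of this section.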
and impose that the residual
associated with the first equation of (6) is orthogonal to a -dimensional subspace spanned by the columns of a matrix , that is, . This condition yields the following ROM
In the case a Galerkin projection is performed, while the case yields a more general Petrov-Galerkin projection. Note that choosing such that does not automatically ensure ROM stability on long time intervals.
The RB method, in the form of either Galerkin-POD or Petrov-Galerkin-POD methods, has been successfully applied to a broad range of parametrized time-dependent (non)linear problems (see, e.g., (pagani2018numerical; manzoni2016accurate)); however, it provides low-dimensional subspaces of dimension much larger than the intrinsic dimension of the solution manifold. Relying on a linear, global trial manifold thus represents a major bottleneck to computational efficiency (ohlberger2016reduced; pagani2018numerical). This is the case, for instance, of hyperbolic problems, for which the RB method is not able in practice to significantly decrease the dimensionality of the problem. The same difficulty might also affect the use of hyper-reduction techniques, such as the (discrete) empirical interpolation barrault2004anempirical; chaturantabut2010nonlinear, which are mandatory in order to assemble the operators appearing in the ROM (8) without relying on expensive -dimensional arrays. See, e.g., FGMQ_19 for further details.
2.3 Nonlinear dimensionality reduction
A first attempt to overcome the computational bottleneck entailed by the use of a linear, global trial manifold is to build a piecewise linear trial manifold, using local reduced bases whose dimension is smaller than the one of the global linear trial manifold. Clustering algorithms applied on a set of snapshots can be employed to partition them into clusters from which POD can extract a subspace of reduced dimension; the ROM is then obtained by following the strategy described above on each cluster separately, see, e.g., amsallem2012nonlinear; Amsallem2015. An alternative approach based on classification binary trees has been introduced in Amsallem2016. These strategies have been employed (and compared) in pagani2018numerical in order to solve parametrized problems in cardiac electrophysiology. Using a piecewise linear trial manifold partially overcomes the limitation of a linear dimensionality reduction technique such as POD, yet it still employs local bases of dimension much higher than the intrinsic dimension of the solution manifold . An approach based on a dictionary of solutions, computed offline, has been developed in abgrall_amsallem as an alternative to using a truncated reduced basis based on POD, together with an online -norm minimization of the residual.
Other possible options involving nonlinear transformations of modes might rely on a reconstruction of the POD modes at each time step using Lax pairs GerbeauLombardi2014, on the solution of Monge-Kantorovich optimal transport problems PhysRevE.89.022923, on a problem-dependent change of coordinates requiring the repeated solution of an optimization problem cagniart2019model, on shifted POD modes reiss2018shifted after multiple transport velocities have been identified and separated, or on basis updates derived from querying the full model at a few selected spatial coordinates peherstorfer2018. Despite providing remarkable improvements compared to the classic (Petrov-)Galerkin-POD approach, all these strategies exhibit some drawbacks, such as: (i) the high computational costs entailed during the online testing stage of the ROM, which are not restricted to the intensive offline training stage; (ii) performance and settings that are highly dependent on the problem at hand; (iii) the need to deal only with a linear superimposition of modes (which characterizes linear ROMs), yielding low-dimensional spaces whose dimension is still (much) higher than the intrinsic dimension of the solution manifold.
Motivated by the need to avoid the drawbacks of linear ROMs and to set a general paradigm for the construction of efficient, extremely low-dimensional ROMs, we resort to nonlinear dimensionality reduction techniques. Similarly to gonzalez2018deep; carlberg2018model, we build a nonlinear ROM to approximate by
where , , , is a nonlinear, differentiable function. As a matter of fact, the solution manifold is approximated by a reduced nonlinear trial manifold
so that . As before, denotes the vector-valued function of two arguments representing the intrinsic coordinates of the ROM approximation. Our goal is to set a ROM whose dimension is as close as possible to the intrinsic dimension of the solution manifold , i.e. , in order to correctly capture the solution of the dynamical system while containing the size of the approximation spaces (carlberg2018model).
To model the relationship between each couple , and to describe the system dynamics on the reduced nonlinear trial manifold in terms of the intrinsic coordinates, we consider a nonlinear map under the form
where is a differentiable nonlinear function. No additional assumptions such as, e.g., the (exact, or approximate) affine -dependence as in the RB method, are needed.
3 A deep learning-based reduced order model (DL-ROM)
We now detail the construction of the proposed nonlinear ROM. In this respect, we define the functions and in (9) and (11) by means of deep learning (DL) algorithms, exploiting neural network architectures. This choice is motivated by their ability to effectively approximate nonlinear maps, to learn from data and to generalize to unseen data. Moreover, DL models enable us to build non-intrusive, completely data-driven ROMs, since their construction only requires access to the dataset, that is, the parameter values and the snapshot matrix, but not to the FOM arrays appearing in (1).
The DL-ROM technique that we develop in this paper is composed of two main blocks responsible, respectively, for the reduced dynamics learning and the reduced trial manifold learning (see Figure 2). Hereafter, we denote by , and the number of training-parameter instances, of testing-parameter instances and of time instances, respectively, and we set . The dimension of both the FOM solution and the ROM approximation is , while denotes the number of intrinsic coordinates, with .
For the description of the system dynamics on the reduced nonlinear trial manifold (which we refer to as reduced dynamics learning), we employ a deep feedforward neural network (DFNN) with layers, that is, we define the function in definition (11) as
thus yielding the map
where takes the form (30), with times. Here and denotes the vector of parameters of the DFNN.
Regarding instead the description of the reduced nonlinear trial manifold defined in (10) (which we refer to as reduced trial manifold learning), we employ the decoder function of a convolutional autoencoder (AE), that is, we define the function appearing in (9) and (10) as
thus yielding the map
where results from the composition of several layers, some of which of convolutional type, overall depending on the vector of parameters of the decoder function.
Combining the two former stages, the DL-ROM approximation is then given by
Computing the ROM approximation (14) for any new value of , at any given time, requires evaluating the map at the testing stage, once the parameters have been determined, once and for all, during the training stage. The training stage consists in solving an optimization problem (in the variable ) after a set of snapshots of the FOM has been computed. More precisely, given the parameter matrix defined as
and the snapshot matrix , defined in (4), we solve the problem: find the optimal parameters solution of
The optimization problem is solved by means of the ADAM algorithm, which is a stochastic gradient descent method (robbins1951astochastic) computing an adaptive approximation of the first and second moments of the gradients of the loss function. In particular, it computes exponentially weighted moving averages of the gradients and of the squared gradients. We set the starting learning rate to , the batch size to and the maximum number of epochs to . We perform cross-validation, in order to tune the hyper-parameters of the DL-ROM, by splitting the data into training and validation sets with an 8:2 proportion. Moreover, we implement an early-stopping regularization technique to reduce overfitting (goodfellow2016deep). In particular, we stop the training if the loss does not decrease over 500 epochs. As nonlinear activation function we employ the ELU function (clevert2015fast) defined as
No activation function is applied at the last convolutional layer of the decoder neural network, as usually done when dealing with autoencoders. The parameters, weights and biases, are initialized through the He uniform initialization (he2015delving).
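Two of the training ingredients mentioned above, the ELU activation and early stopping, can be sketched as follows. This is an illustrative sketch, not the implementation used for the experiments: the ELU is written in its usual form with scale `alpha = 1` (an assumed default), and `early_stopping_epoch` is a hypothetical helper that scans a recorded sequence of per-epoch losses instead of driving an actual training loop.

```python
import math

def elu(z, alpha=1.0):
    """Exponential linear unit: identity for z >= 0, alpha*(exp(z)-1) otherwise.
    alpha = 1 is an assumed default."""
    return z if z >= 0.0 else alpha * (math.exp(z) - 1.0)

def early_stopping_epoch(losses, patience=500):
    """Return the 0-based epoch at which training would stop: the first epoch
    at which the best loss seen so far has not improved for `patience`
    consecutive epochs, or the last epoch if that never happens."""
    best = math.inf
    since_best = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(losses) - 1

print(elu(2.0), elu(0.0))                                    # 2.0 0.0
print(early_stopping_epoch([3, 2, 1, 1, 1, 1], patience=2))  # 4
```

The `patience=500` default mirrors the 500-epoch criterion stated above; the short loss sequence in the usage line is made up for illustration.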
As we rely on a convolutional autoencoder to define the function , we also exploit the encoder function
which maps each FOM solution associated to the pairs
provided as inputs to the feed-forward neural network (12), onto a low-dimensional representation depending on the parameters vector defining the encoder function.
The architecture of DL-ROM actually used during the training and the validation phases, but not during testing, is the one shown in Figure 3.
In practice, we add to the architecture of the DL-ROM introduced above the encoder function of the convolutional autoencoder. This produces an additional term in the per-example loss function (17), thus requiring the solution of the following optimization problem:
and , with . The per-example loss function (20) combines the reconstruction error (that is, the error between the FOM solution and the DL-ROM approximation) and the error between the intrinsic coordinates and the output of the encoder. This additional term enhances the performance of the DL-ROM, as shown in Test 3 of section 4.
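The structure of the per-example loss described above, a weighted combination of a reconstruction term and an intrinsic-coordinate term, can be sketched as follows. The specific form is an assumption made for illustration: squared Euclidean norms, a factor 1/2, and a convex combination through a weight called `omega` here; none of these names or choices is taken from the text.

```python
def per_example_loss(u_fom, u_rom, z_encoder, z_dfnn, omega=0.5):
    """Weighted sum of (i) the squared error between the FOM solution and the
    DL-ROM reconstruction and (ii) the squared error between the encoder
    output and the intrinsic coordinates predicted by the feedforward network.
    The weight omega in [0, 1] and the 1/2 factors are assumed."""
    rec = sum((a - b) ** 2 for a, b in zip(u_fom, u_rom))
    lat = sum((a - b) ** 2 for a, b in zip(z_encoder, z_dfnn))
    return 0.5 * omega * rec + 0.5 * (1.0 - omega) * lat

# Toy example with N_h = 2 and a single intrinsic coordinate
print(per_example_loss([1.0, 2.0], [1.0, 1.0], [0.5], [0.0]))  # 0.3125
```

Setting `omega = 1` would recover a loss that only penalizes the reconstruction error, i.e. the situation before the encoder term is added.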
3.1 Training and Testing Algorithms
Let us now detail the algorithms through which the training and testing phases of the networks are performed.
First of all, data normalization and standardization enhance the training phase of the network by rescaling all the values contained in the dataset to a common frame. For this reason, the inputs and the output of DL-ROM are normalized by applying an affine transformation in order to rescale them in the range . In particular, provided a training dataset , we define
so that data are normalized by applying the following transformation
Transformation (22) is applied also to the validation and testing sets, but considering as and the values computed over the training set. We point out that the input of the encoder function, the FOM solution for a given (time, parameter) instance , is reshaped into a matrix. In particular, starting from we apply the transformation =reshape where . If is not a perfect square, the input is zero-padded (goodfellow2016deep). For the sake of simplicity, we continue to refer to the reshaped FOM solution as . The inverse reshaping transformation is applied to the output of the last convolutional layer in the decoder function, that is, the ROM approximation. Moreover, we highlight that applying one of the functions (12)-(13)-(18) to the matrix X means applying it row-wise.
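The preprocessing steps just described can be sketched as follows, under stated assumptions: the target range of the normalization is taken to be [0, 1], the minimum and maximum are computed on the training data only, and the reshape is row-major with zeros appended at the end when the vector length is not a perfect square. The function names are hypothetical.

```python
import math

def fit_min_max(train_values):
    """Minimum and maximum computed over the training set only."""
    return min(train_values), max(train_values)

def normalize(x, lo, hi):
    """Affine map sending [lo, hi] to [0, 1] (assumed target range)."""
    return (x - lo) / (hi - lo)

def reshape_square(u):
    """Reshape a flat vector into a square matrix (row-major, assumed),
    zero-padding the tail when len(u) is not a perfect square."""
    side = math.isqrt(len(u))
    if side * side < len(u):
        side += 1
    padded = list(u) + [0.0] * (side * side - len(u))
    return [padded[r * side:(r + 1) * side] for r in range(side)]

lo, hi = fit_min_max([2.0, 4.0, 6.0])
print(normalize(4.0, lo, hi))                     # 0.5
print(reshape_square([1.0, 2.0, 3.0, 4.0, 5.0]))  # 3x3 matrix, last 4 entries zero
```

Since the scaling constants come from the training set, validation or testing values may fall slightly outside the target range after normalization.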
The training algorithm referring to the architecture of DL-ROM depicted in Figure 3 is reported in Algorithm 1. During the training phase, the optimal parameters of the DL-ROM neural network are found by solving the optimization problem (19)-(20) through the back-propagation and ADAM algorithms.
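To make the role of the ADAM algorithm concrete, the sketch below applies its standard update rule (exponentially weighted moving averages of the gradient and of the squared gradient, with bias correction) to a scalar toy problem. It is not the training code: the learning rate, the decay rates `beta1` and `beta2`, `eps` and the number of steps are commonly used defaults assumed here, and the quadratic objective is a stand-in for the DL-ROM loss.

```python
import math

def adam_minimize(grad, theta, lr=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=5000):
    """Minimize a scalar function given its gradient, using the Adam update."""
    m = 0.0  # moving average of the gradients (first moment)
    v = 0.0  # moving average of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)   # bias-corrected second moment
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 (theta - 3)
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta_star)  # approaches the minimizer theta = 3
```

In the actual training, the gradient would be computed by back-propagation over mini-batches of (time, parameter) instances rather than in closed form.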
4 Numerical results
In this section, we report the numerical results obtained by applying the proposed DL-ROM technique to three parametrized, time-dependent PDE problems, namely (i) Burgers equation, (ii) a linear transport equation, and (iii) a coupled PDE-ODE system arising from cardiac electrophysiology, the monodomain equation; the latter is a system of time-dependent, nonlinear equations, whose solutions feature a traveling wave behavior. For the time being, we deal with problems set in (spatial) dimension featuring up to parameters; we will consider the extension to differential problems in and in a forthcoming publication. For this reason, our focus is now on the numerical accuracy of our DL-ROM technique rather than on its computational efficiency and, therefore, on its comparison with linear ROMs such as the RB method featuring linear (possibly, piecewise linear) trial manifolds built through POD.
To evaluate the performance of DL-ROM we rely on the loss function (20) and on the following error indicator
We implement the neural network required by our DL-ROM technique by means of the TensorFlow deep learning framework (abadi2016tensorflow), and the numerical simulations are performed on a workstation equipped with an Nvidia GeForce GTX 1070 8 GB GPU.
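Since the definition (23) of the error indicator is not reproduced here, the following sketch shows one plausible form, stated as an assumption: for each testing-parameter instance, the Euclidean error is accumulated over all time instances and divided by the accumulated norm of the FOM solution, and the result is averaged over the testing-parameter instances. The function name and the data layout are hypothetical.

```python
import math

def relative_error_indicator(fom, rom):
    """fom[i][k] and rom[i][k] are the solution vectors for the i-th testing
    parameter and the k-th time instance. Returns the mean over i of
    sqrt(sum_k ||fom - rom||^2) / sqrt(sum_k ||fom||^2) (assumed form)."""
    total = 0.0
    for fom_i, rom_i in zip(fom, rom):
        num = sum((a - b) ** 2
                  for uk, vk in zip(fom_i, rom_i) for a, b in zip(uk, vk))
        den = sum(a ** 2 for uk in fom_i for a in uk)
        total += math.sqrt(num) / math.sqrt(den)
    return total / len(fom)

# One testing parameter, one time instance, two degrees of freedom
print(relative_error_indicator([[[3.0, 4.0]]], [[[3.0, 3.0]]]))  # 0.2
```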
4.1 Test 1: Burgers Equation
Let us consider the parametrized one-dimensional nonlinear Burgers equation
with , and . System (24) has been discretized in space by means of linear finite elements, with grid points, and in time by means of the Backward Euler scheme, with time instances. The parameter space, to which the single () parameter belongs, is given by . We consider training-parameter instances uniformly distributed over and testing-parameter instances, each of them corresponding to the midpoint between two consecutive training-parameter instances.
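The sampling strategy just described, with training-parameter instances on a uniform grid and testing-parameter instances at the midpoints between consecutive training values, can be sketched as follows. The interval [0, 1] and the number of training instances used in the example are placeholders, not the values of this test case.

```python
def uniform_grid(lo, hi, n):
    """n uniformly distributed points in [lo, hi], endpoints included."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def midpoints(points):
    """Midpoint between each pair of consecutive points."""
    return [0.5 * (a + b) for a, b in zip(points, points[1:])]

train = uniform_grid(0.0, 1.0, 5)   # placeholder interval and size
test = midpoints(train)
print(train)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(test)   # [0.125, 0.375, 0.625, 0.875]
```

With this choice the number of testing-parameter instances is always one less than the number of training-parameter instances, and every testing value is as far as possible from the training values that bracket it.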
The configuration of the DL-ROM neural network used for this test case is the following. We choose a 12-layer DFNN equipped with 50 neurons per hidden layer and neurons in the output layer, where corresponds to the dimension of the reduced trial manifold. The architectures of the encoder and decoder functions are instead reported in Tables 1 and 2, and are similar to the ones used in (carlberg2018model).
Table 1 (encoder function):
| Layer | Input dimension | Output dimension | Kernel size | Number of filters | Stride | Padding |
Table 2 (decoder function):
| Layer | Input dimension | Output dimension | Kernel size | Number of filters | Stride | Padding |
Problem (24) does not represent a remarkably challenging task for linear ROMs: indeed, by applying POD to the snapshot matrix (the latter built by collecting the solution of (24) for training-parameter instances), a linear trial manifold of dimension 20 is sufficient to capture more than 99.99% of the energy of the system (san2018neural; quarteroni2016reduced). In order to assess the performance of our DL-ROM technique, we compute the DL-ROM solution by fixing the dimension of the nonlinear trial manifold to . In Figure 4 we show the DL-ROM and the optimal-POD reconstructions, along with the FOM solution, for the time instance and for the testing-parameter instance , the testing value of for which the reconstruction task turns out to be the most difficult both for POD and DL-ROM, since the diffusion term in (24) is smaller and the solution is closer to that of a purely hyperbolic system. In particular, for , employing the DL-based ROM technique presented in this work allows us to halve the error indicator associated with the optimal-POD approximation of the FOM solution. Referring to Figure 4, the DL-ROM reconstruction is more accurate than the optimal-POD one: it fits the FOM solution almost everywhere, even at its maximum, as shown in the zooms of Figure 4. Moreover, it does not introduce oscillations where a large gradient of the FOM solution is observed, as happens instead when employing POD.
In Figure 5 we show the same comparison as in Figure 4, but this time considering both for POD and DL-ROM a reduced dimension . The difference in terms of accuracy provided by the two approaches is even more striking in this case.
Finally, in Figure 6 we highlight the accuracy properties of both the DL-ROM and POD techniques by displaying the behavior of the error indicator , defined in (23), with respect to the dimension of the corresponding reduced trial manifold. For the DL-ROM approximation is more accurate than the one provided by POD, and only for the two techniques provide almost the same accuracy.
4.2 Test 2: Linear Transport Equation
We consider two tests for this set of parametrized differential models.
First, we consider the parametrized one-dimensional linear transport equation
whose exact solution is . We set and .
The parameter (here ) represents the velocity of the travelling wave and the parameter space is given by . The dataset is built by uniformly sampling the exact solution in the domain , with , and by considering degrees of freedom in the space discretization and time instances in the time discretization. We consider training-parameter instances uniformly distributed in the parameter space and testing-parameter instances such that , for . This test case, and more generally hyperbolic problems, are examples in which the use of a linear approach to ROM generally yields poor performance in terms of accuracy. Indeed, the dimension of the linear trial manifold must be very large, compared to the dimension of the solution manifold, in order to capture the variability of the FOM solution over the parameter space . We set in order to assess the performance of DL-ROM in a scenario which is still remarkably challenging for ROMs built on linear trial manifolds.
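Building the dataset by sampling an exact travelling-wave solution can be sketched as follows. The sketch assumes the standard linear transport equation with constant velocity `mu`, whose exact solution is the initial profile translated by `mu * t`; the Gaussian initial profile, its width `sigma`, and the grids used in the example are hypothetical placeholders rather than the data of this test case.

```python
import math

def gaussian(x, sigma=1e-2):
    """Hypothetical spike-like initial profile, assumed for illustration."""
    return math.exp(-x * x / (2.0 * sigma))

def transport_snapshots(mu, xs, ts, u0=gaussian):
    """Snapshots of u(x, t) = u0(x - mu * t): one list of nodal values per
    time instance, for a given wave velocity mu."""
    return [[u0(x - mu * t) for x in xs] for t in ts]

xs = [i / 10.0 for i in range(11)]   # placeholder spatial grid on [0, 1]
ts = [0.0, 0.5]                      # placeholder time instances
snaps = transport_snapshots(mu=1.0, xs=xs, ts=ts)
print(snaps[0][0], snaps[1][5])      # the peak moves from x = 0 to x = 0.5
```

Each snapshot is a translated copy of the same profile, which is why a low-dimensional linear subspace captures this family poorly while a single intrinsic coordinate pair (time, velocity) describes it exactly.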
Figure 7 shows the exact solution, which here plays the role of the FOM solution, and the DL-ROM one for the testing-parameter instance ; here, we set the dimension of the nonlinear trial manifold to , equal to the dimension of the solution manifold . Moreover, in Figure 7 we highlight the relative error , for , associated to a given (in this case ), defined as
which widens in proximity of the spike of the exact solution.
In Figure 8 we report the exact solution and the DL-ROM one, obtained by setting , for three particular time instances. In order to compare the performance of the proposed nonlinear ROM with a linear approach, we perform the POD on the snapshot matrix and show, for the same testing-parameter instance, the optimal POD-reconstruction, i.e. the projection of the FOM (exact) solution onto the POD basis, in Figure 8. For example, by considering , the error indicator, defined in (23), is . By considering a linear ROM technique instead, even by considering a reduced trial manifold of dimension , built by means of the POD, the reconstructed solution presents spurious oscillations which result in a poor approximation of the FOM solution (see Figure 8). Indeed, in order to achieve the same accuracy obtained through DL-ROM over the testing set one has to select 90 basis functions, i.e. a linear trial manifold of dimension .
Figure 9 shows the behavior of the error indicator (23) with respect to the reduced dimension . By increasing the dimension of the nonlinear trial manifold there is a slight improvement of the performance of the DL-ROM neural network, i.e. the error indicator decreases. This improvement is not particularly relevant because by increasing , the number of parameters of the DL-ROM neural network, i.e. weights and biases, is increased by a limited quantity. In this way the approximation capability of the neural network remains almost the same and so does the error indicator (23).
(Hyperparameters Tuning). The hyperparameters of the DL-ROM neural network are tuned by evaluating the loss function over the validation set and by setting each of them equal to the value minimizing the generalization error on the validation set. In particular, we show the tests performed to choose the size of the (transposed) convolutional kernels in the (decoder) encoder function, the number of hidden layers in the feedforward neural network and the number of neurons for each hidden layer. The hyperparameter evaluation starts from the default configuration in Table 3.
|Kernel Size||Hidden Layers||Neurons|
Then, the best values are found iteratively by varying a single hyperparameter at a time and studying its impact on the validation loss. Once the best value of a hyperparameter is found, it replaces the default value from that point on. For each hyperparameter, the tuning is performed over a range of values for which training the network is computationally affordable.
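The one-at-a-time procedure described above can be sketched as follows. This is only an illustration under stated assumptions: `tune_one_at_a_time` and `toy_validation_loss` are hypothetical names, the candidate values are placeholders rather than the values of Tables 3 and 4, and the toy loss stands in for training the network and evaluating it on the validation set.

```python
def tune_one_at_a_time(defaults, candidates, validation_loss):
    """Sweep each hyperparameter in turn, keeping the others fixed.

    The best value found for a hyperparameter replaces its default
    before the next hyperparameter is swept.
    """
    config = dict(defaults)
    best_loss = validation_loss(config)
    for name, values in candidates.items():
        for value in values:
            trial = {**config, name: value}
            loss = validation_loss(trial)
            if loss < best_loss:
                best_loss = loss
                config = trial
    return config, best_loss

# Placeholder surrogate (an assumption): a real run would train the network here.
def toy_validation_loss(c):
    return ((c["kernel_size"] - 5) ** 2
            + (c["hidden_layers"] - 3) ** 2
            + (c["neurons"] - 100) ** 2 / 1e4)

# Placeholder defaults and ranges, not taken from the paper.
defaults = {"kernel_size": 3, "hidden_layers": 1, "neurons": 50}
candidates = {
    "kernel_size": [3, 5, 7],
    "hidden_layers": [1, 2, 3, 4],
    "neurons": [50, 100, 200],
}

best_config, best_loss = tune_one_at_a_time(defaults, candidates, toy_validation_loss)
```

Such a search requires a number of trainings equal to the sum of the range sizes rather than their product, at the price of ignoring interactions between hyperparameters.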
In Figure 10, we show the impact on the loss over the validation and testing sets of the size of the convolutional kernels, the number of hidden layers in the feedforward neural network and the number of neurons in each hidden layer, by varying the reduced dimension in order to find the best value of each hyperparameter over . The final configuration of the DL-ROM neural network is the one provided in Table 4.
|Kernel Size||Hidden Layers||Neurons|
Here we consider again the parametrized one-dimensional transport equation
The exact solution of (27) is but this time we set the initial datum equal to
where . The parameters belong to the parameter space . We build the dataset by uniformly sampling the exact solution in the domain , with and , and by considering grid points for the space discretization and time instances for the time discretization. We collect, both for and , training-parameter instances uniformly distributed in the parameter space and testing-parameter instances, selected as in the other test cases. Equation (27), completed with the initial datum (28), stands as one of the most challenging problems for linear ROM techniques because of the difficulty of accurately reconstructing the jump discontinuity of the exact solution as a linear combination of basis functions computed from the snapshots, for a testing-parameter instance. The architecture of the DL-ROM neural network used here is the one presented in Test 2.1.
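To illustrate why a travelling jump is hard for a linear basis, the following sketch compares the number of POD modes needed to capture a fixed fraction of the snapshot energy for a shifted step and for a shifted smooth front. Everything here is an assumption made for illustration: the step and the tanh front are generic profiles, not the initial datum (28), and the grid, the shifts, the tolerance and the helper `modes_for_energy` are hypothetical.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 256)
shifts = np.linspace(0.1, 0.9, 100)  # front positions, one per snapshot

# Snapshot matrices (one snapshot per column) for a sharp and a smooth travelling front.
step = np.stack([(x >= s).astype(float) for s in shifts], axis=1)
smooth = np.stack([0.5 * (1.0 + np.tanh((x - s) / 0.1)) for s in shifts], axis=1)

def modes_for_energy(snapshots, tol=1e-4):
    """Smallest number of POD modes whose retained energy is at least 1 - tol."""
    s = np.linalg.svd(snapshots, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, 1.0 - tol) + 1)

n_step = modes_for_energy(step)
n_smooth = modes_for_energy(smooth)
print(f"modes needed: step = {n_step}, smooth front = {n_smooth}")
```

The singular values of the step snapshots decay slowly, so many more modes are needed than for the smooth front, which is consistent with the large POD dimensions reported for this test case.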
In Figure 11 we show the exact solution, which here again plays the role of the FOM solution, and the DL-ROM one, obtained by setting , equal to the dimension of the solution manifold , for the testing-parameter instance , along with the relative error , defined in (26), which is larger near the jump of the FOM solution.
In Figure 12 we report the DL-ROM and optimal-POD reconstructions, together with the FOM solution, for the time instances and 0.745, and the testing-parameter instance . The dimensions of the reduced manifolds are and for the DL-ROM and POD techniques, respectively. With a linear ROM technique, even when setting the dimension of the reduced manifold equal to , the reconstructed solution presents spurious oscillations which lead to a poor approximation of the FOM solution. Moreover, the optimal-POD solution is not able to capture the discontinuity of the FOM solution sharply. These oscillations are significantly mitigated by our DL-ROM, and the jump discontinuity is accurately captured by the DL-ROM solution, as shown in Figure 12.
Finally, in Figure 13 we highlight the accuracy properties of both the DL-ROM and POD techniques. In particular, the conclusions drawn in Test 2.1 regarding the behavior of the error indicator (23) with respect to the reduced dimension still hold. The developed DL-ROM technique allows us to obtain a value for the error indicator equal to with , which instead is achieved by POD only by selecting 165 basis functions, i.e. by building a linear trial manifold of dimension 165.
4.3 Test 3: Monodomain Equation
We now consider the following one-dimensional coupled PDE-ODE nonlinear system