Numerical simulation models for the approximation of complex physical systems are usually high-dimensional and thus incur large computational costs. This is infeasible whenever real-time evaluations are needed or in a multi-query parametric setting, where the high-dimensional model must be evaluated for many parameters, e.g., repeatedly within an optimization loop. To accelerate computations, we employ model order reduction (MOR) to derive efficiently evaluable low-dimensional models with rigorous error control w.r.t. the high-dimensional model. Well-known MOR methods include, e.g., the reduced basis method, proper orthogonal decomposition, and balanced truncation; see [benner2017model] for an introduction to different MOR methods.
In this paper, our focus is on reducing dynamical systems of ordinary differential equations, which often arise from spatial discretization of partial differential equations. These dynamical systems frequently possess physical characteristics of the modeled phenomena, such as an underlying structure or conservation properties. Possible structures that have been addressed in the context of MOR comprise, e.g., Lagrangian [lall2003structure, carlberg2015preserving], Hamiltonian or port-Hamiltonian [polyuga2010structure, chaturantabut2016structure, beattie2019structure] structures. We are in particular interested in preserving the structure of Hamiltonian systems arising in, e.g., plasma physics [morrison1980maxwell] or gyroscopic systems [xu2005numerical]. To this end, we derive a reduced Hamiltonian system which preserves symplecticity, since otherwise the reduced system could become unphysical in the sense that energy is not conserved or stability properties are lost. Structure-preserving model reduction for Hamiltonian systems has already been presented for linear-subspace methods in [PM2016, AH2017, BBH2019, BHR2020, UKY2021, sharma2021hamiltonian], for nonlinear Poisson systems in [HP2021], and in [HPR2020, Pagliantini2021] utilizing a dynamical-in-time approach motivated by low-rank approximations. We extend this literature by developing reduced models for Hamiltonian systems on nonlinear symplectic trial manifolds, which can drastically lower the reduced dimension required for an accurate reduced model.
One motivation for using nonlinear manifolds as low-dimensional approximations instead of linear subspaces arises from model reduction of transport-dominated problems. It is known that such problems can exhibit a slow decay of the Kolmogorov $n$-widths, see, e.g., [ohlberger2016reduced, greif2019decay], which describe the best-possible error achievable by a linear-subspace ROM of size $n$. Recent efforts to overcome this slow decay for transport problems include transforming and shifting the basis functions as well as the physical domain of the individual snapshots [iollo2014advection, welper2017interpolation, reiss2018shifted, cagniart2019model, black2020projection], the method of freezing [ohlberger2013nonlinear], and splitting one global domain into multiple local linear subspaces [ohlberger2015error]. Model reduction on general nonlinear manifolds has already been performed in [gu2011model], which is limited to piecewise-linear manifolds, and in [cruz2020], where the manifold is constructed via moment matching and the equations of the nonlinear dynamical system have to be known. In [Rim2019ManifoldAV], the authors introduced manifold approximations via transported subspaces, which are currently restricted to 1D problems. Recent approaches using autoencoders have been introduced in [LC2020A] as MOR on manifolds and extended in [LC2020B] to satisfy physical conservation laws. In [KCWZ2020B], a shallow masked autoencoder combined with a hyper-reduction approach has been employed to construct a reduced manifold. In [FDM2021], the authors not only learn a reduced manifold via an autoencoder, but also utilize a feed-forward network to approximate the reduced model.
To the best of our knowledge, the aforementioned approaches consider either symplectic model reduction or MOR on manifolds, but not their combination. Our goal is to bridge the two approaches by providing a method for symplectic model reduction on manifolds. In particular, this article provides the following new contributions:
Symplectic manifold Galerkin (SMG) projection techniques, which project Hamiltonian systems onto nonlinear symplectic trial manifolds (Subsection 2.3). Using this technique, the resulting SMG-ROM is again a Hamiltonian system.
Analysis, which includes:
Error bound: we develop a rigorous a-posteriori error bound based on a time-discrete formulation (Subsection 2.6.2).
Weakly symplectic deep convolutional autoencoder (weakly symplectic DCA): we present a new weakly symplectic DCA (Section 3) to approximate a nonlinear symplectic trial manifold. Note that we are not restricted to autoencoders here; any technique which provides a symplectic map from the low-dimensional to the high-dimensional space can be used, see Remark 2.
Numerical experiments: We numerically investigate the linear wave equation for the challenging case of a thin moving pulse and demonstrate the ability of the SMG-ROM to outperform (a) (non-)structure-preserving linear-subspace ROMs and (b) non-structure-preserving MOR on manifolds (Subsection 4.1).
This paper is structured as follows: in Section 2, we define the high-dimensional Hamiltonian system and subsequently construct the reduced Hamiltonian system on a nonlinear symplectic trial manifold. Additionally, we show that the reduced model can inherit properties like energy and stability preservation, and we develop a rigorous a-posteriori error bound. In Section 3, we detail the weakly symplectic DCA. Numerical experiments on a Hamiltonian system with known slowly decaying Kolmogorov $n$-widths are performed in Section 4. Finally, we conclude in Section 5.
2 Symplectic Model Reduction on Manifolds
In this section, we introduce the model reduction of Hamiltonian systems on nonlinear symplectic trial manifolds. To this end, we start with a review of the necessary details on symplectic geometry and subsequently state our full-order model, a parametric Hamiltonian system. Then, we detail our reduced model on a nonlinear symplectic trial manifold and show that stability and energy can be preserved with our symplectic manifold Galerkin (SMG) projection technique. We conclude this section by developing a rigorous a-posteriori error bound.
2.1 Symplectic Geometry
Let $\mathcal{M}$ be a smooth manifold and let $\omega$ be a symplectic form on $\mathcal{M}$, i.e., a closed, non-degenerate differential 2-form. (See, e.g., [marsden1995introduction, lee2013] for an introduction to smooth manifolds.) It follows immediately that $\mathcal{M}$ needs to be of even dimension [lee2013, Proposition 22.7]. The combination $(\mathcal{M}, \omega)$ of a smooth manifold and a symplectic form is called a symplectic manifold. For the case of $\mathcal{M}$ being a vector space over $\mathbb{R}$ of even dimension $2n$, $n \in \mathbb{N}$, we introduce the corresponding definition of a symplectic form.
Definition 1 (Symplectic Form over $\mathbb{R}$, Symplectic Vector Space).
Let $\mathbb{V}$ be an $\mathbb{R}$-vector space of dimension $2n$, $n \in \mathbb{N}$, and let $\omega: \mathbb{V} \times \mathbb{V} \to \mathbb{R}$ be a skew-symmetric and non-degenerate bilinear form, i.e., for all $v, w \in \mathbb{V}$ it holds
\[ \omega(v, w) = -\omega(w, v), \qquad \text{and} \qquad \big( \omega(v, u) = 0 \ \text{for all } u \in \mathbb{V} \big) \implies v = 0. \]
The bilinear form $\omega$ is called a symplectic form on $\mathbb{V}$, and the pair $(\mathbb{V}, \omega)$ is called a symplectic vector space.
For a symplectic form $\omega$ on $\mathbb{V}$, $\dim(\mathbb{V}) = 2n$, it is known that there exists a basis of $\mathbb{V}$ such that the symplectic form can be written in canonical coordinates as
\[ \omega(v, w) = v_\omega^\top \, \mathbb{J}_{2n} \, w_\omega, \]
where $v_\omega, w_\omega \in \mathbb{R}^{2n}$ denote the respective coordinate vectors of $v, w \in \mathbb{V}$, see, e.g., [da2008lectures]. The matrix
\[ \mathbb{J}_{2n} := \begin{bmatrix} 0_n & I_n \\ -I_n & 0_n \end{bmatrix} \in \mathbb{R}^{2n \times 2n} \]
is called the canonical Poisson matrix, where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix and $0_n \in \mathbb{R}^{n \times n}$ is the matrix of all zeros. One can easily see that $\mathbb{J}_{2n}^\top = -\mathbb{J}_{2n} = \mathbb{J}_{2n}^{-1}$ and $\mathbb{J}_{2n} \mathbb{J}_{2n}^\top = I_{2n}$. With this notation at hand, we define a symplectic map as follows.
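The identities above are easy to check numerically. The following minimal numpy sketch (our own illustration; the helper `poisson_matrix` is a hypothetical name, not from the paper) verifies them for $n = 3$:

```python
import numpy as np

def poisson_matrix(n):
    """Canonical Poisson matrix J_2n = [[0, I_n], [-I_n, 0]]."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

J = poisson_matrix(3)
# J is skew-symmetric and orthogonal, hence J^T = -J = J^{-1}.
assert np.allclose(J.T, -J)
assert np.allclose(J.T @ J, np.eye(6))
assert np.allclose(np.linalg.inv(J), J.T)
```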
Definition 2 (Symplectic Map, Symplectic Matrix [Hlw2006, Section VI.2]).
Let $U \subseteq \mathbb{R}^{2n}$ be an open set. A differentiable map $g: U \to \mathbb{R}^{2N}$ is called symplectic if the Jacobian matrix $\mathrm{d}_y g(y) \in \mathbb{R}^{2N \times 2n}$ is a symplectic matrix for every $y \in U$, i.e.,
\[ \big( \mathrm{d}_y g(y) \big)^\top \, \mathbb{J}_{2N} \, \mathrm{d}_y g(y) = \mathbb{J}_{2n}. \]
(The subscript $y$ denotes the derivative w.r.t. the variable $y$, and the function argument indicates the position at which the Jacobian is evaluated.)
Note that, due to the above definition, if the Jacobian $\mathrm{d}_y g(y)$ is a symplectic matrix, then it has full column rank, i.e., $\operatorname{rank}(\mathrm{d}_y g(y)) = 2n$ for all $y \in U$, since $\mathbb{J}_{2N}$ and $\mathbb{J}_{2n}$ are nonsingular. With the above definition, we can now introduce the notion of the symplectic inverse.
Definition 3 (Symplectic Inverse).
If $A \in \mathbb{R}^{2N \times 2n}$ is a symplectic matrix, we denote by
\[ A^+ := \mathbb{J}_{2n}^\top \, A^\top \, \mathbb{J}_{2N} \in \mathbb{R}^{2n \times 2N} \]
the symplectic inverse. (The symplectic inverse should not be confused with the Moore–Penrose pseudo-inverse, which is often denoted with a similar notation.)
It follows that
\[ A^+ A = I_{2n}, \]
as well as that $(A^+)^\top \in \mathbb{R}^{2N \times 2n}$ is again a symplectic matrix.
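A small numpy sketch (our own illustration, not the paper's code) of the symplectic inverse; a symplectic test matrix is obtained via a cotangent lift $A = \operatorname{blkdiag}(\Phi, \Phi)$ with orthonormal $\Phi$, one standard construction satisfying $A^\top \mathbb{J}_{2N} A = \mathbb{J}_{2n}$:

```python
import numpy as np

def poisson_matrix(n):
    """Canonical Poisson matrix J_2n = [[0, I_n], [-I_n, 0]]."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def symplectic_inverse(A, N, n):
    """Symplectic inverse A^+ = J_2n^T A^T J_2N of a symplectic A in R^{2N x 2n}."""
    return poisson_matrix(n).T @ A.T @ poisson_matrix(N)

# Cotangent lift of an orthonormal basis Phi gives a symplectic matrix.
N, n = 4, 2
Phi = np.linalg.qr(np.random.default_rng(0).standard_normal((N, n)))[0]
Z = np.zeros((N, n))
A = np.block([[Phi, Z], [Z, Phi]])
assert np.allclose(A.T @ poisson_matrix(N) @ A, poisson_matrix(n))  # symplectic

Ap = symplectic_inverse(A, N, n)
assert np.allclose(Ap @ A, np.eye(2 * n))  # A^+ is a left inverse of A
# (A^+)^T is again a symplectic matrix in R^{2N x 2n}.
B = Ap.T
assert np.allclose(B.T @ poisson_matrix(N) @ B, poisson_matrix(n))
```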
2.2 Full-Order Model
We introduce our high-dimensional full-order model (FOM), which is a parametric autonomous Hamiltonian system. Let $\mathcal{H}(\cdot; \mu): \mathbb{R}^{2N} \to \mathbb{R}$ be a smooth function, called the Hamiltonian function, depending on a parameter $\mu \in \mathcal{P}$. Further, let $\mathbb{T} := [t_0, t_{\mathrm{end}}]$ be a time interval with final time $t_{\mathrm{end}}$. The FOM in canonical coordinates reads: for a given $\mu \in \mathcal{P}$ and initial value $x_0(\mu) \in \mathbb{R}^{2N}$, find $x(\cdot; \mu) \in C^1(\mathbb{T}, \mathbb{R}^{2N})$ such that
\[ \dot{x}(t; \mu) = \mathbb{J}_{2N} \nabla_x \mathcal{H}\big( x(t; \mu); \mu \big) =: X_{\mathcal{H}}\big( x(t; \mu); \mu \big), \qquad x(t_0; \mu) = x_0(\mu), \]
where $X_{\mathcal{H}}$ is the so-called Hamiltonian vector field and $C^1(\mathbb{T}, \mathbb{R}^{2N})$ denotes the continuously differentiable functions in time with values in $\mathbb{R}^{2N}$. The flow of Eq. 5 is the mapping evolving the initial state to the corresponding solution of Eq. 5, i.e.,
\[ \phi_t\big( x_0(\mu) \big) = x(t; \mu), \]
where $x(\cdot; \mu)$ represents the solution with initial data $x_0(\mu)$. An important property of the flow is that it preserves the Hamiltonian,
\[ \tfrac{\mathrm{d}}{\mathrm{d}t} \mathcal{H}\big( x(t; \mu); \mu \big) = \nabla_x \mathcal{H}^\top \dot{x} = \nabla_x \mathcal{H}^\top \, \mathbb{J}_{2N} \, \nabla_x \mathcal{H} = 0, \]
where the last equality follows from the skew-symmetry of $\mathbb{J}_{2N}$.
In order for Eq. 5 to be well-posed, we assume that for any $\mu \in \mathcal{P}$ the operator $X_{\mathcal{H}}(\cdot; \mu)$ is Lipschitz continuous w.r.t. $x$.
2.3 Symplectic Manifold Galerkin ROM
We now detail how to derive a symplectic manifold Galerkin ROM (SMG-ROM) on a nonlinear symplectic trial manifold and, at the same time, ensure that this reduced system preserves the symplectic structure of the Hamiltonian system. This nonlinear symplectic trial manifold aims at approximating the solution manifold
\[ \mathcal{S} := \big\{ x(t; \mu) \;\big|\; t \in \mathbb{T},\ \mu \in \mathcal{P} \big\} \subset \mathbb{R}^{2N}. \]
In particular, we are looking for approximate solutions of the form
\[ x(t; \mu) \approx \tilde{x}(t; \mu) := \Gamma\big( \check{x}(t; \mu) \big) + x_{\mathrm{ref}}(\mu) \]
with the (possibly parametric) reference state $x_{\mathrm{ref}}(\mu) \in \mathbb{R}^{2N}$, the reduced coordinates (also known as latent variables) $\check{x}(t; \mu) \in \mathbb{R}^{2n}$ with $2n \ll 2N$, the reconstructed solution $\tilde{x}(t; \mu) \in \mathbb{R}^{2N}$, and the reconstruction function (or decoder) $\Gamma: \mathbb{R}^{2n} \to \mathbb{R}^{2N}$. The reconstruction function is assumed to be a (mostly nonlinear) symplectic map in the sense of Eq. 2. With this choice, the reconstructed solution evolves on the nonlinear trial manifold
\[ \tilde{\mathcal{M}} := \big\{ \Gamma(\check{x}) + x_{\mathrm{ref}}(\mu) \;\big|\; \check{x} \in \mathbb{R}^{2n} \big\}. \]
Due to the chain rule, the evolution of the reconstructed solution in time is
\[ \dot{\tilde{x}}(t; \mu) = \mathrm{d}_{\check{x}} \Gamma\big( \check{x}(t; \mu) \big) \, \dot{\check{x}}(t; \mu). \]
To construct the ROM, we define the time-continuous residual of the FOM Eq. 5 w.r.t. the approximation $\tilde{x}$ by
\[ r(t; \mu) := \dot{\tilde{x}}(t; \mu) - \mathbb{J}_{2N} \nabla_x \mathcal{H}\big( \tilde{x}(t; \mu); \mu \big). \]
For the ROM, we require that the symplectic projection of the residual,
\[ \big( \mathrm{d}_{\check{x}} \Gamma(\check{x}(t; \mu)) \big)^+ \, r(t; \mu), \]
vanishes. Note that the symplectic inverse exists due to the symplecticity of the decoder $\Gamma$. Reformulating this condition yields
\[ \dot{\check{x}}(t; \mu) = \mathbb{J}_{2n} \nabla_{\check{x}} \check{\mathcal{H}}\big( \check{x}(t; \mu); \mu \big), \]
which follows from the properties of the Poisson matrix in Eq. 4, Eq. 3 and the chain rule. We denote by $\check{\mathcal{H}}(\check{x}; \mu) := \mathcal{H}\big( \Gamma(\check{x}) + x_{\mathrm{ref}}(\mu); \mu \big)$ the reduced Hamiltonian. Thus, the SMG-ROM on the nonlinear symplectic trial manifold reads: for a given $\mu \in \mathcal{P}$ and reduced initial value $\check{x}_0(\mu) \in \mathbb{R}^{2n}$, find $\check{x}(\cdot; \mu) \in C^1(\mathbb{T}, \mathbb{R}^{2n})$ such that
\[ \dot{\check{x}}(t; \mu) = \mathbb{J}_{2n} \nabla_{\check{x}} \check{\mathcal{H}}\big( \check{x}(t; \mu); \mu \big), \qquad \check{x}(t_0; \mu) = \check{x}_0(\mu). \]
By $X_{\check{\mathcal{H}}} := \mathbb{J}_{2n} \nabla_{\check{x}} \check{\mathcal{H}}$, we denote the reduced Hamiltonian vector field. Moreover, the choice of the decoder $\Gamma$ to be a symplectic map guarantees that the tuple consisting of the nonlinear trial manifold from Eq. 7 and the canonical symplectic form from Eq. 1 is a symplectic manifold, see [HLW2006, Section VII.2.3]. This is why we refer to it as a nonlinear symplectic trial manifold.
In order to solve Eq. 9 numerically, a symplectic numerical integration scheme is applied, and the resulting system of nonlinear equations is solved at each discrete time step with a quasi-Newton scheme which omits second-order derivatives of the decoder, in analogy to [LC2020A, Section 3.4].
In the following, we show that energy and stability can be preserved during the time evolution, since the reduced system constructed above is again a Hamiltonian system.
2.4 Energy Preservation
The proof is based on the property that the Hamiltonian is preserved over time for both FOM and ROM. Omitting the $\mu$-dependency for readability, this reads
\[ \mathcal{H}\big( x(t) \big) = \mathcal{H}\big( x(t_0) \big) \qquad \text{and} \qquad \check{\mathcal{H}}\big( \check{x}(t) \big) = \check{\mathcal{H}}\big( \check{x}(t_0) \big) \qquad \text{for all } t \in \mathbb{T}. \]
The reference state is set to $x_{\mathrm{ref}}(\mu) = x_0(\mu) - \Gamma\big( \check{x}_0(\mu) \big)$ for an arbitrary initial value $\check{x}_0(\mu)$ of the reduced system. As observed in [LC2020A, Remark 3.1], this leads to an exact reproduction of the initial value, $\tilde{x}(t_0; \mu) = x_0(\mu)$. With the SMG projection, we can further conclude from the preservation of the reduced Hamiltonian that this implies an exact reproduction of the Hamiltonian $\mathcal{H}$.
2.5 Stability Preservation
In this section, we examine the question of stability of the full-order and reduced model in the sense of Lyapunov. For ease of notation, we omit the time and $\mu$-dependency for the remainder of this section whenever possible.
Consider the differential equation
\[ \dot{y}(t) = f\big( y(t) \big), \]
where $f$ is a smooth function from an open set $U \subseteq \mathbb{R}^k$ into $\mathbb{R}^k$. Let $f$ have an equilibrium point $y_e \in U$, i.e., $f(y_e) = 0$, and let $\phi_t$ be the flow of Eq. 11. Then, Lyapunov stability of the point $y_e$ is defined as follows.
Definition 4 (Lyapunov Stability [MO2017, Section 12]).
An equilibrium point $y_e$ of Eq. 11 is Lyapunov stable if for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\| \phi_t(y) - y_e \| < \varepsilon$ for all $t \geq 0$ whenever $\| y - y_e \| < \delta$.
In order to state sufficient conditions for an equilibrium point to be Lyapunov stable, we recall Lyapunov’s stability theorem.
Theorem 1 (Lyapunov’s Stability Theorem [MO2017, Theorem 12.1.1]).
Let $y_e$ be an equilibrium point of Eq. 11. If there exists a function $W: U \to \mathbb{R}$ with $\nabla W(y_e) = 0$, such that the Hessian $\nabla^2 W(y_e)$ is positive definite and
\[ \tfrac{\mathrm{d}}{\mathrm{d}t} W\big( \phi_t(y) \big) \leq 0, \]
then $y_e$ is a Lyapunov stable point.
The function $W$ in the aforementioned theorem is also called a Lyapunov function. For Hamiltonian systems, a suitable candidate for $W$ is the Hamiltonian itself. In the case of autonomous Hamiltonians considered in this paper, a point $x_e$ is an equilibrium point of Eq. 5 if and only if $x_e$ is a critical point of $\mathcal{H}$, i.e., a point such that $\nabla_x \mathcal{H}(x_e) = 0$ [AM1987, Proposition 3.4.16]. Following the well-known Dirichlet stability theorem for Hamiltonian systems [MO2017, Corollary 12.1.1], it suffices that the equilibrium point is a strict local maximum or minimum of $\mathcal{H}$ in order to be a Lyapunov stable point. Equation 12 is automatically fulfilled for autonomous Hamiltonian systems, which can easily be seen by
\[ \tfrac{\mathrm{d}}{\mathrm{d}t} \mathcal{H}\big( \phi_t(x) \big) = \nabla_x \mathcal{H}^\top \, \mathbb{J}_{2N} \, \nabla_x \mathcal{H} = 0 \]
due to the skew-symmetry of $\mathbb{J}_{2N}$. Similar to [AH2017, Theorem 18] in the special case of a linear reconstruction map, we are now able to show that if $\mathcal{H}$ (or $-\mathcal{H}$) is a Lyapunov function in a neighborhood of the equilibrium point, then the equilibrium points of the full-order and reduced model are Lyapunov stable. In the following, we extend this theorem to hold for more general nonlinear reconstruction maps.
Theorem 2 (Lyapunov Stability).
Proof. As $x_e$ is an equilibrium point and $\mathcal{H}$ is a Lyapunov function, it immediately follows from Theorem 1 that $x_e$ is Lyapunov stable. By using the chain rule, we arrive at $\nabla_{\check{x}} \check{\mathcal{H}}(\check{x}) = \big( \mathrm{d}_{\check{x}} \Gamma(\check{x}) \big)^\top \nabla_x \mathcal{H}\big( \Gamma(\check{x}) + x_{\mathrm{ref}} \big)$. Evaluating the latter at $\check{x}_e$ yields
\[ \nabla_{\check{x}} \check{\mathcal{H}}(\check{x}_e) = \big( \mathrm{d}_{\check{x}} \Gamma(\check{x}_e) \big)^\top \nabla_x \mathcal{H}(x_e) = 0, \]
where the last equality follows as $x_e$ is a critical point of $\mathcal{H}$, i.e., $\nabla_x \mathcal{H}(x_e) = 0$. Thus, $\check{x}_e$ is an equilibrium point of Eq. 9. In order to show that $\check{x}_e$ is a strict local minimum, we consider the Hessian matrix. Again using the chain rule, and noting that the second-order derivative terms of $\Gamma$ vanish at the critical point since $\nabla_x \mathcal{H}(x_e) = 0$, we get
\[ \nabla^2_{\check{x}} \check{\mathcal{H}}(\check{x}_e) = \big( \mathrm{d}_{\check{x}} \Gamma(\check{x}_e) \big)^\top \, \nabla^2_x \mathcal{H}(x_e) \, \mathrm{d}_{\check{x}} \Gamma(\check{x}_e). \]
Therefore, for any $v \in \mathbb{R}^{2n} \setminus \{0\}$, we obtain that the Hessian is positive definite,
\[ v^\top \nabla^2_{\check{x}} \check{\mathcal{H}}(\check{x}_e) \, v = \big( \mathrm{d}_{\check{x}} \Gamma(\check{x}_e) v \big)^\top \, \nabla^2_x \mathcal{H}(x_e) \, \big( \mathrm{d}_{\check{x}} \Gamma(\check{x}_e) v \big) > 0, \]
due to the positive definiteness of $\nabla^2_x \mathcal{H}(x_e)$ and the full column rank of $\mathrm{d}_{\check{x}} \Gamma(\check{x}_e)$. With Dirichlet’s stability theorem, we conclude that $\check{x}_e$ is a Lyapunov stable point of Eq. 9. ∎
2.6 Error Estimation
In this section, we deduce a rigorous a-posteriori error bound. This error bound is coupled to the chosen time-discretization scheme, which needs to be a so-called symplectic integrator [HLW2006, BM2017] in order to preserve the underlying symplectic structure of the Hamiltonian system.
2.6.1 Time Discretization
We use a Runge-Kutta (RK) scheme for the time discretization of our Hamiltonian system. Since not all RK methods meet the requirements of being a symplectic integrator, we detail the necessary conditions after the definition.
We start by applying an equidistant time discretization of the interval $\mathbb{T}$ with $K \in \mathbb{N}$ steps. Therefore, we set $\Delta t := (t_{\mathrm{end}} - t_0)/K$ and $t^m := t_0 + m \Delta t$ for $0 \leq m \leq K$, and seek approximations $x^m \approx x(t^m)$. Note that, in contrast to standard notation, the superscript denotes the enumeration of time-dependent variables instead of the power of those quantities.
Definition 5 (Runge–Kutta Method).
An $s$-stage RK method (formulated for the autonomous Hamiltonian system Eq. 5) with $s \in \mathbb{N}$ and real coefficients $a_{ij}, b_i \in \mathbb{R}$, $1 \leq i, j \leq s$, is for given $x^m$ defined by
\[ k_i = \mathbb{J}_{2N} \nabla_x \mathcal{H}\Big( x^m + \Delta t \sum_{j=1}^{s} a_{ij} k_j \Big), \qquad x^{m+1} = x^m + \Delta t \sum_{i=1}^{s} b_i k_i. \]
Runge–Kutta methods are symplectic if the coefficients satisfy
\[ b_i a_{ij} + b_j a_{ji} = b_i b_j \qquad \text{for all } 1 \leq i, j \leq s, \]
see [HLW2006, Theorem VI.4.3, p. 178], which, e.g., holds for the implicit midpoint rule ($s = 1$, $a_{11} = 1/2$, $b_1 = 1$),
\[ x^{m+1} = x^m + \Delta t \, \mathbb{J}_{2N} \nabla_x \mathcal{H}\Big( \tfrac{x^m + x^{m+1}}{2} \Big), \]
which therefore is a symplectic integrator. To compute a solution of the time-discrete ODE, we need to solve the following algebraic equations at every discrete time step,
\[ R_i^m(k_1, \ldots, k_s) = 0, \qquad 1 \leq i \leq s, \]
where the $i$th RK residual is defined by
\[ R_i^m(k_1, \ldots, k_s) := k_i - \mathbb{J}_{2N} \nabla_x \mathcal{H}\Big( x^m + \Delta t \sum_{j=1}^{s} a_{ij} k_j \Big), \]
with the updated state
\[ x^{m+1} = x^m + \Delta t \sum_{i=1}^{s} b_i k_i. \]
In order to compute a time-discrete approximation of the reduced problem Eq. 9, we solve the analogous algebraic equations for the reduced stages,
\[ \check{R}_i^m(\check{k}_1, \ldots, \check{k}_s) = 0, \qquad 1 \leq i \leq s, \]
where the $i$th reduced RK residual and the updated reduced state are given by
\[ \check{R}_i^m(\check{k}_1, \ldots, \check{k}_s) := \check{k}_i - \mathbb{J}_{2n} \nabla_{\check{x}} \check{\mathcal{H}}\Big( \check{x}^m + \Delta t \sum_{j=1}^{s} a_{ij} \check{k}_j \Big), \qquad \check{x}^{m+1} = \check{x}^m + \Delta t \sum_{i=1}^{s} b_i \check{k}_i. \]
2.6.2 Error Bound
In this section, we provide a rigorous a-posteriori error bound. We assume that an RK method, as defined in the section above, has been used to construct the time-discrete scheme. For a linear reconstruction map, bounds have been derived in [CBA2017, Theorem 6.19], which we extend in the following to error bounds for more general nonlinear reconstruction maps.
Theorem 3 (Error Bound).
If $\mathbb{J}_{2N} \nabla_x \mathcal{H}$ is Lipschitz continuous, i.e., there exists a constant $L > 0$ such that
\[ \big\| \mathbb{J}_{2N} \nabla_x \mathcal{H}(x_1) - \mathbb{J}_{2N} \nabla_x \mathcal{H}(x_2) \big\| \leq L \, \| x_1 - x_2 \| \qquad \text{for all } x_1, x_2 \in \mathbb{R}^{2N}, \]
and the step size $\Delta t$ is chosen sufficiently small such that
(a1) the matrix with entries built from the RK coefficients and the Lipschitz constant $L$ is invertible, and
(a2) the corresponding stage condition holds for every $1 \leq i \leq s$ (for the implicit midpoint rule, this condition can be ensured by choosing $\Delta t$ sufficiently small),
then the following error bound holds for $0 \leq m \leq K$.
Here, $x^m$ respectively $\check{x}^m$ is a solution of Eq. 14, respectively Eq. 16, at time instance $t^m$ with the update given in Eq. 15, respectively Eq. 17b. By $\tilde{x}^m := \Gamma(\check{x}^m) + x_{\mathrm{ref}}$ we denote the reconstructed state and by $\tilde{k}_i$ the reconstructed velocities. The residuals w.r.t. the reconstructed state and velocities are defined accordingly.
Note that if the reconstruction map is linear, we recover the standard Runge–Kutta update step Eq. 15 for the reconstructed solution.
Subtracting from yields
Noting that , we arrive at
Taking the norm on both sides and using the Lipschitz continuity yields
Rearranging both sides leads to
As $\Delta t$ is small enough such that assumptions (a1) and (a2) hold, we arrive at
By inserting Eq. 19, it follows that
Finally, an induction argument concludes the proof. ∎
Note that the Lipschitz condition on $\mathbb{J}_{2N} \nabla_x \mathcal{H}$ does not pose a severely limiting additional requirement, as it is a typical condition for the well-posedness of Eq. 5.
3 Nonlinear Symplectic Trial Manifold based on Deep Convolutional Autoencoders
It is still an open question how to construct a symplectic reconstruction function and how to choose the reduced initial value in Eq. 9. In the scope of this work, we investigate autoencoders for this purpose. In this section, we first introduce existing autoencoders and deep convolutional autoencoders (DCAs). Furthermore, we introduce our novel weakly symplectic DCA. Before describing the general idea of autoencoders, we briefly highlight that autoencoders are not the sole choice for defining a reconstruction function.
Remark 2 (Alternatives to Autoencoders).
Instead of relying on an autoencoder to provide the reconstruction function, one could consider other approaches. One possibility is kernel principal component analysis (kPCA) [Schoelkopf98], whose basic idea is to embed a nonlinear manifold in $\mathbb{R}^{2N}$ into a higher-dimensional feature space. The embedding and the feature-space dimension should be chosen such that the nonlinear manifold can be represented as a linear manifold in the feature space. Subsequently, a classical PCA is applied to identify the low-dimensional space. Additional methods to construct such a mapping are, e.g., the Gaussian process latent variable model (GPLVM) [lawrence2003gaussian], self-organizing maps [kohonen1982self], diffeomorphic dimensionality reduction [NIPS2008_647bba34] and deep kernel networks [wenzel_sdkn_turbulence].
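To make the kPCA idea from the remark concrete, here is a minimal numpy sketch (our own illustration with a Gaussian kernel, not part of the referenced works): the kernel matrix encodes the inner products of the implicitly embedded data, is centered in feature space, and a classical PCA (an eigendecomposition) is applied to it:

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Minimal Gaussian-kernel PCA: implicit embedding, centering, PCA."""
    # Kernel (Gram) matrix of the implicitly embedded snapshots (rows of X).
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Center the kernel matrix in feature space.
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    Kc = H @ K @ H
    # Classical PCA in feature space = eigendecomposition of Kc.
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:n_components]
    # Low-dimensional coordinates of the training data.
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

X = np.random.default_rng(1).standard_normal((50, 4))
Z = kernel_pca(X, n_components=2)
assert Z.shape == (50, 2)
assert np.all(np.isfinite(Z))
```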
3.1 Autoencoder
Autoencoders [Hinton96], [Goodfellow-et-al-2016, Chapter 14] are one approach in the class of (nonlinear) dimensionality reduction methods that learn a low-dimensional representation of a given, finite dataset $\{x_i\}_{i=1}^{n_s} \subset \mathbb{R}^{2N}$. An autoencoder is characterized by a pair of parametric mappings, which are called the encoder and the decoder. Typically, both maps are artificial neural networks (ANNs), where $\theta$ describes the network parameters such as the weights and biases. The encoder–decoder pair is optimized to minimize the loss
\[ \mathcal{L}(\theta) = \frac{1}{n_s} \sum_{i=1}^{n_s} \big\| x_i - \mathrm{dec}_\theta\big( \mathrm{enc}_\theta(x_i) \big) \big\|_2^2, \]
i.e., the data should be invariant under compression with the encoder and subsequent decompression with the decoder. With this construction, autoencoders do not require labeled data, which is known as unsupervised learning. The constant $1/n_s$ normalizes the squared loss, which is then known as the mean squared error (MSE). The training process identifies a (most often only locally) optimal parameter $\theta^*$.
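The loss above can be evaluated in a few lines. The following numpy sketch (our own illustration; it uses a linear encoder/decoder pair from a truncated SVD, for which the optimal autoencoder coincides with classical POD/PCA) computes the MSE reconstruction loss:

```python
import numpy as np

def mse_reconstruction_loss(X, encode, decode):
    """(1 / n_s) * sum_i ||x_i - decode(encode(x_i))||^2 over the rows of X."""
    n_s = X.shape[0]
    R = X - decode(encode(X))
    return np.sum(R**2) / n_s

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 8))  # rank <= 3 data
# Linear encoder/decoder from a truncated SVD (the POD/PCA solution).
U = np.linalg.svd(X, full_matrices=False)[2][:3].T  # 8 x 3 orthonormal basis
encode = lambda Y: Y @ U
decode = lambda Z: Z @ U.T
# Rank-3 data is reproduced exactly by a 3-dimensional linear autoencoder.
assert mse_reconstruction_loss(X, encode, decode) < 1e-16
```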
3.2 Deep Convolutional Autoencoder
DCAs [LC2020A] choose the training data as the FOM snapshots shifted by their respective initial values,
where $x^m(\mu_j)$ is a numerical approximation of the $m$th time step $x(t^m; \mu_j)$, and $\{\mu_j\}_{j=1}^{n_{\mathrm{train}}} \subset \mathcal{P}$ is a finite training parameter set. Note that all initial values in Eq. 21 are shifted to zero by construction. Thus, the training of the DCA includes the condition that the zero vector is (approximately) reproduced. Based on the optimized network parameters $\theta^*$, the reconstruction function is then defined by the decoder, and the reduced initial value is set to the encoded shifted initial value, cf. [LC2020A]. This specific choice of the reduced initial value together with the decoder leads to a reference state. Thus, the reconstruction function has to describe deviations from the initial value in Eq. 6, which is consistent with the choice of training data in Eq. 21. Note that the training of an autoencoder for model reduction purposes uses the same data as the snapshot-based basis generation techniques in classical model reduction (with a linear reconstruction map), see [LC2020A].