1 Introduction
Numerical simulation models for the approximation of complex physical systems are usually high-dimensional and thus incur large computational costs. This is infeasible whenever real-time evaluations are needed or in a multi-query parametric setting, in which the high-dimensional model needs to be evaluated for various parameters, e.g., repeatedly within an optimization loop. To accelerate computations, we employ model order reduction (MOR) to derive efficiently evaluable low-dimensional models with rigorous error control w.r.t. the high-dimensional model. Well-known examples of MOR methods are, e.g., the reduced basis method, proper orthogonal decomposition, or balanced truncation; see [benner2017model] for an introduction to different MOR methods.
In this paper, our focus is on reducing dynamical systems of ordinary differential equations, which often arise from the spatial discretization of partial differential equations. These dynamical systems frequently possess physical characteristics of the modeled phenomena, such as an underlying structure or conservation properties. Structures that have been addressed in the context of MOR comprise, e.g., Lagrangian
[lall2003structure, carlberg2015preserving], Hamiltonian, or port-Hamiltonian [polyuga2010structure, chaturantabut2016structure, beattie2019structure] structures. We are in particular interested in preserving the structure of Hamiltonian systems arising in, e.g., plasma physics [morrison1980maxwell] or gyroscopic systems [xu2005numerical]. To this end, we derive a reduced Hamiltonian system which preserves symplecticity, as otherwise the reduced system could become unphysical in the sense that the energy is not conserved or stability properties are lost. Structure-preserving model reduction for Hamiltonian systems has already been presented for linear-subspace methods in [PM2016, AH2017, BBH2019, BHR2020, UKY2021, sharma2021hamiltonian], for nonlinear Poisson systems in [HP2021], and in [HPR2020, Pagliantini2021] utilizing a dynamical-in-time approach motivated by low-rank approximations. We extend this literature by developing reduced models for Hamiltonian systems on nonlinear symplectic trial manifolds, which can drastically lower the dimension needed to build an accurate reduced model.

One motivation for using nonlinear manifolds as low-dimensional approximations instead of linear subspaces arises from model reduction of transport-dominated problems. It is known that such problems can exhibit a slow decay of the Kolmogorov $n$-width, see, e.g., [ohlberger2016reduced, greif2019decay], which describes the best-possible error for a linear-subspace ROM of size $n$. Recent efforts to overcome the slowly decaying Kolmogorov $n$-width for transport problems include transforming and shifting the basis functions as well as the physical domain of the individual snapshots [iollo2014advection, welper2017interpolation, reiss2018shifted, cagniart2019model, black2020projection], the method of freezing [ohlberger2013nonlinear], and splitting one global domain into multiple local linear subspaces [ohlberger2015error].
Model reduction on general nonlinear manifolds has already been performed in [gu2011model], which is limited to piecewise-linear manifolds, and in [cruz2020], where the manifold is constructed via moment matching and the equations of the nonlinear dynamical system have to be known. In [Rim2019ManifoldAV], the authors introduced manifold approximations via transported subspaces, which is currently restricted to one-dimensional problems. Recent approaches using autoencoders have been introduced in [LC2020A] as MOR on manifolds and extended in [LC2020B] to satisfy physical conservation laws. In [KCWZ2020B], a shallow masked autoencoder combined with a hyper-reduction approach has been employed to construct a reduced manifold. In [FDM2021], the authors not only learn a reduced manifold via an autoencoder, but also utilize a feedforward network to approximate the reduced model.

To the best of our knowledge, the aforementioned approaches consider either symplectic model reduction or MOR on manifolds, but not their combination. Our goal is to bridge the two approaches by providing a method for symplectic model reduction on manifolds. In particular, this article provides the following new contributions:

Symplectic manifold Galerkin (SMG) projection technique, which projects Hamiltonian systems onto nonlinear symplectic trial manifolds (Subsection 2.3). Using this technique, the resulting SMG-ROM is again a Hamiltonian system.

Analysis, which includes:

Preservation of energy and stability: we detail sufficient conditions for the reduced model to preserve energy and/or stability in the sense of Lyapunov (Subsections 2.4 and 2.5).

Error bound: we develop a rigorous a posteriori error bound based on a time-discrete formulation (Subsection 2.6.2).


Weakly symplectic deep convolutional autoencoder (weakly symplectic DCA): we present a new weakly symplectic DCA (Section 3) to approximate a nonlinear symplectic trial manifold. Note that we are not restricted to autoencoders here, but could use any technique which provides a symplectic map from the low-dimensional to the high-dimensional space, see Remark 2.

Numerical experiments: we numerically investigate the linear wave equation for the challenging case of a thin moving pulse and demonstrate the ability of the SMG-ROM to outperform (a) (non-)structure-preserving linear-subspace ROMs and (b) non-structure-preserving MOR on manifolds (Subsection 4.1).
This paper is structured as follows: in Section 2, we define the high-dimensional Hamiltonian system and subsequently construct the reduced Hamiltonian system on a nonlinear symplectic trial manifold. Additionally, we show that the reduced model can inherit properties like energy and stability preservation, and we develop a rigorous a posteriori error bound. In Section 3, we detail the weakly symplectic DCA. Numerical experiments on a Hamiltonian system with a known slowly decaying Kolmogorov $n$-width are performed in Section 4. Finally, we conclude in Section 5.
2 Symplectic Model Reduction on Manifolds
In this section, we introduce the model reduction of Hamiltonian systems on nonlinear symplectic trial manifolds. To this end, we start with a review of the necessary details on symplectic geometry and subsequently state our full-order model, a parametric Hamiltonian system. Then, we detail our reduced model on a nonlinear symplectic trial manifold and show that stability and energy can be preserved with our symplectic manifold Galerkin (SMG) projection technique. We conclude this section by developing a rigorous a posteriori error bound.
2.1 Symplectic Geometry
Let $\mathcal{M}$ be a smooth manifold and let $\omega$ be a symplectic form on $\mathcal{M}$, i.e., a closed, nondegenerate differential 2-form (see, e.g., [marsden1995introduction, lee2013] for an introduction to smooth manifolds). It follows immediately that $\mathcal{M}$ needs to be of even dimension [lee2013, Proposition 22.7]. The combination $(\mathcal{M}, \omega)$ of a smooth manifold and a symplectic form is called a symplectic manifold. For the case of $\mathcal{M}$ being a vector space over $\mathbb{R}$ of even dimension $2n$, $n \in \mathbb{N}$, we introduce the corresponding definition of a symplectic form.

Definition 1 (Symplectic Form over $\mathbb{R}$, Symplectic Vector Space).
Let $\mathbb{V}$ be an $\mathbb{R}$-vector space of dimension $2n$, $n \in \mathbb{N}$. Let $\omega: \mathbb{V} \times \mathbb{V} \to \mathbb{R}$ be a skew-symmetric and nondegenerate bilinear form, i.e., for all $v, w \in \mathbb{V}$ it holds $\omega(v, w) = -\omega(w, v)$, and $\omega(v, w) = 0$ for all $w \in \mathbb{V}$ implies $v = 0$. The bilinear form $\omega$ is called a symplectic form on $\mathbb{V}$ and the pair $(\mathbb{V}, \omega)$ is called a symplectic vector space.
For a symplectic form $\omega$ on $\mathbb{V}$, $\dim(\mathbb{V}) = 2n$, it is known that there exists a basis of $\mathbb{V}$ such that the symplectic form can be written in canonical coordinates

(1)  $\omega(v, w) = v_c^\top \mathbb{J}_{2n} w_c,$

where $v_c, w_c \in \mathbb{R}^{2n}$ denote the respective coordinate vectors of $v, w \in \mathbb{V}$, see, e.g., [da2008lectures]. The matrix

$\mathbb{J}_{2n} := \begin{pmatrix} 0_n & I_n \\ -I_n & 0_n \end{pmatrix} \in \mathbb{R}^{2n \times 2n}$

is called the canonical Poisson matrix, where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix and $0_n \in \mathbb{R}^{n \times n}$ is the matrix of all zeros. One can easily see that $\mathbb{J}_{2n}^\top = -\mathbb{J}_{2n}$ and $\mathbb{J}_{2n} \mathbb{J}_{2n}^\top = I_{2n}$. With this notation at hand, we define a symplectic map as follows.

Definition 2 (Symplectic Map, Symplectic Matrix [HLW2006, Section VI.2]).
Let $U \subseteq \mathbb{R}^{2n}$ be an open set. A differentiable map $g: U \to \mathbb{R}^{2N}$, $N \geq n$, is called symplectic if the Jacobian matrix $\mathrm{d}_y g(y) \in \mathbb{R}^{2N \times 2n}$ is a symplectic matrix for every $y \in U$ (the subscript $y$ denotes the derivative w.r.t. the variable $y$ and the function argument indicates the position at which the Jacobian is evaluated), i.e.,

(2)  $(\mathrm{d}_y g(y))^\top \, \mathbb{J}_{2N} \, \mathrm{d}_y g(y) = \mathbb{J}_{2n}.$

Note that due to the above definition, if the Jacobian $\mathrm{d}_y g(y)$ is a symplectic matrix, it has full column rank, i.e., $\mathrm{rank}(\mathrm{d}_y g(y)) = 2n$, since $\mathbb{J}_{2N}$ and $\mathbb{J}_{2n}$ are nonsingular. With the above definition, we can now introduce the notion of the symplectic inverse.
Definition 3 (Symplectic Inverse).
If $A \in \mathbb{R}^{2N \times 2n}$ is a symplectic matrix, we denote by

$A^+ := \mathbb{J}_{2n}^\top A^\top \mathbb{J}_{2N} \in \mathbb{R}^{2n \times 2N}$

the symplectic inverse. (The symplectic inverse should not be confused with the Moore–Penrose pseudo-inverse, which is often referred to with a similar notation.) It follows that

(3)  $A^+ A = I_{2n}$

as well as

(4)  $A^+ \mathbb{J}_{2N} (A^+)^\top = \mathbb{J}_{2n}.$
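To make these objects concrete, the following NumPy sketch (our own illustration, not part of the original exposition) builds the canonical Poisson matrix, constructs a symplectic matrix via the cotangent lift of an orthonormal basis, and verifies properties (2)–(4) numerically:

```python
import numpy as np

def poisson(n):
    """Canonical Poisson matrix J_{2n} = [[0, I_n], [-I_n, 0]]."""
    return np.block([[np.zeros((n, n)), np.eye(n)],
                     [-np.eye(n), np.zeros((n, n))]])

def symplectic_inverse(A, N, n):
    """Symplectic inverse A^+ = J_{2n}^T A^T J_{2N} from Definition 3."""
    return poisson(n).T @ A.T @ poisson(N)

N, n = 5, 2
J = poisson(N)
assert np.allclose(J.T, -J)                  # skew-symmetry
assert np.allclose(J @ J.T, np.eye(2 * N))   # J J^T = I

# A symplectic matrix via a "cotangent lift" of an orthonormal basis Phi:
# A = blkdiag(Phi, Phi) satisfies A^T J_{2N} A = J_{2n}.
rng = np.random.default_rng(0)
Phi, _ = np.linalg.qr(rng.standard_normal((N, n)))
A = np.block([[Phi, np.zeros((N, n))],
              [np.zeros((N, n)), Phi]])
assert np.allclose(A.T @ poisson(N) @ A, poisson(n))          # Eq. (2)

Aplus = symplectic_inverse(A, N, n)
assert np.allclose(Aplus @ A, np.eye(2 * n))                  # Eq. (3)
assert np.allclose(Aplus @ poisson(N) @ Aplus.T, poisson(n))  # Eq. (4)
```

The cotangent lift used here is only one convenient way to produce a rectangular symplectic matrix for testing; any matrix satisfying Eq. 2 would do.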
2.2 Full-Order Model
We introduce our high-dimensional full-order model (FOM), which is a parametric autonomous Hamiltonian system. Let $\mathcal{H}: \mathbb{R}^{2N} \times \mathcal{P} \to \mathbb{R}$ be a smooth function, called the Hamiltonian (function), dependent on the parameter $\mu \in \mathcal{P}$. Further, let $I_t := [t_0, t_{\mathrm{end}}] \subset \mathbb{R}$ be a time interval with final time $t_{\mathrm{end}}$. The FOM in canonical coordinates reads: for a given $\mu \in \mathcal{P}$, find $x(\cdot; \mu) \in C^1(I_t, \mathbb{R}^{2N})$ such that

(5)  $\frac{\mathrm{d}}{\mathrm{d}t} x(t; \mu) = X_{\mathcal{H}}(x(t; \mu); \mu) := \mathbb{J}_{2N} \nabla_x \mathcal{H}(x(t; \mu); \mu), \qquad x(t_0; \mu) = x_0(\mu),$

where $X_{\mathcal{H}}$ is the so-called Hamiltonian vector field and $C^1(I_t, \mathbb{R}^{2N})$ denotes the continuously differentiable functions in time with values in $\mathbb{R}^{2N}$. The flow of Eq. 5 is a mapping evolving the initial state $x_0(\mu)$ to the corresponding solution of Eq. 5, i.e., $\phi_t(x_0(\mu); \mu) = x(t; \mu)$, where $x(t; \mu)$ represents the solution with initial data $x(t_0; \mu) = x_0(\mu)$. An important property of the flow is that it preserves the Hamiltonian,

$\frac{\mathrm{d}}{\mathrm{d}t} \mathcal{H}(x(t; \mu); \mu) = \nabla_x \mathcal{H}^\top \frac{\mathrm{d}}{\mathrm{d}t} x = \nabla_x \mathcal{H}^\top \mathbb{J}_{2N} \nabla_x \mathcal{H} = 0,$

where the last equality follows from the skew-symmetry of $\mathbb{J}_{2N}$.

In order for Eq. 5 to be well-posed, we assume that for any $\mu \in \mathcal{P}$ the operator $X_{\mathcal{H}}$ is Lipschitz continuous w.r.t. $x$.
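As a minimal illustration (our own example, the harmonic oscillator with Hamiltonian $\mathcal{H}(q, p) = (q^2 + p^2)/2$, not taken from the text), the following sketch checks numerically that the Hamiltonian vector field is orthogonal to the gradient of $\mathcal{H}$, which is exactly the skew-symmetry argument above:

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # canonical Poisson matrix for N = 1

def grad_H(x):
    """Gradient of the harmonic-oscillator Hamiltonian H(q, p) = (q^2 + p^2)/2."""
    return x

def X_H(x):
    """Hamiltonian vector field X_H(x) = J grad H(x)."""
    return J @ grad_H(x)

# dH/dt along the flow equals grad(H)^T J grad(H), which vanishes by the
# skew-symmetry of J; hence the flow preserves H.
rng = np.random.default_rng(1)
for _ in range(5):
    x = rng.standard_normal(2)
    assert abs(grad_H(x) @ X_H(x)) < 1e-12
```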
2.3 Symplectic Manifold Galerkin ROM
We now detail how to derive a symplectic manifold Galerkin ROM (SMG-ROM) on a nonlinear symplectic trial manifold and, at the same time, ensure that this reduced system preserves the symplectic structure of the Hamiltonian system. This nonlinear symplectic trial manifold aims at approximating the solution manifold

$\mathcal{S} := \{ x(t; \mu) \mid t \in I_t, \; \mu \in \mathcal{P} \}.$

In particular, we are looking for approximate solutions of the form

(6)  $x(t; \mu) \approx \tilde{x}(t; \mu) := x_{\mathrm{ref}}(\mu) + g(\hat{x}(t; \mu))$

with the (possibly parametric) reference state $x_{\mathrm{ref}}(\mu) \in \mathbb{R}^{2N}$, the reduced coordinates (also known as latent variables) $\hat{x}(t; \mu) \in \mathbb{R}^{2n}$ with $2n \ll 2N$, the reconstructed solution $\tilde{x}$ and the reconstruction function (or decoder) $g: \mathbb{R}^{2n} \to \mathbb{R}^{2N}$. The reconstruction function is assumed to be a (in general nonlinear) symplectic map in the sense of Eq. 2. With this choice, the reconstructed solution evolves on the nonlinear trial manifold

(7)  $\tilde{\mathcal{M}} := \{ x_{\mathrm{ref}}(\mu) + g(\hat{x}) \mid \hat{x} \in \mathbb{R}^{2n} \} \subset \mathbb{R}^{2N}.$

The evolution of the reconstructed solution in time is

(8)  $\frac{\mathrm{d}}{\mathrm{d}t} \tilde{x}(t; \mu) = \mathrm{d}_{\hat{x}} g(\hat{x}(t; \mu)) \, \frac{\mathrm{d}}{\mathrm{d}t} \hat{x}(t; \mu)$
due to the chain rule. To construct the ROM, we define the time-continuous residual of the FOM Eq. 5 w.r.t. the approximation $\tilde{x}$ by

$r(t; \mu) := \frac{\mathrm{d}}{\mathrm{d}t} \tilde{x}(t; \mu) - X_{\mathcal{H}}(\tilde{x}(t; \mu); \mu).$

For the ROM, we require that the symplectic projection of the residual,

$\big(\mathrm{d}_{\hat{x}} g(\hat{x}(t; \mu))\big)^+ \, r(t; \mu) = 0,$

vanishes. Note that the symplectic inverse $(\mathrm{d}_{\hat{x}} g)^+$ exists due to the symplecticity of $g$. Reformulating this equation yields

$\frac{\mathrm{d}}{\mathrm{d}t} \hat{x}(t; \mu) = \mathbb{J}_{2n} \nabla_{\hat{x}} \tilde{\mathcal{H}}(\hat{x}(t; \mu); \mu),$

which follows due to the properties of the Poisson matrix in Eq. 4, Eq. 3 and the chain rule. We denote by $\tilde{\mathcal{H}}(\hat{x}; \mu) := \mathcal{H}(x_{\mathrm{ref}}(\mu) + g(\hat{x}); \mu)$ the reduced Hamiltonian. Thus, the SMG-ROM on the nonlinear symplectic trial manifold reads: for a given $\mu \in \mathcal{P}$, find $\hat{x}(\cdot; \mu) \in C^1(I_t, \mathbb{R}^{2n})$ such that

(9)  $\frac{\mathrm{d}}{\mathrm{d}t} \hat{x}(t; \mu) = X_{\tilde{\mathcal{H}}}(\hat{x}(t; \mu); \mu) := \mathbb{J}_{2n} \nabla_{\hat{x}} \tilde{\mathcal{H}}(\hat{x}(t; \mu); \mu), \qquad \hat{x}(t_0; \mu) = \hat{x}_0(\mu).$

By $X_{\tilde{\mathcal{H}}}$, we denote the reduced Hamiltonian vector field. Moreover, the choice of $g$ to be a symplectic map guarantees that the tuple $(\tilde{\mathcal{M}}, \omega)$ with the nonlinear trial manifold from Eq. 7 and the canonical symplectic form from Eq. 1 is a symplectic manifold, see [HLW2006, Section VII.2.3]. This is why we refer to $\tilde{\mathcal{M}}$ as a nonlinear symplectic trial manifold.
In order to solve Eq. 9 numerically, a symplectic numerical integration scheme is applied, and the resulting system of nonlinear equations is solved at each discrete time step with a quasi-Newton scheme which omits second-order derivatives of $g$, in analogy to [LC2020A, Section 3.4].
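For the special case of a linear symplectic decoder and a quadratic Hamiltonian (a hypothetical example of our own choosing, with the decoder built as a cotangent lift), the SMG projection can be verified directly: applying the symplectic inverse of the decoder Jacobian to the FOM vector field reproduces the reduced Hamiltonian vector field of Eq. 9.

```python
import numpy as np

def poisson(n):
    """Canonical Poisson matrix J_{2n}."""
    return np.block([[np.zeros((n, n)), np.eye(n)],
                     [-np.eye(n), np.zeros((n, n))]])

N, n = 4, 1
rng = np.random.default_rng(3)
Phi, _ = np.linalg.qr(rng.standard_normal((N, n)))
A = np.block([[Phi, np.zeros((N, n))],
              [np.zeros((N, n)), Phi]])      # linear symplectic decoder g(z) = A z
B = rng.standard_normal((2 * N, 2 * N))
M = B @ B.T + np.eye(2 * N)                  # SPD Hessian: H(x) = x^T M x / 2
grad_H = lambda x: M @ x

Aplus = poisson(n).T @ A.T @ poisson(N)      # symplectic inverse of the Jacobian
z = rng.standard_normal(2 * n)
lhs = Aplus @ (poisson(N) @ grad_H(A @ z))   # symplectic projection of FOM field
rhs = poisson(n) @ (A.T @ grad_H(A @ z))     # J_{2n} times grad of reduced Hamiltonian
assert np.allclose(lhs, rhs)
```

For a genuinely nonlinear decoder, the same identity holds pointwise with the Jacobian evaluated at the current reduced state, which is what the quasi-Newton solve exploits.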
In the following, we show that energy and stability can be preserved during the time evolution, since the reduced system constructed above is again a Hamiltonian system.
2.4 Energy Preservation
Proposition 1 (Preservation of the Hamiltonian).
The SMG-ROM preserves the Hamiltonian in the sense that the reduced Hamiltonian $\tilde{\mathcal{H}}$ is constant along solutions of Eq. 9 and, consequently, the Hamiltonian $\mathcal{H}$ of the reconstructed solution $\tilde{x}$ is constant in time.

Proof.
The proof is based on the property that the Hamiltonian is preserved over time for both the FOM and the ROM. Omitting the $\mu$-dependency for readability, this reads

$\frac{\mathrm{d}}{\mathrm{d}t} \tilde{\mathcal{H}}(\hat{x}(t)) = \nabla_{\hat{x}} \tilde{\mathcal{H}}^\top \mathbb{J}_{2n} \nabla_{\hat{x}} \tilde{\mathcal{H}} = 0 \quad \text{and} \quad \mathcal{H}(\tilde{x}(t)) = \tilde{\mathcal{H}}(\hat{x}(t)) = \tilde{\mathcal{H}}(\hat{x}(t_0)) = \mathcal{H}(\tilde{x}(t_0)).$

∎
Remark 1.
The reference state is set to $x_{\mathrm{ref}}(\mu) := x_0(\mu) - g(\hat{x}_0(\mu))$ for an arbitrary initial value $\hat{x}_0(\mu)$ of the reduced system. As observed in [LC2020A, Remark 3.1], this leads to an exact reproduction of the initial value $x_0(\mu)$. With the SMG projection, we can further conclude with Proposition 1 that this implies an exact reproduction of the Hamiltonian.
2.5 Stability Preservation
In this section, we examine the question of stability of the full-order and reduced model in the sense of Lyapunov. For ease of notation, we omit the time and parameter dependency for the remainder of this section whenever possible.
Consider the differential equation

(11)  $\frac{\mathrm{d}}{\mathrm{d}t} x(t) = f(x(t)),$

where $f$ is a smooth function from an open set $U \subseteq \mathbb{R}^{2N}$ into $\mathbb{R}^{2N}$. Let $f$ have an equilibrium point $x_e \in U$, such that $f(x_e) = 0$, and let $\phi_t$ be the flow of Eq. 11. Then, Lyapunov stability of the point $x_e$ is defined as:

Definition 4 (Lyapunov Stability [MO2017, Section 12]).
An equilibrium point $x_e$ of Eq. 11 is Lyapunov stable if for every $\varepsilon > 0$, there exists $\delta > 0$ such that $\| \phi_t(x_0) - x_e \| < \varepsilon$ for all $t \geq t_0$ whenever $\| x_0 - x_e \| < \delta$.
In order to state sufficient conditions for an equilibrium point to be Lyapunov stable, we recall Lyapunov’s stability theorem.
Theorem 1 (Lyapunov's Stability Theorem [MO2017, Theorem 12.1.1]).
Let $x_e$ be an equilibrium point of Eq. 11. If there exists a function $W \in C^2(U, \mathbb{R})$ with $\nabla W(x_e) = 0$, such that $\nabla^2 W(x_e)$ is positive definite (where $\nabla^2 W$ denotes the Hessian of $W$) and

(12)  $\frac{\mathrm{d}}{\mathrm{d}t} W(\phi_t(x)) \leq 0,$

then $x_e$ is called a Lyapunov stable point.

The function $W$ in the aforementioned theorem is also called a Lyapunov function. For Hamiltonian systems, a suitable candidate for $W$ is the Hamiltonian itself. In the case of the autonomous Hamiltonians considered in this paper, a point $x_e$ is an equilibrium point of Eq. 5 if and only if $x_e$ is a critical point of $\mathcal{H}$, i.e., a point s.t. $\nabla_x \mathcal{H}(x_e) = 0$ [AM1987, Proposition 3.4.16]. Following the well-known Dirichlet stability theorem for Hamiltonian systems [MO2017, Corollary 12.1.1], it suffices that the equilibrium point is a strict local maximum or minimum of $\mathcal{H}$ in order to be a Lyapunov stable point. Equation 12 is automatically fulfilled for autonomous Hamiltonian systems, which can easily be seen by

$\frac{\mathrm{d}}{\mathrm{d}t} \mathcal{H}(\phi_t(x)) = \nabla_x \mathcal{H}^\top \mathbb{J}_{2N} \nabla_x \mathcal{H} = 0$

due to the skew-symmetry of $\mathbb{J}_{2N}$. Similar to [AH2017, Theorem 18] in the special case of linear $g$, we are now able to show that if $\mathcal{H}$ (or $\tilde{\mathcal{H}}$) is a Lyapunov function in a neighborhood of the equilibrium point, then the equilibrium points of the full-order and reduced model are Lyapunov stable. In the following, we extend this theorem to hold for more general nonlinear $g$.
Theorem 2 (Lyapunov Stability).
Assume that $\tilde{x}_e \in \tilde{\mathcal{M}}$ is an equilibrium point of Eq. 5 and there exists $\hat{x}_e \in \mathbb{R}^{2n}$ such that $\tilde{x}_e = x_{\mathrm{ref}} + g(\hat{x}_e)$. If $\mathcal{H}$ (or $\tilde{\mathcal{H}}$) is a Lyapunov function as defined in Theorem 1, then $\tilde{x}_e$ and $\hat{x}_e$ are Lyapunov stable equilibrium points for the full-order model Eq. 5 and the reduced model Eq. 9, respectively.
Proof.
As $\tilde{x}_e$ is an equilibrium point and $\mathcal{H}$ is a Lyapunov function, it immediately follows from Theorem 1 that $\tilde{x}_e$ is Lyapunov stable. By using the chain rule, we arrive at $\nabla_{\hat{x}} \tilde{\mathcal{H}}(\hat{x}) = (\mathrm{d}_{\hat{x}} g(\hat{x}))^\top \nabla_x \mathcal{H}(x_{\mathrm{ref}} + g(\hat{x}))$. Evaluating the latter at $\hat{x}_e$ yields

$\nabla_{\hat{x}} \tilde{\mathcal{H}}(\hat{x}_e) = (\mathrm{d}_{\hat{x}} g(\hat{x}_e))^\top \nabla_x \mathcal{H}(\tilde{x}_e) = 0,$

where the last equality follows as $\tilde{x}_e$ is a critical point of $\mathcal{H}$, i.e., $\nabla_x \mathcal{H}(\tilde{x}_e) = 0$. Thus $\hat{x}_e$ is an equilibrium point for Eq. 9. In order to show that $\hat{x}_e$ is a strict local minimum of $\tilde{\mathcal{H}}$, we consider the Hessian matrix. Again applying the chain rule (the terms containing second derivatives of $g$ vanish since $\nabla_x \mathcal{H}(\tilde{x}_e) = 0$), we get

$\nabla^2_{\hat{x}} \tilde{\mathcal{H}}(\hat{x}_e) = (\mathrm{d}_{\hat{x}} g(\hat{x}_e))^\top \nabla^2_x \mathcal{H}(\tilde{x}_e) \, \mathrm{d}_{\hat{x}} g(\hat{x}_e).$

Therefore, for any $v \in \mathbb{R}^{2n} \setminus \{0\}$, we obtain that $\nabla^2_{\hat{x}} \tilde{\mathcal{H}}(\hat{x}_e)$ is positive definite,

$v^\top \nabla^2_{\hat{x}} \tilde{\mathcal{H}}(\hat{x}_e) v = (\mathrm{d}_{\hat{x}} g(\hat{x}_e) v)^\top \nabla^2_x \mathcal{H}(\tilde{x}_e) \, (\mathrm{d}_{\hat{x}} g(\hat{x}_e) v) > 0,$

due to the positive definiteness of $\nabla^2_x \mathcal{H}(\tilde{x}_e)$ and the full column rank of $\mathrm{d}_{\hat{x}} g(\hat{x}_e)$. With Dirichlet's stability theorem, we conclude that $\hat{x}_e$ is a Lyapunov stable point for Eq. 9. ∎
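The final congruence step of the proof can be illustrated numerically (a generic sketch with randomly drawn matrices standing in for the Hessian and the decoder Jacobian): a full-column-rank matrix transforms a positive definite Hessian into a positive definite reduced Hessian.

```python
import numpy as np

# Congruence argument from the proof: for a full-column-rank Jacobian A and a
# positive definite Hessian M, the reduced Hessian A^T M A is positive definite.
rng = np.random.default_rng(4)
N2, n2 = 6, 2                              # stand-ins for 2N and 2n
A = rng.standard_normal((N2, n2))          # generically full column rank
B = rng.standard_normal((N2, N2))
M = B @ B.T + N2 * np.eye(N2)              # symmetric positive definite
H_red = A.T @ M @ A
assert np.all(np.linalg.eigvalsh(H_red) > 0)   # all eigenvalues positive
```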
2.6 Error Estimation
In this section, we deduce a rigorous a posteriori error bound. This error bound is coupled to the chosen time-discretization scheme, which needs to be a so-called symplectic integrator [HLW2006, BM2017] in order to preserve the underlying symplectic structure of the Hamiltonian system.
2.6.1 Time Discretization
We use a Runge–Kutta (RK) scheme for the time discretization of our Hamiltonian system. Since not all RK methods meet the requirements of being a symplectic integrator, we detail the necessary conditions after the definition.
We start by applying an equidistant time discretization to the interval $I_t$ with $K \in \mathbb{N}$ steps. Therefore, we set $\Delta t := (t_{\mathrm{end}} - t_0)/K$, $t_m := t_0 + m \, \Delta t$ for $m = 0, \dots, K$, and seek approximations $x^m(\mu) \approx x(t_m; \mu)$. Note that the superscript, contrary to standard notation, denotes the enumeration of time-dependent variables instead of the power of those quantities.
Definition 5 (Runge–Kutta Method).
An $s$-stage RK method (formulated for the autonomous Hamiltonian system Eq. 5) with real coefficients $a_{ij}, b_i$, $1 \leq i, j \leq s$, is for given $x^m(\mu)$ defined by

$k_i^m = X_{\mathcal{H}}\Big( x^m + \Delta t \sum_{j=1}^{s} a_{ij} k_j^m; \, \mu \Big), \qquad x^{m+1} = x^m + \Delta t \sum_{i=1}^{s} b_i k_i^m.$

Runge–Kutta methods are symplectic if the coefficients satisfy

$b_i a_{ij} + b_j a_{ji} = b_i b_j \quad \text{for all } 1 \leq i, j \leq s,$

see [HLW2006, Theorem VI.4.3, p. 178], which, e.g., holds for the implicit midpoint rule,

(13a)  $x^{m+1} = x^m + \Delta t \, k_1^m,$
(13b)  $k_1^m = X_{\mathcal{H}}\big( \tfrac{1}{2}(x^m + x^{m+1}); \, \mu \big),$

which therefore is a symplectic integrator. To compute a solution of the time-discrete ODE, we need to solve the following algebraic equations at every discrete time step

(14)  $R_i^m(k_1^m, \dots, k_s^m; \mu) = 0, \qquad i = 1, \dots, s,$

where the $i$-th RK residual is defined by

$R_i^m(k_1^m, \dots, k_s^m; \mu) := k_i^m - X_{\mathcal{H}}\Big( x^m + \Delta t \sum_{j=1}^{s} a_{ij} k_j^m; \, \mu \Big)$

with the updated state

(15)  $x^{m+1} = x^m + \Delta t \sum_{i=1}^{s} b_i k_i^m.$
In order to compute a time-discrete approximation of the reduced problem Eq. 9, we solve the algebraic equations

(16)  $\hat{R}_i^m(\hat{k}_1^m, \dots, \hat{k}_s^m; \mu) = 0, \qquad i = 1, \dots, s,$

where the $i$-th reduced RK residual and the updated reduced state are given by

(17a)  $\hat{R}_i^m(\hat{k}_1^m, \dots, \hat{k}_s^m; \mu) := \hat{k}_i^m - X_{\tilde{\mathcal{H}}}\Big( \hat{x}^m + \Delta t \sum_{j=1}^{s} a_{ij} \hat{k}_j^m; \, \mu \Big),$
(17b)  $\hat{x}^{m+1} = \hat{x}^m + \Delta t \sum_{i=1}^{s} b_i \hat{k}_i^m.$
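As a sketch of the scheme in Eq. 13 (applied to a hypothetical linear example of our own choosing, the harmonic oscillator $x' = \mathbb{J}_2 x$ with $\mathcal{H}(x) = \|x\|^2/2$, where the implicit stage equation can be solved exactly), the implicit midpoint rule conserves this quadratic Hamiltonian up to round-off:

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # canonical Poisson matrix for N = 1

def implicit_midpoint_linear(x0, dt, steps):
    """Implicit midpoint rule (13) for the linear Hamiltonian system x' = J x.
    For a linear vector field the stage equation is linear, so each step reduces
    to solving (I - dt/2 J) x^{m+1} = (I + dt/2 J) x^m."""
    A = np.linalg.solve(np.eye(2) - 0.5 * dt * J,
                        np.eye(2) + 0.5 * dt * J)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.array(xs)

xs = implicit_midpoint_linear([1.0, 0.0], dt=0.1, steps=1000)
H = 0.5 * np.sum(xs**2, axis=1)
# the midpoint rule conserves quadratic invariants exactly (up to round-off)
assert np.max(np.abs(H - H[0])) < 1e-10
```

For nonlinear Hamiltonians the stage equation Eq. 14 is nonlinear and must be solved iteratively, e.g., with the quasi-Newton scheme mentioned in Subsection 2.3.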
2.6.2 Error Bound
In this section, we provide a rigorous a posteriori error bound. We assume that an RK method, as defined in the section above, has been used to construct the time-discrete scheme. For linear $g$, bounds have been derived in [CBA2017, Theorem 6.19], which we extend in the following to error bounds for more general nonlinear $g$.
Theorem 3 (Error Bound).
If $X_{\mathcal{H}}$ is Lipschitz continuous, i.e., there exists a constant $L > 0$ such that
and $\Delta t$ is chosen sufficiently small such that

the matrix with entries is invertible and

for every $m$, provided the stated condition holds. (For the implicit midpoint rule, this condition can be ensured by choosing $\Delta t$ sufficiently small.)
then the following error bound holds for
with
Here, $x^m$ respectively $\hat{x}^m$ is a solution of Eq. 14, Eq. 16 at time instance $t_m$ with update in Eq. 15, Eq. 17b. By $\tilde{x}^m$ we denote the reconstructed state and by $\tilde{k}_i^m$ the reconstructed velocities. The residuals w.r.t. the reconstructed state and velocities are defined by
(18a)  
(18b) 
Note that if the mapping $g$ is linear, we recover the standard Runge–Kutta update step Eq. 15 for the reconstructed solution.
Proof.
Subtracting from yields
Noting that , we arrive at
Taking the norm on both sides and using the Lipschitz continuity yields
Rearranging both sides leads to
As is small enough s.t. assumptions (a1), (a2) hold, for , we arrive at
(19) 
Evaluating the residual Eq. 18b at and adding a zero by reformulating Eq. 15 yields
By inserting Eq. 19 it follows
Finally, an induction argument concludes the proof. ∎
Note that the Lipschitz condition on $X_{\mathcal{H}}$ does not pose a severely limiting additional requirement, as it is a typical assumption for the well-posedness of Eq. 5.
3 Nonlinear Symplectic Trial Manifold based on Deep Convolutional Autoencoders
It remains open how to construct a symplectic reconstruction function $g$ and how to choose the reduced initial value $\hat{x}_0(\mu)$ in Eq. 9. In the scope of this work, we investigate autoencoders for this purpose. In this section, we first introduce existing autoencoders and deep convolutional autoencoders (DCAs). Furthermore, we introduce our novel weakly symplectic DCA. Before describing the general idea of autoencoders, we briefly highlight that autoencoders are not the sole choice for defining a reconstruction function.
Remark 2 (Alternatives to Autoencoders).
Instead of relying on an autoencoder to provide the reconstruction function $g$, one could consider other approaches. One possibility would be kernel principal component analysis (kPCA) [Schoelkopf98], whose basic idea is to embed the nonlinear manifold into a higher-dimensional feature space. The corresponding mapping and the feature-space dimension should be chosen such that the nonlinear manifold can be represented as a linear manifold in the feature space. Subsequently, a classical PCA is applied to identify the low-dimensional space. Additional methods to construct such a mapping are, e.g., the Gaussian process latent variable model (GP-LVM) [lawrence2003gaussian], self-organizing maps [kohonen1982self], diffeomorphic dimensionality reduction [NIPS2008_647bba34] and deep kernel networks [wenzel_sdkn_turbulence].

3.1 Autoencoder
Autoencoders [Hinton96] [Goodfellowetal2016, Chapter 14] are one approach in the class of (nonlinear) dimensionality reduction methods that learn a low-dimensional representation of a given, finite dataset $\{x_i\}_{i=1}^{N_{\mathrm{data}}} \subset \mathbb{R}^{2N}$. An autoencoder is characterized by a pair of parametric mappings, which are called the encoder $e_\theta$ and the decoder $d_\theta$. Typically, both maps are artificial neural networks (ANNs), where $\theta$ describes the network parameters such as the weights and biases. The encoder–decoder pair is optimized to minimize the loss

(20)  $L(\theta) := \frac{1}{N_{\mathrm{data}}} \sum_{i=1}^{N_{\mathrm{data}}} \| x_i - d_\theta(e_\theta(x_i)) \|_2^2,$

i.e., the data should be invariant under compression with the encoder and subsequent decompression with the decoder. With this construction, autoencoders do not require labeled data, which is known as unsupervised learning. The constant $1/N_{\mathrm{data}}$ normalizes the squared loss, which is then known as the mean squared error (MSE). The training process identifies a (most often locally) optimal parameter $\theta^*$.

3.2 Deep Convolutional Autoencoder
DCAs [LC2020A] choose the training data as shifted snapshots of the solution,

(21)  $\{\, x^m(\mu_j) - x_0(\mu_j) \mid m = 0, \dots, K, \; j = 1, \dots, n_{\mathrm{train}} \,\},$

where $x^m(\mu_j)$ is a numerical approximation for the $m$-th time step $t_m$, and $\mu_j$, $j = 1, \dots, n_{\mathrm{train}}$, is a finite training parameter set. Note that all initial values in Eq. 21 are shifted to zero by construction. Thus, the training of the DCA includes the condition $d_\theta(e_\theta(0)) \approx 0$. Based on the optimized network parameters $\theta^*$, the reconstruction function is then defined by $g := d_{\theta^*}$ and the reduced initial value is set to $\hat{x}_0(\mu) := e_{\theta^*}(0)$, cf. [LC2020A]. The specific choice of the reduced initial value together with Eq. 21 leads to a reference state $x_{\mathrm{ref}}(\mu) \approx x_0(\mu)$. Thus, the reconstruction function has to describe deviations from the initial value in Eq. 6, which is consistent with the choice of training data in Eq. 21. Note that the training of an autoencoder for model reduction purposes uses the same data as the snapshot-based basis generation techniques in classical model reduction (linear $g$), see [LC2020A].
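The reconstruction loss Eq. 20 underlying both the plain autoencoder and the DCA can be made concrete with a small sketch (our own illustration; the linear encoder/decoder pair below is obtained from a truncated SVD, i.e., the PCA solution, which is exactly the linear special case that an ANN autoencoder generalizes):

```python
import numpy as np

def mse_loss(X, enc, dec):
    """Mean-squared reconstruction loss (20): (1/N) sum_i |x_i - dec(enc(x_i))|^2."""
    R = X - np.array([dec(enc(x)) for x in X])
    return np.mean(np.sum(R**2, axis=1))

# Rank-3 synthetic data: a linear autoencoder with latent dimension 3 can
# reconstruct it exactly; the optimal linear pair comes from a truncated SVD.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 10))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:3].T                      # leading right singular vectors, k = 3
enc = lambda x: V.T @ x           # linear encoder
dec = lambda z: V @ z             # linear decoder
assert mse_loss(X, enc, dec) < 1e-12
```

A nonlinear (deep convolutional) autoencoder replaces `enc`/`dec` by trained neural networks but minimizes the same loss.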
3.3 Weakly Symplectic Deep Convolutional Autoencoder
In general, the decoder learned with the DCA method will not be symplectic as required in Eq. 2. Thus, we extend the DCAs to weakly symplectic DCAs by adding this constraint as an additional term in our loss function
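A sketch of such a symplecticity penalty (our own illustration with finite-difference Jacobians and a hypothetical linear, exactly symplectic decoder; the actual weakly symplectic DCA evaluates the corresponding constraint on the network decoder during training):

```python
import numpy as np

def poisson(n):
    """Canonical Poisson matrix J_{2n}."""
    return np.block([[np.zeros((n, n)), np.eye(n)],
                     [-np.eye(n), np.zeros((n, n))]])

def jacobian_fd(f, z, eps=1e-6):
    """Forward-difference Jacobian of the decoder f at z."""
    f0 = f(z)
    cols = [(f(z + eps * e) - f0) / eps for e in np.eye(z.size)]
    return np.column_stack(cols)

def symplecticity_penalty(f, z, N, n):
    """Penalty |d g(z)^T J_{2N} d g(z) - J_{2n}|_F^2, added to the MSE loss."""
    Jg = jacobian_fd(f, z)
    return np.linalg.norm(Jg.T @ poisson(N) @ Jg - poisson(n), 'fro') ** 2

# Hypothetical linear decoder from a cotangent lift: exactly symplectic,
# so the penalty vanishes (up to finite-difference error).
N, n = 4, 1
Phi, _ = np.linalg.qr(np.arange(1.0, N + 1.0).reshape(N, n))
A = np.block([[Phi, np.zeros((N, n))], [np.zeros((N, n)), Phi]])
g = lambda z: A @ z
assert symplecticity_penalty(g, np.ones(2 * n), N, n) < 1e-8
```

In a training loop, this penalty would be weighted against the reconstruction loss Eq. 20 and evaluated with automatic differentiation instead of finite differences.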