
# Symplectic Model Reduction of Hamiltonian Systems on Nonlinear Manifolds

Classical model reduction techniques project the governing equations onto linear subspaces of the high-dimensional state-space. For problems with slowly decaying Kolmogorov-n-widths such as certain transport-dominated problems, however, classical linear-subspace reduced-order models (ROMs) of low dimension might yield inaccurate results. Thus, the concept of classical linear-subspace ROMs has to be extended to more general concepts, like Model Order Reduction (MOR) on manifolds. Moreover, as we are dealing with Hamiltonian systems, it is crucial that the underlying symplectic structure is preserved in the reduced model, as otherwise it could become unphysical in the sense that the energy is not conserved or stability properties are lost. To the best of our knowledge, existing literature addresses either MOR on manifolds or symplectic model reduction for Hamiltonian systems, but not their combination. In this work, we bridge the two aforementioned approaches by providing a novel projection technique called symplectic manifold Galerkin (SMG), which projects the Hamiltonian system onto a nonlinear symplectic trial manifold such that the reduced model is again a Hamiltonian system. We derive analytical results such as stability, energy-preservation and a rigorous a-posteriori error bound. Moreover, we construct a weakly symplectic deep convolutional autoencoder as a computationally practical approach to approximate a nonlinear symplectic trial manifold. Finally, we numerically demonstrate the ability of the method to outperform (non-)structure-preserving linear-subspace ROMs and non-structure-preserving MOR on manifold techniques.


## 1 Introduction

Numerical simulation models for the approximation of complex physical systems are usually high-dimensional and thus computationally expensive. This is infeasible whenever real-time evaluations are needed or in multi-query parametric settings, in which the high-dimensional model needs to be evaluated for various parameters, e.g., repeatedly within an optimization loop. To accelerate computations, we employ model order reduction (MOR) to derive efficiently evaluable low-dimensional models with rigorous error control w.r.t. the high-dimensional model. Well-known MOR methods are, e.g., the reduced basis method, proper orthogonal decomposition and balanced truncation; see [benner2017model] for an introduction to different MOR methods.

In this paper, our focus is on reducing dynamical systems of ordinary differential equations, which often arise from spatial discretization of partial differential equations. These dynamical systems frequently possess physical characteristics of the modeled phenomena such as an underlying structure or conservation properties. Possible structures that have been addressed in the context of MOR comprise, e.g., Lagrangian [lall2003structure, carlberg2015preserving], Hamiltonian or port-Hamiltonian [polyuga2010structure, chaturantabut2016structure, beattie2019structure] structures. We are in particular interested in preserving the structure of Hamiltonian systems arising in, e.g., plasma physics [morrison1980maxwell] or gyroscopic systems [xu2005numerical]. To this end, we derive a reduced Hamiltonian system which preserves symplecticity, as otherwise the reduced system could become unphysical in the sense that the energy is not conserved or stability properties are lost. Structure-preserving model reduction for Hamiltonian systems has already been presented for linear-subspace methods in [PM2016, AH2017, BBH2019, BHR2020, UKY2021, sharma2021hamiltonian], for nonlinear Poisson systems in [HP2021], and, utilizing a dynamical-in-time approach motivated by low-rank approximations, in [HPR2020, Pagliantini2021]. We extend this literature by developing reduced models for Hamiltonian systems on nonlinear symplectic trial manifolds, which can drastically lower the dimension needed to build an accurate reduced model.

One motivation for using nonlinear manifolds as low-dimensional approximations instead of linear subspaces arises from model reduction of transport-dominated problems. It is known that such problems can exhibit a slow decay of the Kolmogorov n-widths, see, e.g., [ohlberger2016reduced, greif2019decay], which describe the best-possible error for a linear-subspace ROM of size n. Recent efforts to overcome slowly decaying Kolmogorov n-widths for transport problems include transforming and shifting the basis functions as well as the physical domain of the individual snapshots [iollo2014advection, welper2017interpolation, reiss2018shifted, cagniart2019model, black2020projection], the method of freezing [ohlberger2013nonlinear] and splitting one global domain into multiple local linear subspaces [ohlberger2015error]. Model reduction for general nonlinear manifolds has already been performed in [gu2011model], which is limited to piecewise-linear manifolds, and in [cruz2020], where the manifold is constructed via moment matching and the equations of the nonlinear dynamical system have to be known. In [Rim2019ManifoldAV], the authors introduced manifold approximations via transported subspaces, which for now is restricted to 1D problems. Recent approaches using autoencoders have been introduced in [LC2020A] as MOR on manifolds and extended in [LC2020B] to satisfy physical conservation laws. In [KCWZ2020B], a shallow masked autoencoder combined with a hyper-reduction approach has been employed to construct a reduced manifold. In [FDM2021], the authors not only learn a reduced manifold via an autoencoder, but also utilize a feed-forward network to approximate the reduced model.

To the best of our knowledge, the aforementioned approaches consider either symplectic model reduction or MOR on manifolds, but not their combination. Our goal is to bridge the two approaches by providing a method for symplectic model reduction on manifolds. In particular, this article provides the following new contributions:

1. Symplectic manifold Galerkin (SMG) projection technique, which projects Hamiltonian systems onto nonlinear symplectic trial manifolds (Subsection 2.3). Using this technique, the resulting SMG-ROM is again a Hamiltonian system.

2. Analysis, which includes:

• Preservation of energy and stability: we detail sufficient conditions for the reduced model to preserve energy and/or stability in the sense of Lyapunov (Subsections 2.4 and 2.5).

• Error bound: we develop a rigorous a-posteriori error bound based on a time-discrete formulation (Subsection 2.6.2).

3. Weakly symplectic deep convolutional autoencoder (weakly symplectic DCA): we present a new weakly symplectic DCA (Section 3) to approximate a nonlinear symplectic trial manifold. (Note that we are not restricted to autoencoders here, but could use any technique which provides a symplectic map from the low-dimensional to the high-dimensional space, see Remark 2.)

4. Numerical experiments: We numerically investigate the linear wave equation for the challenging case of a thin moving pulse and demonstrate the ability of the SMG-ROM to outperform (a) (non-)structure-preserving linear-subspace ROMs and (b) non-structure-preserving MOR on manifolds (Subsection 4.1).

This paper is structured as follows: in Section 2, we define the high-dimensional Hamiltonian system and subsequently construct the reduced Hamiltonian system on a nonlinear symplectic trial manifold. Additionally, we show that the reduced model can inherit properties like energy- and stability-preservation, and we develop a rigorous a-posteriori error bound. In Section 3, we detail the weakly symplectic DCA. Numerical experiments on a Hamiltonian system with known slowly decaying Kolmogorov n-widths are performed in Section 4. Finally, we conclude in Section 5.

## 2 Symplectic Model Reduction on Manifolds

In this section, we introduce the model reduction of Hamiltonian systems on nonlinear symplectic trial manifolds. To this end, we start with a review of the necessary details on symplectic geometry and subsequently state our full-order model, a parametric Hamiltonian system. Then, we detail our reduced model on a nonlinear symplectic trial manifold and show that stability and energy can be preserved with our symplectic manifold Galerkin (SMG) projection technique. We conclude this section by developing a rigorous a-posteriori error bound.

### 2.1 Symplectic Geometry

Let ℳ be a smooth manifold and let ω be a symplectic form on ℳ, i.e., a closed, non-degenerate differential 2-form; see, e.g., [marsden1995introduction, lee2013] for an introduction to smooth manifolds. It follows immediately that ℳ needs to be of even dimension [lee2013, Proposition 22.7]. The combination (ℳ, ω) of a smooth manifold and a symplectic form is called a symplectic manifold. For the case of ℳ = V being a vector space over R of even dimension 2N, N ∈ ℕ, we introduce the corresponding definition of a symplectic form.

###### Definition 1 (Symplectic Form over R, Symplectic Vector Space).

Let V be an R-vector space of dimension 2N, N ∈ ℕ. Let ω: V × V → R be a skew-symmetric and non-degenerate bilinear form, i.e., for all v1, v2 ∈ V it holds

 ω(v1,v2) = −ω(v2,v1)  and  ( ω(v1,v3) = 0 ∀ v3 ∈ V ⇒ v1 = 0 ).

The bilinear form ω is called a symplectic form on V and the pair (V, ω) is called a symplectic vector space.

For a symplectic form ω on V, dim(V) = 2N, it is known that there exists a basis such that the symplectic form can be written in canonical coordinates

 ω(v1,v2) = ω2N(v1,v2) := v1^T J2N v2,  ∀ v1, v2 ∈ V, (1)

where v1, v2 ∈ R2N denote the respective coordinate vectors, see, e.g., [da2008lectures]. The matrix

 J2N := [ 0N  IN ; −IN  0N ] ∈ R2N×2N

is called the canonical Poisson matrix, where IN ∈ RN×N is the identity matrix and 0N ∈ RN×N is the matrix of all zeros. One can easily see that J2N^T = −J2N and J2N^T J2N = I2N. With this notation at hand, we define a symplectic map as follows.
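The canonical Poisson matrix satisfies J2N^T = −J2N and J2N^T J2N = I2N; the following minimal numpy sketch (our own illustration, not part of the paper's implementation) checks both identities for a small example:

```python
import numpy as np

def poisson_matrix(N):
    """Canonical Poisson matrix J_2N = [[0_N, I_N], [-I_N, 0_N]]."""
    I, Z = np.eye(N), np.zeros((N, N))
    return np.block([[Z, I], [-I, Z]])

J = poisson_matrix(3)
assert np.allclose(J.T, -J)             # skew-symmetry: J^T = -J
assert np.allclose(J.T @ J, np.eye(6))  # J^T is the inverse: J^T J = I_2N
```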

###### Definition 2 (Symplectic Map, Symplectic Matrix [Hlw2006, Section VI.2]).

Let U ⊂ R2n be an open set. A differentiable map d: U → R2N is called symplectic if the Jacobian matrix Dx d(x) ∈ R2N×2n (the subscript x denotes the derivative w.r.t. the variable x and the function argument indicates the position at which the Jacobian is evaluated) is a symplectic matrix for every x ∈ U, i.e.,

 (Dx d(x))^T J2N Dx d(x) = J2n. (2)

Note that due to the above definition, if the Jacobian is a symplectic matrix, it has full column rank, i.e., rank(Dx d(x)) = 2n for 2n ≤ 2N, since J2N and J2n are nonsingular. With the above definition, we can now introduce the notion of the symplectic inverse.

###### Definition 3 (Symplectic Inverse).

If Dx d(x) ∈ R2N×2n is a symplectic matrix, we denote by (Dx d(x))+ ∈ R2n×2N the symplectic inverse (not to be confused with the Moore–Penrose pseudo-inverse, which is often referred to with a similar notation)

 (Dx d(x))+ := J2n^T (Dx d(x))^T J2N.

It follows that

 (Dx d(x))+ Dx d(x) = J2n^T (Dx d(x))^T J2N Dx d(x) = J2n^T J2n = I2n, (3)

as well as

 (Dx d(x))+ J2N = J2n (Dx d(x))^T. (4)
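As a sanity check, the defining properties of the symplectic inverse can be verified numerically for a simple linear symplectic map d(x) = A x. The sketch below (the specific embedding and all names are our own illustration) embeds R2 into R4 by pairing one position and one momentum coordinate:

```python
import numpy as np

def poisson(n):
    """Canonical Poisson matrix of size 2n x 2n."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

N, n = 2, 1
A = np.zeros((2 * N, 2 * n))
A[0, 0] = 1.0  # q_1 -> q_1
A[N, 1] = 1.0  # p_1 -> p_1
# A is symplectic in the sense of Eq. 2: A^T J_2N A = J_2n
assert np.allclose(A.T @ poisson(N) @ A, poisson(n))

# symplectic inverse A^+ = J_2n^T A^T J_2N
A_plus = poisson(n).T @ A.T @ poisson(N)
assert np.allclose(A_plus @ A, np.eye(2 * n))              # Eq. 3: A^+ A = I_2n
assert np.allclose(A_plus @ poisson(N), poisson(n) @ A.T)  # identity used in the SMG derivation
```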

### 2.2 Full-Order Model

We introduce our high-dimensional full-order model (FOM), which is a parametric autonomous Hamiltonian system. Let H: R2N × P → R be a smooth function, called the Hamiltonian, which depends on the parameter μ ∈ P. Further, let [0, T] be a time interval with final time T > 0. The FOM in canonical coordinates reads: for a given μ ∈ P, find x(·;μ) ∈ C1([0,T], R2N) such that

 ddtx(t;μ)=J2N∇xH(x(t;μ);μ)=:XH(x(t;μ);μ),x(0;μ)=x0(μ), (5)

where XH is the so-called Hamiltonian vector field and C1([0,T], R2N) denotes the space of continuously differentiable functions in time with values in R2N. The flow of XH is a mapping ϕt evolving the initial state x0(μ) to the corresponding solution of Eq. 5, i.e.,

 ϕt(x0(μ);μ) := x(t;μ;x0(μ)),

where x(t;μ;x0(μ)) represents the solution at time t with initial data x0(μ). An important property of the flow is that it preserves the Hamiltonian:

 d/dt H(ϕt(x;μ);μ) = (∇xH(ϕt(x;μ);μ))^T d/dt ϕt(x;μ) = (∇xH(ϕt(x;μ);μ))^T J2N ∇xH(ϕt(x;μ);μ) = 0,

where the last equality follows from the skew-symmetry of J2N.

In order that Eq. 5 is well-posed, we assume that for any μ ∈ P the operator XH(·;μ) is Lipschitz continuous w.r.t. x.

### 2.3 Symplectic Manifold Galerkin ROM

We now detail how to derive a symplectic manifold Galerkin ROM (SMG-ROM) on a nonlinear symplectic trial manifold and at the same time ensure that this reduced system preserves the symplectic structure of the Hamiltonian system. This nonlinear symplectic trial manifold aims at approximating the detailed solution manifold

 M := { x(t;μ) ∈ R2N | t ∈ [0,T], μ ∈ P }.

In particular, we are looking for approximate solutions of the form

 ˜x(t;μ):=xref(μ)+d(xr(t;μ))≈x(t;μ), (6)

with the (possibly parametric) reference state xref(μ) ∈ R2N, the reduced coordinates (also known as latent variables) xr(t;μ) ∈ R2n with 2n ≪ 2N, the reconstructed solution ˜x(t;μ) and the reconstruction function (or decoder) d: R2n → R2N. The reconstruction function is assumed to be a (in general nonlinear) symplectic map in the sense of Eq. 2. With this choice, the reconstructed solution evolves on the nonlinear trial manifold

 ˜M(μ) := xref(μ) + { d(ar) ∈ R2N | ar ∈ R2n }. (7)

The evolution of the reconstructed solution in time is

 ddt˜x(t;μ)=(Dxrd(xr(t;μ)))ddtxr(t;μ), (8)

due to the chain rule. To construct the ROM, we define the time-continuous residual of the FOM Eq. 5 w.r.t. the approximation ˜x(t;μ) by

 r(t;μ) :=ddt˜x(t;μ)−J2N∇xH(˜x(t;μ);μ) =(Dxrd(xr(t;μ)))ddtxr(t;μ)−J2N∇xH(xref(μ)+d(xr(t;μ));μ).

For the ROM, we require that the symplectic projection of the residual vanishes,

 (Dxr d(xr(t;μ)))+ r(t;μ) = 02n.

Note that the symplectic inverse (Dxr d(xr(t;μ)))+ exists due to the symplecticity of d. Reformulating this equation yields

 ddtxr(t;μ) =(Dxrd(xr(t;μ)))+J2N∇xH(xref(μ)+d(xr(t;μ));μ) =J2n(Dxrd(xr(t;μ))){T}∇xH(xref(μ)+d(xr(t;μ));μ)=∇xrHr(xr(t;μ);μ),

which follows due to the properties of the Poisson matrix in Eq. 4, Eq. 3 and the chain rule. We denote by Hr(xr;μ) := H(xref(μ) + d(xr);μ), Hr: R2n × P → R, the reduced Hamiltonian. Thus, the SMG-ROM on the nonlinear symplectic trial manifold reads: for a given μ ∈ P, find xr(·;μ) ∈ C1([0,T], R2n) such that

 ddtxr(t;μ)=J2n∇xrHr(xr(t;μ);μ) xr(0;μ)=xr,0(μ). (9)

By XHr := J2n ∇xr Hr, we denote the reduced Hamiltonian vector field. Moreover, the choice of d as a symplectic map guarantees that the tuple consisting of the nonlinear trial manifold ˜M(μ) from Eq. 7 and the canonical symplectic form from Eq. 1 is a symplectic manifold, see [HLW2006, Section VII.2.3]. This is why we refer to ˜M(μ) as a nonlinear symplectic trial manifold.

In order to solve Eq. 9 numerically, a symplectic numerical integration scheme is applied and the resulting system of nonlinear equations is solved at each discrete time step with a quasi-Newton scheme which omits second-order derivatives of d, in analogy to [LC2020A, Section 3.4].
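To make the SMG projection concrete, the following sketch (a toy example with names of our own choosing, not the paper's code) evaluates the reduced vector field J2n (Dxr d)^T ∇xH for a linear symplectic toy decoder and the quadratic Hamiltonian H(x) = ½ x^T x:

```python
import numpy as np

def poisson(n):
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

grad_H = lambda x: x  # gradient of H(x) = 0.5 * x^T x

# linear symplectic toy decoder d(x_r) = A x_r with constant Jacobian A
N, n = 2, 1
A = np.zeros((2 * N, 2 * n)); A[0, 0] = 1.0; A[N, 1] = 1.0
decoder = lambda xr: A @ xr
jacobian = lambda xr: A

def reduced_vector_field(xr, x_ref):
    """SMG reduced RHS: J_2n (D_{x_r} d)^T grad H(x_ref + d(x_r))."""
    Dd = jacobian(xr)
    return poisson(n) @ Dd.T @ grad_H(x_ref + decoder(xr))

f = reduced_vector_field(np.array([1.0, 0.0]), np.zeros(2 * N))
# reduced dynamics of a harmonic oscillator: dq/dt = p = 0, dp/dt = -q = -1
assert np.allclose(f, [0.0, -1.0])
```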

In the following we show that energy and stability can be preserved during the time evolution, since the above constructed reduced system is again a Hamiltonian system.

### 2.4 Energy Preservation

###### Proposition 1.

Let x(t;μ) be the solution of the FOM Eq. 5 at time t and let ˜x(t;μ) be the reconstructed solution Eq. 6 obtained from solving the SMG-ROM. Then, for any μ ∈ P, the error in the Hamiltonian

 ΔH(t;μ):=|H(x(t;μ);μ)−H(˜x(t;μ);μ)|≡|H(x0(μ);μ)−H(˜x(0,μ);μ)| (10)

is constant for all t ∈ [0, T].

###### Proof.

The proof is based on the property that the Hamiltonian is preserved over time for both the FOM and the ROM. Omitting the μ-dependency for readability, this reads

 H(x(t)) = H(x0)  and  H(˜x(t)) = Hr(xr(t)) = Hr(xr(0)) = H(˜x(0)). ∎

###### Remark 1.

The reference state is set to xref(μ) := x0(μ) − d(xr,0(μ)) for an arbitrary initial value xr,0(μ) of the reduced system. As observed in [LC2020A, Remark 3.1], this leads to an exact reproduction of the initial value, ˜x(0;μ) = x0(μ). With the SMG projection, we can further conclude with Proposition 1 that this implies an exact reproduction of the Hamiltonian for all times.

### 2.5 Stability Preservation

In this section, we examine the question of stability of the full-order and reduced model in the sense of Lyapunov. For ease of notation, we omit the time and μ-dependency for the remainder of this section whenever possible.

Consider the differential equation

 ddtx=f(x), (11)

where f is a smooth function from an open set U ⊂ R2N into R2N. Let f have an equilibrium point xe ∈ U, such that f(xe) = 0, and let ϕt be the flow of Eq. 11. Then, Lyapunov stability of the point xe is defined as:

###### Definition 4 (Lyapunov Stability [Mo2017, Section 12]).

An equilibrium point xe of Eq. 11 is Lyapunov stable if for every ε > 0, there exists δ > 0 such that ∥ϕt(x) − xe∥ < ε for all t ≥ 0 whenever ∥x − xe∥ < δ.

In order to state sufficient conditions for an equilibrium point to be Lyapunov stable, we recall Lyapunov’s stability theorem.

###### Theorem 1 (Lyapunov’s Stability Theorem [Mo2017, Theorem 12.1.1]).

Let xe be an equilibrium point of Eq. 11. If there exists a function V ∈ C2(U, R) with V(xe) = 0, such that ∇xV(xe) = 0, ∇x2V(xe) is positive definite (where ∇x2V denotes the Hessian of V) and

 ∇xV(x)^T f(x) ≤ 0,  ∀ x ∈ U ⊂ R2N, (12)

then xe is a Lyapunov stable point.

The function V in the aforementioned theorem is also called a Lyapunov function. For Hamiltonian systems, a suitable candidate for V is the Hamiltonian H itself. In the case of the autonomous Hamiltonians considered in this paper, a point xe is an equilibrium point of Eq. 5 if and only if xe is a critical point of H, i.e., a point s.t. ∇xH(xe) = 0 [AM1987, Proposition 3.4.16]. Following the well-known Dirichlet stability theorem for Hamiltonian systems [MO2017, Corollary 12.1.1], it suffices that the equilibrium point is a strict local maximum or minimum of H in order to be a Lyapunov stable point. Equation 12 is automatically fulfilled for autonomous Hamiltonian systems, which can easily be seen by

 ∇xH(x)^T f(x) = ∇xH(x)^T J2N ∇xH(x) = 0,  x ∈ U,

due to the skew-symmetry of J2N. Similar to [AH2017, Theorem 18] for the special case of a linear d, we are now able to show that if H (or Hr) is a Lyapunov function in a neighborhood of the equilibrium point, then the equilibrium points of the full-order and reduced model are Lyapunov stable. In the following theorem we extend this result to more general nonlinear d.

###### Theorem 2 (Lyapunov Stability).

Assume that xe is an equilibrium point of Eq. 5 and there exists xr,e ∈ R2n such that xe = xref + d(xr,e). If H (or Hr) is a Lyapunov function as defined in Theorem 1, then xe and xr,e are Lyapunov stable equilibrium points for the full-order model Eq. 5 and the reduced model Eq. 9, respectively.

###### Proof.

As xe is an equilibrium point and H is a Lyapunov function, it immediately follows from Theorem 1 that xe is Lyapunov stable. By using the chain rule, we arrive at ∇xr Hr(xr) = (Dxr d(xr))^T ∇xH(xref + d(xr)). Evaluating the latter at xr,e yields

 ∇xrHr(xr,e)=(Dxrd(xr,e)){T}∇xH(xref+d(xr,e))=(Dxrd(xr,e)){T}∇xH(xe)=0,

where the last equality follows as xe is a critical point of H, i.e., ∇xH(xe) = 0. Thus, xr,e is an equilibrium point for Eq. 9. In order to show that xr,e is a strict local minimum of Hr, we consider the Hessian matrix. Again using the chain rule, we get

 ∇2xrHr(xr,e)=(Dxrd(xr,e)){T}∇2xH(xref+d(xr,e))(Dxrd(xr,e)).

Therefore, for any ξ ∈ R2n ∖ {02n}, we obtain that ∇2xr Hr(xr,e) is positive definite,

 ξ{T}∇2xrHr(xr,e)ξ=(Dxrd(xr,e)ξ){T}∇2xH(xref+d(xr,e))(Dxrd(xr,e)ξ)>0,

due to the positive definiteness of ∇2xH(xe) and the fact that Dxr d(xr,e)ξ ≠ 02N, since Dxr d(xr,e) has full column rank. With Dirichlet’s stability theorem, we conclude that xr,e is a Lyapunov stable point for Eq. 9. ∎
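The final step of the proof, that congruence with a full-column-rank Jacobian preserves positive definiteness, can also be illustrated numerically; the sketch below uses randomly generated matrices of our own choosing as stand-ins for the Hessian and the decoder Jacobian:

```python
import numpy as np

rng = np.random.default_rng(0)
dim_full, dim_red = 8, 4  # stand-ins for 2N and 2n

# symmetric positive definite "Hessian of H at the equilibrium"
B = rng.standard_normal((dim_full, dim_full))
hess_H = B @ B.T + dim_full * np.eye(dim_full)

# full-column-rank "Jacobian of the decoder"
Dd = rng.standard_normal((dim_full, dim_red))
assert np.linalg.matrix_rank(Dd) == dim_red

# reduced Hessian via the chain rule, as in the proof above
hess_Hr = Dd.T @ hess_H @ Dd
assert np.all(np.linalg.eigvalsh(hess_Hr) > 0)  # positive definite
```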

### 2.6 Error Estimation

In this section, we deduce a rigorous a-posteriori error bound. This error bound is coupled to the chosen time-discretization scheme, which needs to be a so-called symplectic integrator [HLW2006, BM2017] in order to preserve the underlying symplectic structure of the Hamiltonian system.

#### 2.6.1 Time Discretization

We use a Runge-Kutta (RK) scheme for the time discretization of our Hamiltonian system. Since not all RK methods meet the requirements of being a symplectic integrator, we detail the necessary conditions after the definition.

We start by applying an equidistant time discretization to the interval [0, T] with step size Δt > 0. To this end, we set tk := kΔt for k = 0, …, K, and seek approximations xk ≈ x(tk;μ). Note that the superscript, in contrast to the standard notation, denotes the enumeration of the time-dependent variables instead of a power of those quantities.

###### Definition 5 (Runge Kutta method).

An s-stage RK method (formulated for the autonomous Hamiltonian system Eq. 5) with s ∈ ℕ and real coefficients aij, bi for 1 ≤ i, j ≤ s, is for k = 1, …, K given by

 wki =XH(xk−1+Δts∑j=1aijwkj),i=1,…,s, xk =xk−1+Δts∑i=1biwki.

Runge-Kutta methods are symplectic if the coefficients satisfy

 biaij+bjaji=bibj,for all i,j=1,…,s,

see [HLW2006, Theorem VI.4.3, p. 178]. This holds, e.g., for the implicit midpoint rule,

 wk1 =XH(xk−1+Δt2wk1), (13a) xk =xk−1+Δtwk1, (13b)

which therefore is a symplectic integrator. To compute a solution for the time-discrete ODE, we need to solve the following algebraic equations at every discrete time-step

 rki(wk1,…,wks)=0,i=1,…,s, (14)

where the i-th RK residual is defined by

 rki(w1,…,ws):=wi−XH(xk−1+Δts∑j=1aijwj),i=1,…,s,

with the updated state

 xk=xk−1+Δts∑i=1biwki. (15)
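For a linear Hamiltonian vector field, the implicit midpoint rule Eq. 13 reduces to one linear solve per step, and, being symplectic, it conserves a quadratic Hamiltonian exactly (up to round-off). A minimal sketch for the harmonic oscillator H(q,p) = ½(q² + p²), with all names our own:

```python
import numpy as np

J2 = np.array([[0.0, 1.0], [-1.0, 0.0]])  # canonical Poisson matrix, N = 1
dt, K = 0.1, 1000

# X_H(x) = J_2 x, so the stage equation (13a) resolves to a linear step:
# x^k = (I - dt/2 J_2)^{-1} (I + dt/2 J_2) x^{k-1}
I = np.eye(2)
step = np.linalg.solve(I - 0.5 * dt * J2, I + 0.5 * dt * J2)

H = lambda x: 0.5 * x @ x
x = np.array([1.0, 0.0])
H0 = H(x)
for _ in range(K):
    x = step @ x
assert abs(H(x) - H0) < 1e-12  # quadratic Hamiltonian conserved
```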

In order to compute a time-discrete approximation of the reduced problem Eq. 9, we solve the algebraic equations

 rkr,i(wkr,1,…,wkr,s)=0,i=1,…,s, (16)

where the i-th reduced RK residual and the updated reduced state are given by

 rkr,i(wr,1,…,wr,s): =wr,i−XHr(xk−1r+Δts∑j=1aijwr,j), i=1,…,s, (17a) xkr =xk−1r+Δts∑i=1biwkr,i. (17b)

#### 2.6.2 Error Bound

In this section, we provide a rigorous a-posteriori error bound. We assume that an RK method, as defined in the section above, has been used to construct the time-discrete scheme. For linear d, bounds have been derived in [CBA2017, Theorem 6.19], which we extend in the following to error bounds for more general nonlinear d.

###### Theorem 3 (Error Bound).

If XH is Lipschitz continuous, i.e., there exists a constant κ > 0 such that

 ∥XH(x)−XH(y)∥2≤κ∥x−y∥2,∀x,y∈R2N,

and Δt is chosen sufficiently small such that

1. the matrix D ∈ Rs×s with entries [D]mi := δmi − κΔt|ami| is invertible and

2. all entries of the inverse are non-negative, i.e., [D−1]mi ≥ 0 for all m, i = 1, …, s (for the implicit midpoint rule, D = 1 − κΔt/2 ∈ R, so this condition holds if Δt < 2/κ),

then the following error bound holds for k = 1, …, K:

 ∥xk −˜xk∥2≤(c1)k∥x0−˜x0∥2+k−1∑ℓ=0(c1)ℓ⋅ (Δts∑m=1|bm|s∑i=1[D−1]mi∥˜rk−ℓi(˜wk−ℓ1,…,˜wk−ℓs)∥2+∥˜rk−ℓx(˜xk−ℓ)∥2),

with

 c1:=(1+κΔts∑m=1|bm|s∑i=1[D−1]mi).

Here, wki respectively wkr,i is a solution of Eq. 14 respectively Eq. 16 at time instance tk with update in Eq. 15 respectively Eq. 17b. By ˜xk := xref + d(xkr) we denote the reconstructed state and by ˜wki the reconstructed stage velocities. The residuals w.r.t. the reconstructed state and velocities are defined by

 ˜rki(w1,…,ws): =wi−XH(˜xk−1+Δts∑j=1aijwj),i=1,…,s, (18a) ˜rkx(x): =x−˜xk−1−Δts∑i=1bi˜wki. (18b)

Note that if the mapping d is linear, we recover the standard Runge-Kutta update step Eq. 15 for the reconstructed solution.

###### Proof.

Subtracting ˜rki(˜wk1,…,˜wks) from rki(wk1,…,wks) yields

 rki(wk1,…,wks) −˜rki(˜wk1,…,˜wks)=wki−˜wki −(XH(xk−1+Δts∑j=1aijwkj)−XH(˜xk−1+Δts∑j=1aij˜wkj)).

Noting that rki(wk1,…,wks) = 0 holds by Eq. 14, we arrive at

 wki−˜wki =−˜rki(˜wk1,…,˜wks) +(XH(xk−1+Δts∑j=1aijwkj)−XH(˜xk−1+Δts∑j=1aij˜wkj)).

Taking the norm on both sides and using the Lipschitz continuity yields

 ∥wki−˜wki∥2≤∥˜rki(˜wk1,…,˜wks)∥2+κ∥xk−1−˜xk−1∥2+κΔts∑j=1|aij|∥wkj−˜wkj∥2.

Rearranging terms gives

 ∥wki−˜wki∥2−κΔts∑j=1|aij|∥wkj−˜wkj∥2≤∥˜rki(˜wk1,…,˜wks)∥2+κ∥xk−1−˜xk−1∥2.

As Δt is small enough s.t. assumptions 1 and 2 hold, we arrive, for m = 1, …, s, at

 ∥wkm−˜wkm∥2≤s∑i=1[D−1]mi(∥˜rki(˜wk1,…,˜wks)∥2+κ∥xk−1−˜xk−1∥2). (19)

Evaluating the residual Eq. 18b at ˜xk, adding a zero by reformulating Eq. 15 and applying the triangle inequality yields

 ∥xk−˜xk∥2≤∥xk−1−˜xk−1∥2+Δts∑m=1|bm|∥wkm−˜wkm∥2+∥˜rkx(˜xk)∥2.

By inserting Eq. 19 it follows

 ∥xk−˜xk∥2 ≤(1+κΔts∑m=1|bm|s∑i=1[D−1]mi)∥xk−1−˜xk−1∥2 +Δts∑m=1|bm|s∑i=1[D−1]mi∥˜rki(˜wk1,…,˜wks)∥2+∥˜rkx(˜xk)∥2.

Finally, an induction argument concludes the proof. ∎

Note that the Lipschitz condition on XH does not pose a severely limiting additional requirement, as it is a typical condition for the well-posedness of Eq. 5.

## 3 Nonlinear Symplectic Trial Manifold based on Deep Convolutional Autoencoders

It is still open how to construct a symplectic reconstruction function d and how to choose the reduced initial value xr,0(μ) in Eq. 9. In the scope of this work, we investigate autoencoders for this purpose. In this section, we first introduce existing autoencoders and deep convolutional autoencoders (DCAs) and subsequently introduce our novel weakly symplectic DCA. Before describing the general idea of autoencoders, we briefly highlight that autoencoders are not the sole choice to define a reconstruction function.

###### Remark 2 (Alternatives to Autoencoders).

Instead of relying on an autoencoder to provide the reconstruction function d, one could consider other approaches. One possibility is kernel principal component analysis (kPCA) [Schoelkopf98], whose basic idea is to embed a nonlinear manifold into a feature space in which it can be represented as a linear manifold; a classical PCA is subsequently applied in that space to identify the low-dimensional representation. Additional methods to construct such a mapping are, e.g., the Gaussian process latent variable model (GPLVM) [lawrence2003gaussian], self-organizing maps [kohonen1982self], diffeomorphic dimensionality reduction [NIPS2008_647bba34] and deep kernel networks [wenzel_sdkn_turbulence].

### 3.1 Autoencoder

Autoencoders [Hinton96; Goodfellow-et-al-2016, Chapter 14] are one approach in the class of (nonlinear) dimensionality-reduction methods that learn a low-dimensional representation of a given finite dataset X ⊂ R2N. An autoencoder is characterized by a pair of parametric mappings, the encoder encθ: R2N → R2n and the decoder decθ: R2n → R2N. Typically, both maps are artificial neural networks (ANNs), where θ describes the network parameters such as the weights and biases. The encoder–decoder pair is optimized to minimize the loss

 L(θ) := 1/|X| ∑x∈X ∥x − decθ(encθ(x))∥22, (20)

i.e., the data x ∈ X should be invariant under compression with the encoder and subsequent decompression with the decoder. With this construction, autoencoders do not require labeled data, which is known as unsupervised learning. The constant 1/|X| normalizes the squared loss, which is then known as the mean squared error (MSE). The training process identifies a (most often locally) optimal parameter θ∗.
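The reconstruction loss can be sketched with a toy linear encoder/decoder pair standing in for the ANNs (all sizes, names and the shared-weight choice are hypothetical illustrations, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 10))  # |X| = 64 snapshots in R^{2N}, 2N = 10

theta = 0.1 * rng.standard_normal((10, 4))  # shared weights, 2n = 4
enc = lambda x: x @ theta                   # R^{2N} -> R^{2n}
dec = lambda xr: xr @ theta.T               # R^{2n} -> R^{2N}

def mse_loss(X):
    """Mean squared reconstruction error over the dataset."""
    X_rec = dec(enc(X))
    return np.mean(np.sum((X - X_rec) ** 2, axis=1))

loss = mse_loss(X)
assert loss >= 0.0  # minimized during training to find theta*
```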

### 3.2 Deep Convolutional Autoencoder

DCAs [LC2020A] choose the training data as X(Ptrain) with

 X(Ptrain) := { xk(μ) − x0(μ) | 0 ≤ k ≤ K, μ ∈ Ptrain } (21)

where xk(μ) is a numerical approximation of the solution at the k-th time step tk, and Ptrain ⊂ P is a finite training parameter set. Note that all initial values in Eq. 21 are shifted to 02N by construction. Thus, the training of the DCA includes the condition decθ∗(encθ∗(02N)) ≈ 02N. Based on the optimized network parameters θ∗, the reconstruction function is then defined by d := decθ∗ and the reduced initial value is set to xr,0(μ) := encθ∗(02N), cf. [LC2020A]. This specific choice of the reduced initial value, together with Remark 1, leads to a reference state xref(μ) ≈ x0(μ). Thus, the reconstruction function has to describe deviations from the initial value in Eq. 6, which is consistent with the choice of training data in Eq. 21. Note that the training of an autoencoder for model reduction purposes uses the same data as the snapshot-based basis generation techniques in classical model reduction (with linear d), see [LC2020A].

For high-dimensional problems, DCAs purely based on fully-connected ANNs (FNNs) might be too expensive to train due to the large dimension 2N. Moreover, FNNs in general do not make use of spatial correlations in the data. For such problems, the use of convolutional ANNs (CNNs) [lecun1998gradient; Goodfellow-et-al-2016, Chapter 9] is advantageous. These networks are based on the so-called cross-correlation, which applies a filter matrix similarly to a discrete convolution. The dimension of the filter matrix is independent of the dimension of the input, which makes it suitable for high-dimensional inputs. Moreover, this operation is invariant with respect to translations due to its convolutional structure.
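The cross-correlation underlying a convolutional layer can be sketched as follows (a plain-numpy illustration of the operation, not the paper's network code); note that, unlike a discrete convolution, the filter is not flipped:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation of an input with a filter matrix."""
    m, n = kernel.shape
    H, W = image.shape
    out = np.empty((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2)) / 4.0  # 2x2 averaging filter
out = cross_correlate2d(image, kernel)
assert out.shape == (3, 3)  # filter size independent of input size
assert out[0, 0] == 2.5     # mean of the top-left 2x2 patch
```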

### 3.3 Weakly Symplectic Deep Convolutional Autoencoder

In general, the decoder of an autoencoder learned with the DCA method will not be symplectic as required in Eq. 2. Thus, we extend the DCAs to weakly symplectic DCAs by additionally adding this constraint to our loss function