# Learning Poisson systems and trajectories of autonomous systems via Poisson neural networks

We propose the Poisson neural networks (PNNs) to learn Poisson systems and trajectories of autonomous systems from data. Based on the Darboux-Lie theorem, the phase flow of a Poisson system can be written as the composition of (1) a coordinate transformation, (2) an extended symplectic map and (3) the inverse of the transformation. In this work, we extend this result to the unknotted trajectories of autonomous systems. We employ structured neural networks with physical priors to approximate the three aforementioned maps. We demonstrate through several simulations that PNNs are capable of handling very accurately several challenging tasks, including the motion of a particle in the electromagnetic potential, the nonlinear Schrödinger equation, and pixel observations of the two-body problem.

## Authors

• 7 publications
• 36 publications
• 21 publications
• 51 publications
06/06/2020

### Structure-preserving numerical methods for stochastic Poisson systems

We propose a class of numerical integration methods for stochastic Poiss...
11/25/2021

### Casimir preserving stochastic Lie-Poisson integrators

Casimir preserving integrators for stochastic Lie-Poisson equations with...
11/14/2021

### Splitting integrators for stochastic Lie–Poisson systems

We study stochastic Poisson integrators for a class of stochastic Poisso...
09/25/2019

### OCTNet: Trajectory Generation in New Environments from Past Experiences

Being able to safely operate for extended periods of time in dynamic env...
12/12/2019

### Towards Expressive Priors for Bayesian Neural Networks: Poisson Process Radial Basis Function Networks

While Bayesian neural networks have many appealing characteristics, curr...
11/17/2017

### Solving Poisson's Equation on the Microsoft HoloLens

We present a mixed reality application (HoloFEM) for the Microsoft HoloL...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

The connection between dynamical systems and neural network models has been widely studied in the literature, see, for example, [4, 11, 16, 25, 24]

. In general, neural networks can be considered as discrete dynamical systems with the basic dynamics at each step being a linear transformation followed by a component-wise nonlinear (activation) function. In

[5] the neural ODE is introduced as a continuous-depth model instead of specifying a discrete sequence of hidden layers. Even before the introduction of neural ODEs, a series of models with similar architectures had already been proposed to learn the hidden dynamics of a dynamical system in [30, 14, 27]. Theoretical results of the discovery of dynamics are established and enriched in [38], where the inverse modified differential equations are introduced to understand the true dynamical system that is learned when the time derivatives are approximated by numerical schemes. The methods above do not assume the specific form of the equation a priori, however, as the physical systems usually possess intrinsic prior properties, other approaches also take into account the prior information of the systems.

For many physical systems from classical mechanics, the governing equation can be expressed in terms of Hamilton’s equation. We denote the -by-identity matrix by , and let

 J:=(0Id−Id0),

which is an orthogonal, skew-symmetric real matrix, so that

. The canonical Hamiltonian system can be written as

 ˙y=J−1∇H(y), (1)

where , and is the Hamiltonian, typically representing the energy of the system. It is well known that the phase flow of a Hamiltonian system is symplectic. Based on that observation, several numerical schemes which preserve symplecticity have been proposed to solve the forward problem in [12, 17, 26]. In recent works [3, 15, 29, 32, 6, 34], the primary focus has been to solve the inverse problem, i.e., identifying Hamiltonian systems from data, using structured neural networks. For example, HNNs [15] use a neural network to approximate the Hamiltonian in (1), then learn

by reformulating the loss function. Based on HNNs, other models were proposed to tackle problems in generative modeling

[29, 34] and continuous control [37]. Another line of approach is to learn the phase flow of the system directly, while encoding the physical prior as symplecticity in the flow. Recently, we introduced symplectic networks (SympNets) with theoretical guarantees that they can approximate arbitrary symplectic maps [19]. Other networks of this type are presented in [10, 36].

In practice, requiring a dynamical system to be Hamiltonian could be too restrictive, as the system has to be described in canonical coordinates. In [7] Lagrangian Neural Networks (LNNs) were introduced, which allow the system to be expressed in Cartesian coordinates. In [13]

the models of HNNs and LNNs were generalized to Constrained Hamiltonian Neural Networks (CHNNs) and Constrained Lagrangian Neural Networks (CLNNs), enabling them to learn constrained mechanical systems written in Cartesian coordinates. In other developments, autoencoder-based HNNs (AE-HNNs)

[15] and Hamiltonian Generative Networks (HGNs) [34] were proposed to learn and predict the images of mechanical systems, which can be seen as Hamiltonian systems on manifolds embedded in high-dimensional spaces. Theoretically, Hamiltonian systems on manifolds written in noncanonical coordinates are equivalent to an important class of dynamical systems, namely, the Poisson systems. To wit, the Poisson systems take the form of

 ˙y=B(y)∇H(y),

where , is not necessarily an even number; is the Hamiltonian of the Poisson system, and the matrix-valued function plays the role of in (1), which induces a general Poisson bracket as defined in Section II. The Darboux-Lie theorem states that a Poisson system can be turned into a Hamiltonian system by a local coordinate transformation. As a consequence, structure-preserving numerical schemes for Poisson systems are normally developed by finding the coordinate transformation manually, then applying symplectic integrators on the transformed systems, see [33, 17].

Inspired by the Darboux-Lie theorem, we propose a novel neural network architecture, the Poisson neural network (PNN), to learn the phase flow of an arbitrary Poisson system, In other words, PNNs can learn any unknown diffeomorphism of Hamiltonian systems. The coordinate transformation and the phase flow of the transformed Hamiltonian system are parameterized by structured neural networks with physical priors. Specifically, in the general setting, we use invertible neural networks (INNs) [8, 9] to represent the coordinate transformation. If all the data reside on a submanifold with dimension , autoencoders (AEs) can be applied to approximate the coordinate transformation from the coordinates in to its local coordinates as an alternative choice. This strategy is similar to [31], which learns dynamics in the latent space discovered through an autoencoder. Compared to LNNs, PNNs are able to work on a more general coordinate system. Moreover, INN-based PNNs are able to learn multiple trajectories of a Poisson system on the whole data space simultaneously, while AE-HNNs, HGNs, CHNNs and CLNNs are only designed to work on low-dimensional submanifolds of . Further, our work lays a solid theoretical background for all the aforementioned models, suggesting that they are learning a Poisson system explicitly or implicitly.

Another intriguing property of PNNs is that they are not only capable of learning Poisson systems, but are able to approximate an unknotted trajectory of an arbitrary autonomous system. We present related theorems, which indicate the great expressivity of PNNs. We demonstrate through computational experiments that PNNs can be practically useful in terms of learning high-dimensional autonomous systems, long-time prediction, as well as frame interpolation. PNNs also enjoy all the advantages listed in

[19] as they are implemented based on the SympNets in this work. However, PNNs as a high-level architecture can also employ other modern symplectic neural networks.

The rest of the paper is organized as follows. Section II introduces some necessary notation, terminology and fundamental theorems that will be used. The learning theory for PNNs is presented in Section III. Section IV presents the experimental results for several Poisson systems and autonomous systems. A summary is given in the last section. Supporting materials, including the detailed implementation of PNNs, are included in the appendix.

## Ii Preliminaries

The material required for this work is based on the mathematical background of Hamiltonian system and its non-canonical form, i.e., the Poisson system. We refer the readers to [17] for more details.

### Ii-a Hamiltonian and Poisson systems

First, we formally present the definitions of the Hamiltonian system and the Poisson system. We assume that all the functions or maps involved in this paper are as smooth as needed.

###### Definition 1.

The canonical Hamiltonian system takes the form

 ˙y=J−1∇H(y), (2)

where is the Hamiltonian typically representing the energy of the system defined on the open set .

System (2) can also be written in a general form by introducing the Poisson bracket.

###### Definition 2.

Let be an open set. The Poisson bracket is a binary operation satisfying

1. (anticommutativity)
,

2. (bilinearity)
,

3. (Leibniz’s rule)
,

4. (Jacobi identity)
,

for .

Consider the bracket

 {F,G}=d∑i=1(∂F∂qi∂G∂pi−∂F∂pi∂G∂qi)=∇F(y)TJ−1∇G(y), (3)

where . One can check that (3) is indeed a Poisson bracket. Then system (2) can be written as

 ˙yi={yi,H},i=1,⋯,2d,

where in the bracket denotes the map for by a slight abuse of notation. Now we extend the bracket (3) to a general form as

 ({F,G}B)(y)=n∑i,j=1∂F(y)∂yibij(y)∂G(y)∂yj=∇F(y)TB(y)∇G(y), (4)

where is a smooth matrix-valued function. Note that here we do not require to be an even number. As many crucial properties of Hamiltonian systems rely uniquely on the conditions - in Definition 2, we naturally expect the bracket (4) to be a Poisson bracket.

###### Lemma 1.

The bracket defined in (4) is anti-commutative, bilinear and satisfies Leibniz’s rule as well as the Jacobi identity if and only if

 bij(y)=−bji(y) for all i,j

and for all

 n∑l=1(∂bij(y)∂ylblk(y)+∂bjk(y)∂ylbli(y)+∂bki(y)∂ylblj(y))=0.

Lemma 1 provides verifiable equivalence conditions for (4) to become a Poisson bracket. Then, we can give the definition of Poisson system, which is actually the generalized form of the Hamiltonian system.

###### Definition 3.

If a matrix-valued function satisfies Lemma 1, then (4) defines a Poisson bracket, and the corresponding differential system

 ˙y=B(y)∇H(y)

is a Poisson system. Here is still called a Hamiltonian.

Up to now, the Hamiltonian system and the Poisson system have been unified as

 ˙yi={yi,H}B,i=1,⋯,n, (5)

for satisfying Lemma 1, and the system becomes Hamiltonian when .

### Ii-B Symplectic map and Poisson map

The study of the phase flows of the Hamiltonian and Poisson systems focuses on the symplectic map and the Poisson map.

###### Definition 4.

A transformation (where is an open set in ) is called a symplectic map if its Jacobian matrix satisfies

 (∂Φ∂y)TJ(∂Φ∂y)=J.
###### Definition 5.

A transformation (where is an open set in ) is called a Poisson map with respect to the Poisson system (5) if its Jacobian matrix satisfies

 (∂Φ∂y)B(y)(∂Φ∂y)T=B(Φ(y)).

In fact, the phase flow of the Hamiltonian system is a symplectic map, while the phase flow of the Poisson system is a Poisson map, i.e., and satisfy Definition 4 and 5, respectively. Based on these facts, we naturally expect the numerical methods or learning models for Hamiltonian systems and Poisson systems to preserve the intrinsic properties that and possess. So far the numerical techniques for Hamiltonian systems have been well developed [12, 17, 26], however, research on the Poisson systems is ongoing due to its complexity.

### Ii-C Coordinate changes and the Darboux–Lie theorem

The main idea in studying Poisson systems is to find the connection to Hamiltonian systems, which are easier to deal with. In fact, a Poisson system expressed in arbitrary new coordinates is again a Poisson system, hence we naturally tend to simplify a given Poisson structure as much as possible by coordinate transformation.

###### Theorem 1 (Darboux 1882, Lie 1888).

Suppose that the matrix defines a Poisson bracket and is of constant rank in a neighbourhood of . Then, there exist functions , , and satisfying

 {Pi,Pj}=0 {Pi,Qj}=−δij {Pi,Cl}=0 {Qi,Pj}=δij {Qi,Qj}=0 {Qi,Cl}=0 {Ck,Pj}=0 {Ck,Qj}=0 {Ck,Cl}=0

on a neighbourhood of , where equals to 1 if else 0. The gradients of are linearly independent, so that constitutes a local change of coordinates to canonical form.

###### Corollary 1 (Transformation to canonical form).

Let us denote the transformation of Theorem 1 by . With this change of coordinates, the Poisson system becomes

 ˙z=B0∇K(z)withB0=(J−1000),

where . Writing , this system becomes

 ˙p=−Kq(p,q,c),˙q=Kp(p,q,c),˙c=0.

Corollary 1 reveals the connection between Poisson systems and Hamiltonian systems via coordinate changes. In a forward problem, i.e., solving the Poisson system by numerical integration, transformations are available for many well-known systems to perform structure-preserving calculations [33, 17], but there does not exist a general method to search for the new coordinates of an arbitrary Poisson system, which is still an open research issue. However, the inverse problem, i.e., learning an unknown Poisson system based on data, is an easier task.

## Iii Learning theory for Poisson systems and trajectories of autonomous systems

Assume that there is a dataset () from an unknown autonomous dynamical system, that could be a Poisson system or not, satisfying for time step and phase flow . We aim to discover the dynamics using learning models, so that we can make predictions into future or perform some other computational tasks. To describe things clearly, we first give the definition of the extended symplectic map.

###### Definition 6.

A transformation (where is an open set in ) is called an extended symplectic map with latent dimension if it can be written as

 Φ(x1x2)=(ϕ(x1,x2)x2),x1∈R2d,x2∈Rn−2d,2d≤n,

where is differentiable, and is a symplectic map for each fixed . Note that degenerates to a general symplectic map when .

### Iii-a Poisson neural networks

We propose a high-level network architecture, i.e., the Poisson neural networks (PNNs), to learn Poisson systems or autonomous flows based on the Darboux–Lie theorem. Theorem 1 and Corollary 1 indicate that any Poisson system in -dimensional space can be transformed to a “piecewise” Hamiltonian system, where is the latent dimension determined by the rank of . The architecture is composed of three parts: (1) a transformation, (2) an extended symplectic map, (3) the inverse of the transformation, denoted by , and , respectively. For the construction of extended symplectic neural networks we refer the readers to Appendix C-A, which is an important part of this work. The transformations and can be implemented using two alternative approaches.
Primary architecture. We model as an invertible neural network, as in [8, 9], to automatically obtain its inverse . Then, we learn the data by optimizing the mean-squared-error loss

 L(T)=1n⋅NN∑i=1∥θ−1∘Φ∘θ(xi)−yi∥2,

where is an extended symplectic neural network with latent dimension .
Alternative architecture. We exploit an autoencoder to parameterize , with two different neural networks. Then, the loss is designed as

 L(T)=Ls(T)+λ⋅La(T)=12d⋅NN∑i=1∥Φ∘θ(xi)−θ(yi)∥2+λ⋅1n⋅NN∑i=1(∥θ−1∘θ(xi)−xi∥2+∥θ−1∘θ(yi)−yi∥2),

where is a symplectic neural network and

is a hyperparameter to be tuned. Note that in this case

is not intrinsically equivalent to the identity map. Basically, this architecture only learns the Poisson map limited on a submanifold embedded in the whole phase space.

In both cases we perform predictions by , which gives the th step. We prefer the invertible neural networks because the reconstruction loss will disappear compared to the autoencoder, and we also expect to impose further prior information on the transformation. For example, in Section IV-C, we adopt a volume-preserving network as the invertible neural network, the so-called volume-preserving Poisson neural network (VP-PNN), which achieves better generalization compared to the non-volume-preserving Poisson neural network (NVP-PNN), since the original Poisson system has a volume-preserving phase flow. More crucially, autoencoder-based PNNs are unable to learn data lying on the whole space, rather than a -dimensional submanifold, when . However, autoencoder-based PNNs can perform better in some situations, such as in the numerical case in Section IV-E. Intuitively, the alternative architecture outperforms the primary architecture when . An illustration of PNNs is presented in Fig. 1.

### Iii-B Learning Poisson systems

Consider the case that consists of data points from a Poisson system. Next, we present the approximation properties of PNNs.

###### Theorem 2.

Suppose that (i) the extended symplectic neural networks are universal approximators within the space of extended symplectic maps in topology, (ii) (primary architecture) the invertible neural networks are universal approximators within the space of invertible differentiable maps in the

topology, and (iii) (alternative architecture) the transformation neural networks

and are universal approximators within the space of continuous maps in the topology. Then, the corresponding Poisson neural networks are universal approximators within the space of Poisson maps in topology for the primary architecture, and are able to approximate arbitrary Poisson maps (limited on submanifolds) within the space of continuous maps in topology for the alternative architecture. The approximations are considered on compact sets.

###### Proof.

It can be deduced directly from Theorem 1 and Corollary 1. ∎

Notice that in the alternative architecture, the PNN itself is not intrinsically a Poisson map, however, by using for multistep prediction, it can also preserve geometric structure and enjoy stable long term performance in practice.

### Iii-C Learning trajectories of autonomous systems

Now consider the case that consists of a series of data points on a single trajectory (maybe not from a Poisson system), i.e, , where for time step and , which is the phase flow of an unknown autonomous system . Unlike the theory for learning Poisson systems, the use of PNNs to learn autonomous flows is quite novel. The theories are driven by the observation that symplectic neural networks can learn a trajectory from a non-Hamiltonian system, see Section IV-A for details. The next theorem reveals the internal mechanism.

###### Theorem 3.

Suppose that is a simply connected open set, and the periodic solution is from an autonomous dynamical system

 ˙y=f(y),y∈U,

then there exists a Hamiltonian , such that also satisfies the Hamiltonian system

 ˙y=J−1∇H(y),y∈U.
###### Proof.

The proof can be found in Appendix B-A. ∎

Based on above theorem, one may be able to apply symplectic neural networks to arbitrary periodic solution to autonomous system in . Naturally, we tend to explore similar results in high-dimensional space. We intuitively expect to transform any high-dimensional periodic solution to autonomous system into one lying on a plane with the help of coordinate changes, and subsequently, the original trajectory can be learned via PNN. In fact, this conjecture is almost right, except for the case when the orbit of the considered motion is a non-trivial 1-knot in . For the theory on knots we refer to [23, 2, 28], and we briefly present the basic concepts in Appendix A.

###### Theorem 4.

Suppose that is a contractible open set, periodic solution is from an autonomous dynamical system

 ˙y=f(y),y∈U,

and the orbit of is unknotted. Then, there exists a Hamiltonian and a satisfying Lemma 1 with rank of , such that also satisfies the Poisson system

 ˙y=B(y)∇H(y),y∈U.

Note that non-trivial -knots exist only in .

###### Proof.

The proof can be found in Appendix B-B. ∎

Up to now, we have shown that PNNs can be used to learn almost any periodic solution to autonomous systems, and the latent dimension is actually fixed as 2. The symplectic structure embedded in PNNs will endow the predictions with long term stability and more accuracy, for periodic solutions. Nevertheless, there is still a limitation of this method, as one can see, PNNs are allowed to learn only a single trajectory upon training. Basically, the limitation is inevitable as we have already got rid of most of the constraints on the vector field

, which is in fact a trade-off between data and systems. In spite of this fact, we still expect to further develop a strategy to relax the requirements of “single” and “periodic”, by increasing the latent dimension.

###### Conjecture 1.

Suppose that is a contractible open set, is a vector field, and is a smooth trivial -knot embedded in . If is a smooth tangent vector field on , then there exists a smooth single-valued function and a smooth matrix-valued function satisfying Lemma 1 with rank of , such that .

The conjecture provides a more general insight into the above theoretical results on single trajectory. If it holds, one may learn several trajectories lying on a higher-dimensional trivial knot simultaneously upon training, with higher latent dimension. Unfortunately, the proofs of 1-knot case cannot be easily extended to the general case, since the fact that a solenoidal vector field on is exactly a field of Hamiltonian system does not hold for higher-dimensional space. A more thorough explanation of this conjecture is needed in future works.

## Iv Simulation results

In this section, we present several simulation cases to verify our theoretical results, and indicate the potential application of PNNs in the field of computer vision. All the hyper-parameters for detailed architectures and training settings are shown in Appendix

C and Table III. For each detailed Poisson system involved, we obtain the ground truth and data using a high order symplectic integrator with its corresponding coordinate transformation, which is listed in Appendix D.

### Iv-a Lotka–Volterra equation

The Lotka–Volterra equation can be written as

 (˙u˙v)=(u(v−2)v(1−u))=(0uv−uv0)∇H(u,v), (6)

where . As a Poisson system, we are able to discover the underlying symplectic structure using PNNs. The data consist of three trajectories, starting at , respectively. We generate 100 training points with time step for each trajectory. Besides the PNN, we also use a SympNet to learn the three trajectories simultaneously, as well as learn the single trajectory starting at . We perform predictions for 1000 steps starting at the end points of the training trajectories, and the results of the three cases are presented in Fig. 2. As shown in the left figure, the PNN successfully learns the system and achieves a stable long time prediction, compared to the classical Runge-Kutta method of order four (RK45). Meanwhile, the SympNet [19] fails to fit the three trajectories simultaneously, since the data points are not from a Hamiltonian system, as shown in the middle figure. However, the right figure reveals that the SympNet is indeed able to learn a single trajectory, even though it is not from a Hamiltonian system, which is consistent with Theorem 3.

### Iv-B Extended pendulum system

We test the performance of a PNN on odd-dimensional Poisson systems. The motion of the pendulum system is governed by

 (˙p˙q)=(−sinqp)=(0−110)∇H(p,q),

where . This is a canonical Hamiltonian system, and we subsequently extend this system to three-dimensional space:

 ⎛⎜⎝˙p˙q˙c⎞⎟⎠=⎛⎜⎝−sinqp+c0⎞⎟⎠=⎛⎜⎝0−10100000⎞⎟⎠∇~H(p,q,c),

where . To make the data more difficult to learn, a nonlinear transformation is applied to the extended phase space. The governing equation for the transformed system is as follows:

 ⎛⎜⎝˙u˙v˙r⎞⎟⎠=⎛⎜⎝0−1−2v102u2v−2u0⎞⎟⎠⎛⎜⎝u−3u2−v2+rsinv−2uvu⎞⎟⎠=:B(u,v,r)∇K(u,v,r), (7)

where . One may readily verify that satisfies Lemma 1, therefore (7) is a Poisson system.

Three trajectories are simulated with initial conditions , , , and time step . We use data points obtained in the first 100 steps as our training set. Then we perform predictions for 1000 steps starting at the end points of training set; the results are shown in Fig. 3. From the left figure, it can be seen that the predictions made by the PNN match the ground truth perfectly, remaining on the true trajectories after long times. Meanwhile, the PNN is able to recover the underlying structure of the system, as shown on the right hand side. The trajectories of the system in the latent space are recovered as trajectories on parallel planes, which matches the fact that the trajectories are generated from several different two-dimensional symplectic submanifolds.

### Iv-C Charged particle in an electromagnetic potential

We consider the dynamics of the charged particle in an electromagnetic potential governed by the Lorentz force

 m¨x=q(E+˙x×B),

where is the mass, denotes the particle’s position, is the electric charge, denotes the magnetic field, and is the electric field with being the potentials. Let be the velocity of the charged particle, then the governing equations of the particle’s motion can be expressed as

 (˙v˙x)=⎛⎝−qm2^B(x)−1mI1mI0⎞⎠∇H(v,x),H(v,x)=12mvTv+qφ(x), (8)

where

 ^B(x)=⎛⎜⎝0−B3(x)B2(x)B3(x)0−B1(x)−B2(x)B1(x)0⎞⎟⎠

for . Here we test the dynamics with , , and

 A(x)=13√x21+x22⋅(−x2,x1,0),φ(x)=1100√x21+x22

for . Then

 B(x)=(∇×A)(x)=(0,0,√x21+x22),E(x)=−(∇φ)(x)=(x1,x2,0)100(x21+x22)32.

The initial state is chosen to be , , in which case the system degenerates into four-dimensional dynamics, i.e., the motion of the particle is always on a plane. For simplicity, we also denote them by , , and study the dimension-reduced system. We then generate a trajectory of 1500 training points followed by 300 test points with time step . Subsequently, a volume-preserving PNN (VP-PNN) is trained to learn the training set, and we perform predictions for 2000 steps starting at the end point of the training set, as shown in Fig. 4. It can be seen that the VP-PNN perfectly predicts the trajectory without deviation. Furthermore, we also train a non-volume-preserving PNN (NVP-PNN) and a volume-preserving neural network (VPNN) to compare with the above model. After sufficient training, the three models make predictions starting at the initial state to reconstruct trajectories, as shown in Fig. 5. As one can see, the VP-PNN performs slightly better than the NVP-PNN, while the NVP-PNN is much better than the VPNN. The quantitative results shown in Table I also support this observation. Although the VP-PNN has larger training MSE and one step test MSE than the NVP-PNN, its long time test MSE is instead less, which is not surprising because NVP-PNNs and VPNNs possess the prior information of symplectic structure and volume preservation respectively, while VP-PNNs has both of them. Note that the considered dimension-reduced system of (8) is source-free hence its phase flow is intrinsically volume-preserving on the four-dimensional space.

### Iv-D Nonlinear Schrödinger equation

We consider the nonlinear Schrödinger equation

 ⎧⎪⎨⎪⎩i∂w∂t+∂2w∂x2+2|w|2w=0,w(x,0)=w0(x),

where is a complex field and the boundary condition is periodic, i.e., for . An interesting space discretization of the nonlinear Schrödinger equation is the Ablowitz–Ladik model

 i˙wk+1Δx2(wk+1−2wk+wk−1)+|w2k|(wk+1+wk−1)=0

with , . Letting , we obtain

 ˙uk=−1Δx2(vk+1−2vk+vk−1)−(u2k+v2k)(vk+1+vk−1),˙vk=1Δx2(uk+1−2uk+uk−1)+(u2k+v2k)(uk+1+uk−1).

With , this system can be written as

 (˙u˙v)=(0−D(u,v)D(u,v)0)∇H(u,v) (9)

where is the diagonal matrix with entries , and the Hamiltonian is

 H(u,v)=1Δx2N∑l=1(ulul−1+vlvl−1)−1Δx4N∑l=1ln(1+Δx2(u2l+v2l)).

We thus get a Poisson system. In the experiment, we choose the boundary condition , and set , hence (9) is a Poisson system of dimension 40. We then generate 500 training points followed by 100 test points with time step . That means the solution to this equation during the time interval is treated as the training set, and then we learn the data using a PNN and predict the solution between . The result is shown in Fig. 6: both of the real part and imaginary part match the ground truth well.

### Iv-E Pixel observations of two-body problem

We consider the pixel observations of the two-body problem, which are the images of two balls in black background, as shown in Fig. 7. Time series of the images form a movie of the motion of two balls governed by gravitation. Here we intend to learn the phase flow on a coarse time grid while making predictions on a finer time grid, to forecast and smoothen the movie simultaneously. To achieve this goal, a simple recurrent training scheme is applied to our method. Similar treatments can be found in [1].

Suppose the training dataset is . We set our goal to be making predictions on at , . Denote the PNN to be trained as . Then we train

 fmPNN=θ−1∘Φm∘θ

to approximate the flow from to , and is set to 2 in this case. Since we are learning a single trajectory on a submanifold of the high-dimensional space, autoencoders are used to approximate . in the corresponding loss function is chosen to be 1. After training, is used to generate predictions for .

A single trajectory of the system is generated with time step , as shown in Fig. 7. The training dataset contains images of size . Let and denote the ground truth and predictions made by the PNN at grid points. Let and denote the ground truth and predictions made by the PNN on middle points of the grids. The test loss on grids is calculated as the mean squared error between and while the test loss on middle points is calculated as the mean squared error between and . We use a similar definition as in [35] to compute the valid prediction time . Suppose we are given the ground truth dataset and prediction , starting from . Let the root mean square error (RMSE) be

 E(~x)=√⟨(x−~x)2⟩,

where stands for spatial average. The valid prediction time (VPT) is defined to be

 Tϵ=argmaxtf{tf|E(~x(t))≤ϵ,∀t≤tf},

where is a hyperparameter to be chosen. Here we set .

Low error is obtained both on the grid points and the middle points, as shown in Table II, which indicates that PNNs can handle prediction and interpolation simultaneously. The VPT is much longer than the time scale of the training window, further suggesting that PNNs are good at long-time predictions and intrinsically structure-preserving. It can be seen in Fig. 7 that the prediction matches the ground truth perfectly even after .

Note that according to Theorem 4, the periodic solution to an autonomous dynamical system in can always be learned by PNNs when . Since this trajectory of two-body system is periodic in and the image at step is uniquely determined by the image at step , we can assume without loss of generality that the pixel observations of a two-body system form a periodic solution to an autonomous dynamical system, regardless of whether the internal mechanism is Hamiltonian.

## V Summary

The main contribution of this paper is to provide a novel high-level network architecture, PNN, to learn the phase flow of an arbitrary Poisson system. Since a single periodic solution to an autonomous system can be proven to be a solution to a Poisson system if the orbit is unknotted, PNNs can be directly applied to a much broader class of systems without modification. From this perspective, theoretical results regarding the approximation ability of PNNs are presented. Several simulations including the Lotka-Volterra equation, an extended pendulum system, charged particles in the electromagnetic potential, a nonlinear Schrödinger equation and a trajectory of the two-body problem support our theoretical findings and illustrate the advantages of PNNs on long time prediction and frame interpolation. Even though not explicitly mentioned in the paper, PNNs can be easily extended to learn phase flows from irregularly sampled data. Interested readers may refer to [19] for more details. PNNs can also learn the Hamiltonian systems on low dimensional submanifolds or constrained Hamiltonian systems, which can be expressed as a Poisson system on local coordinates [17, Chapter VII.1].

Despite the great expressivity, stability and interpretability of PNNs, an open issue is whether one can use it to infer and make predictions on multiple trajectories of an autonomous system without generalized Poisson structure under certain circumstances. We conjectured in the paper that the solution to an arbitrary autonomous system lying on a smooth trivial -knot matches the solution to a Poisson system. If this holds, the use of PNNs on multiple trajectories of a general autonomous system would be theoretically justified. We leave the proof or counterexamples of this conjecture as future work.

## Appendix A Introduction to knot

Consider the embeddings of in , . Two embeddings are equivalent if there is a homeomorphism of such that . An embedding is unknotted if it is equivalent to the trivial knot defined by the standard embedding

 k0:Sn→Sm; (x0,⋯,xn)→(x0,⋯,xn,0,⋯,0).

In fact, the embedding is always unknotted when . Therefore an -knot is defined as an embedding of in . For convenience, let us make a little change in the terminology: an -knot means an embedding of in for , since we are more concerned about the trivial knot in this work.

## Appendix B Proofs for theorems

### B-a Proof of Theorem 3

In two-dimensional space, consider the system

with boundary condition

 v|Γ=a,

where is the orbit of and is a vector field. [22, p. 60, Theorem 1] shows that there exists a solution to the system above given a suitable single-valued function and any continuous satisfying

 ∮Γa⋅nds=0,

where is the exterior normal with respect to the domain inside . Set , we have

 ∮Γf⋅nds=∮Γ0ds=0,

hence there exists a such that

 div v=0,v|Γ=f.

Note that is a solenoidal vector field, which leads to

 v=curl (−H)=(−∂H∂y2,∂H∂y1)T=J−1∇H.

The defined above is exactly the Hamiltonian we are looking for.

### B-B Proof of Theorem 4

Since is unknotted, there exist a periodic and a homeomorphism such that

which is exactly the coordinate transformation deforming into . Theorem 3 shows that is also a solution to a Hamiltonian system, hence the extended solution satisfies

 ⎧⎪⎨⎪⎩˙Γ=(J−1000)∇K(Γ),J∈R2×2,K(y1,y2,⋯,yn)=H(y1,y2), (10)

for a single-valued function . In fact, a Poisson system expressed in an arbitrary new coordinate immediately becomes a new Poisson system with a new whose rank is equivalent to the original one [17, p. 265]. Therefore, is also a solution to a Poisson system obtained by expressing system (10) in new coordinate via transformation . Furthermore, they share the same latent dimension of 2.

## Appendix C Implementation of architecture

### C-a Extended symplectic neural networks

We adopt SympNets [19] as the universal approximators for symplectic maps. The architecture of SympNets is based on three modules, i.e.,

• Linear modules.

 Ln(pq)=(I0/SnSn/0I)⋯(I0S2I)(IS10I)(pq)+b,p,q∈Rd,

where are symmetric, is the bias, while the unit upper triangular symplectic matrices and the unit lower triangular symplectic matrices appear alternately. In this module, (represented by in practice) and are parameters to learn. In fact, can represent any linear symplectic map [18].

• Activation modules.

 Nup(pq)=(p+~σa(q)q),Nlow(pq)=(p~σa(p)+q),p,q∈Rd,

where for . Here is the element-wise product, is the activation function, and is the parameter to learn.

 Gup(pq)=(p+^σK,a,b(q)q),Glow(pq)=(p^σK,a,b(p)+q),p,q∈Rd,

where for . Here , are the parameters to learn, and is a positive integer regarded as the width of the module.

The SympNets are the composition of above three modules. In particular, we use two classes of SympNets: the LA-SympNets composed of linear and activation modules, and the G-SympNets composed of gradient modules. Notice that both LA and G SympNets are universal approximators for symplectic maps as shown in [19]. For convenience, we clarify the terminology for describing a detailed LA(G)-SympNet: an LA-SympNet of layers with sublayers means it is the composition of linear modules and activation modules, where linear and activation modules appear alternatively like the architecture of fully-connected neural network, and each linear module is composed of alternated triangular symplectic matrices; a G-SympNet of layers with width means it is composed of alternated gradient modules, and the width is defined as above.

In this works we further develop the extended symplectic neural networks by extending the gradient modules.

• Extended modules.

 Eup⎛⎜⎝pqc⎞⎟⎠=⎛⎜⎝p+ˆσK1,K2,a,b(q,c)qc⎞⎟⎠,Elow⎛⎜⎝pqc⎞⎟⎠=⎛⎜⎝pˆσK1,K2,a,b(p.c)+qc⎞⎟⎠,p,q∈Rd, c∈Rn−2d,

where for , . Here , , are the parameters to learn, and is a positive integer regarded as the width of the module.

The extended symplectic neural networks (E-SympNets) are the composition of extended modules. As noticed, is the latent dimension of the E-SympNets. The terminology for describing the architecture of E-SympNets is the same as that for G-SympNets, hence we will not repeat here. It is worth mentioning that the approximation property of E-SympNets is still unknown, which is left as future work.

### C-B Invertible neural networks

In numerical experiments, we adopt the NICE [8] as volume-preserving invertible neural networks, and the real NVP [9] as general non-volume-preserving invertible neural networks.

• Volume-preserving modules.

 Vup(x1x2)=(x1+m1(x2)x2),Vlow(x1x