1 Introduction
Recently, researchers have made efforts to apply deep learning to problems in physics. For example, [28] proposes physics-informed neural networks to solve problems governed by various differential equations. While [28] mainly studies nonlinear partial differential equations, [23, 31] extend the approach to stochastic and fractional equations, and [4] adopts the variational form of PDEs and minimizes the corresponding energy functional. Indeed, a growing body of work studies applications of machine learning to complex physical systems [10, 24, 27, 29]. At the same time, several articles discuss the understanding of neural networks from the point of view of dynamical systems [5, 25]. Deep neural networks can be regarded as discrete dynamical systems whose basic dynamics at each step is a linear transformation followed by a componentwise nonlinear (activation) function. Conversely, we can approximate continuous dynamical systems with neural networks, which is in fact the main topic of this work.
It is well known that deep neural networks can approximate continuous maps [2, 12]. However, the universal approximation theorems only guarantee a small approximation error for a sufficiently large network; they do not account for the optimization and generalization errors. If a given task is treated as a pure approximation problem, obtaining satisfactory accuracy requires more data than we can typically afford [14]. For this reason, when applying deep learning to physical systems, where the cost of data acquisition is often prohibitive, we are inevitably faced with the challenge of drawing conclusions and making decisions under partial information. Fortunately, there exists a vast amount of prior knowledge that is currently not being utilized in modern machine learning practice. Encoding such structured information into a learning algorithm amplifies the information content of the data the algorithm sees, enabling it to steer itself quickly towards the right solution and to generalize well even when only a few training examples are available. Many studies discuss how to exploit prior knowledge to construct machine learning algorithms targeted at specific problems, where the approximated maps possess special structures or properties that we naturally expect the trained networks to inherit; examples include image classification [16, 19], game playing [30], as well as the recent work [17] providing a special network structure for approximating nonlinear operators. Turning to the main issue of this work, we explore how to impose such special structure on neural networks for solving the corresponding problems in dynamical systems. The simplest structures one can imagine are the Hamiltonian structure and the gradient flow structure, a viewpoint already mentioned in the proposal on machine learning via dynamical systems [5]. We will show how to construct a neural network that intrinsically preserves the Hamiltonian (symplectic) structure, which is the core of this paper.
Denote the $d \times d$ identity matrix by $I_d$, and let
$$J = \begin{pmatrix} 0 & I_d \\ -I_d & 0 \end{pmatrix},$$
which is an orthogonal, skew-symmetric real matrix, so that $J^{-1} = J^T = -J$.

Definition 1.
A matrix $M \in \mathbb{R}^{2d \times 2d}$ is called symplectic if $M^T J M = J$.
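As a quick numerical sanity check (a minimal NumPy sketch; the helper names are ours, not from the paper), the condition of Definition 1 can be verified directly:

```python
import numpy as np

def J_matrix(d):
    """The 2d x 2d matrix J = [[0, I], [-I, 0]] defined above."""
    I, Z = np.eye(d), np.zeros((d, d))
    return np.block([[Z, I], [-I, Z]])

def is_symplectic(M, tol=1e-10):
    """Check the defining condition M^T J M = J."""
    d = M.shape[0] // 2
    J = J_matrix(d)
    return np.allclose(M.T @ J @ M, J, atol=tol)
```

For instance, $J$ itself is symplectic, and the identities $J^T = -J$ and $J J^T = I$ can be checked the same way.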
With the concept of symplectic matrix, we can give the definition of symplectic map.
Definition 2.
A differentiable map $g: U \to \mathbb{R}^{2d}$ (where $U \subset \mathbb{R}^{2d}$ is an open set) is called symplectic if the Jacobian matrix $\nabla g$ is everywhere symplectic, i.e.,
$$(\nabla g(y))^T J \, \nabla g(y) = J \quad \text{for all } y \in U.$$
Consider the Hamiltonian system
$$\dot{y} = J^{-1} \nabla H(y), \tag{1}$$
where $y = (p, q) \in \mathbb{R}^{2d}$ and $H$ is the Hamiltonian function representing the energy of the system (1). Let $\phi_t$ be the phase flow of system (1). In 1899, Poincaré proved that the phase flow of the Hamiltonian system is a symplectic map [11, p. 184, Theorem 2.4], i.e.,
$$(\nabla \phi_t)^T J \, \nabla \phi_t = J. \tag{2}$$
The behavior of dynamical systems at large times is a notoriously difficult problem in mathematics, particularly for discrete dynamical systems. One may encounter situations where the dynamics explodes, converges to stationary states, or exhibits chaotic behavior. Fortunately, for Hamiltonian systems these problems can be alleviated by imposing the symplectic structure on the numerical methods, in view of (2). There are well-developed works on symplectic integration; see for example [7, 8, 11, 18]. Since symplectic numerical integrators have yielded transformative results across diverse applications based on Hamiltonian systems [6, 22, 26, 32], it is natural to consider the construction of symplectic networks and how they impact numerical methods for Hamiltonian systems.
The remaining parts of this paper are organized as follows. Section 2 briefly summarizes the main problem we will solve. The detailed process of constructing the symplectic networks and some related results are shown in Section 3. In Section 4, we give the unit triangular factorization of the matrix symplectic group, which is necessary for the construction of the symplectic networks. Section 5 presents the numerical results of solving and predicting the phase flows of Hamiltonian systems. Some conclusions are given in the last section.
2 Problem setup
In this work we focus on the Hamiltonian system (1). Since equation (2) holds, it is natural to search for numerical methods that share this property. Different from the conventional symplectic numerical integrators, we will instead use deep neural networks to approximate the phase flow $\phi_h$ for a fixed time step $h$. Crucially, we expect the networks to be intrinsically symplectic by imposing special structure on them. The main task we aim to complete is to design such symplectic networks, which can subsequently be applied to solving or predicting the phase flows of Hamiltonian systems accurately and efficiently.
3 Neural networks for identifying phase flow of Hamiltonian system
The conventional numerical approach to solving a Hamiltonian system is to construct a special scheme that preserves the symplectic structure, the intrinsic property of the phase flow, and then to advance the phase points step by step with this scheme. The difficulty with this approach is that a symplectic scheme is often implicit, which may be computationally expensive, especially for high-order schemes. Here we instead exploit a deep learning algorithm that leads to an explicit expression for the numerical phase flow of the Hamiltonian system. Furthermore, we propose a new network structure targeted at this problem.
3.1 Networks applying structure information
There have been works studying how to apply deep learning to solving differential equations, e.g., physics-informed neural networks. Instead of using general physics-informed networks to approximate the solution of (1) over the whole time interval, here we learn the phase flow of (1) and use it to compute the solutions. As with a numerical integrator, the trained network maps a phase point one time step forward: the input is the current phase point and the output is the phase point one step later.
Assume that the phase flows of (1) under consideration are constrained to a compact set $K \subset \mathbb{R}^{2d}$. We first choose finitely many phase points $\{y^{(i)}\}_{i=1}^N$ from $K$, then obtain the phase point one time step ahead of each $y^{(i)}$ pointwise by a high-order symplectic scheme, such as a symplectic Runge–Kutta method [11, Chapter VI.4]. Naturally,
$$T = \{(y^{(i)}, \tilde{y}^{(i)})\}_{i=1}^N \tag{3}$$
is viewed as the training set for learning, where $\tilde{y}^{(i)} \approx \phi_h(y^{(i)})$. The neural network $\mathcal{N}$ serving as a numerical integrator can be learned by minimizing the mean squared error loss
$$L = L_{data} + \lambda L_{sym}, \tag{4}$$
where
$$L_{data} = \frac{1}{N} \sum_{i=1}^N \big\| \mathcal{N}(y^{(i)}) - \tilde{y}^{(i)} \big\|^2$$
is the loss with respect to the data points, and
$$L_{sym} = \frac{1}{N} \sum_{i=1}^N \big\| (\nabla \mathcal{N}(y^{(i)}))^T J \, \nabla \mathcal{N}(y^{(i)}) - J \big\|^2 \tag{5}$$
is the loss that promotes preservation of the geometrical structure of the Hamiltonian system, i.e., the symplectic structure. Note that $\lambda$ is a weight to be set.
Suppose that there exist a small $\varepsilon > 0$ and a network $\mathcal{N}$ such that
$$\|\mathcal{N} - \phi_h\| \le \varepsilon \quad \text{and} \quad \|\nabla \mathcal{N} - \nabla \phi_h\| \le \varepsilon \quad \text{on } K,$$
where $\nabla \mathcal{N}$ is bounded; such simultaneous approximation of a map and its derivatives is guaranteed by [13]. Then for any $y \in K$, there holds
$$\big\| (\nabla \mathcal{N}(y))^T J \, \nabla \mathcal{N}(y) - J \big\| = \big\| (\nabla \mathcal{N}(y))^T J \, \nabla \mathcal{N}(y) - (\nabla \phi_h(y))^T J \, \nabla \phi_h(y) \big\| \le C \varepsilon,$$
which implies that
$$L_{sym} \le C^2 \varepsilon^2.$$
The above discussion shows that even if the weight $\lambda$ is set to 0, $L_{sym}$ will tend to 0 as $L_{data}$ tends to 0; hence the term $L_{sym}$ may not be required to control the structure information. If we enforce a small $L_{sym}$ to improve the error, the weight $\lambda$ should be set to a large value, which, however, severely slows down the convergence. In experiments it is expensive to obtain a satisfactorily optimized network with large $\lambda$; thus we simply set $\lambda$ to 0 in the section on numerical results.
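The loss (4) can be sketched as follows (a NumPy illustration with a finite-difference Jacobian standing in for automatic differentiation; `net` is any candidate map on phase space, and the function names are ours, not from the paper):

```python
import numpy as np

def jacobian_fd(f, y, eps=1e-6):
    """Central finite-difference Jacobian of a map f: R^n -> R^n at y."""
    n = y.size
    cols = []
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        cols.append((f(y + e) - f(y - e)) / (2 * eps))
    return np.column_stack(cols)

def total_loss(net, ys, ys_next, lam=0.0):
    """L = L_data + lam * L_sym, as in (4)-(5)."""
    d = ys.shape[1] // 2
    I, Z = np.eye(d), np.zeros((d, d))
    J = np.block([[Z, I], [-I, Z]])
    # data term: mean squared one-step prediction error
    l_data = np.mean([np.sum((net(y) - yn) ** 2) for y, yn in zip(ys, ys_next)])
    # structure term: deviation of the Jacobian from symplecticity
    l_sym = 0.0
    for y in ys:
        G = jacobian_fd(net, y)
        l_sym += np.sum((G.T @ J @ G - J) ** 2)
    l_sym /= len(ys)
    return l_data + lam * l_sym
```

If `net` happens to be an exact symplectic map that fits the data, both terms vanish.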
3.2 Symplectic networks (SympNets)
The general networks of the previous subsection preserve the symplectic structure by controlling the loss (possibly encoding the symplecticity condition as in (5)) at the data points. The obvious weakness is that such trained networks can hardly maintain the structure over the whole phase space $K$, especially when the sampled points are sparse in a high-dimensional $K$. To make our trained networks respect the symplectic structure more extensively and intrinsically, we instead build the prior information into the network architecture, as deep learning researchers prefer to do.
Suppose that $f$ and $g$ are two symplectic maps; then it is easy to show that their composite map $f \circ g$ is also symplectic, due to the chain rule. Thus the symplectic maps are closed under composition. Aiming to construct the target symplectic map, we look for the simplest linear/nonlinear symplectic maps to serve as the unit layers of the network, in analogy with a general fully-connected neural network; the resulting network then stays symplectic because of this closure under composition. It is noteworthy that the unit symplectic map should be simple enough that it can be freely parameterized, since the optimization techniques currently available in deep learning focus on unconstrained problems.
For the linear symplectic unit, let
$$\mathcal{L}_{up}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} I & S \\ 0 & I \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix}, \qquad \mathcal{L}_{low}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} I & 0 \\ S & I \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix}, \tag{6}$$
where $S$ is symmetric. Obviously, $\mathcal{L}_{up}$ and $\mathcal{L}_{low}$ are linear and symplectic; however, they are too simple to express a general linear symplectic map. To strengthen the expressivity of the linear layer, we compose several such maps alternately as
$$\mathcal{L}_n = \cdots \circ \mathcal{L}_{up} \circ \mathcal{L}_{low} \circ \mathcal{L}_{up}, \tag{7}$$
and $\mathcal{L}_n$ is viewed as the linear unit in our symplectic network. For convenience, in practice we can replace $S$ with $(A + A^T)/2$ to eliminate the symmetry constraint. An essential question now arises naturally: are maps like $\mathcal{L}_n$ able to represent any linear symplectic map? The answer is yes, and we show the details in Section 4.
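A small NumPy sketch of the linear units (6)-(7) (helper names are ours): the symmetry constraint on $S$ is removed by the substitution $S = (A + A^T)/2$, and the alternating product stays symplectic.

```python
import numpy as np

def upper_unit(A):
    """[[I, S], [0, I]] with S = (A + A^T)/2, A unconstrained."""
    d = A.shape[0]
    S = (A + A.T) / 2
    return np.block([[np.eye(d), S], [np.zeros((d, d)), np.eye(d)]])

def lower_unit(A):
    """[[I, 0], [S, I]] with S = (A + A^T)/2."""
    d = A.shape[0]
    S = (A + A.T) / 2
    return np.block([[np.eye(d), np.zeros((d, d))], [S, np.eye(d)]])

def linear_unit(As):
    """Alternating product ... upper(A3) @ lower(A2) @ upper(A1), as in (7)."""
    d = As[0].shape[0]
    M = np.eye(2 * d)
    for k, A in enumerate(As):
        factor = upper_unit(A) if k % 2 == 0 else lower_unit(A)
        M = factor @ M
    return M
```

Because every factor is symplectic, the product passes the symplecticity test for any choice of the free matrices.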
Nevertheless, it is unclear how to find a concise nonlinear expression that simultaneously satisfies the symplecticity condition and can be computed efficiently. The following theorem from [11, Chapter VI.5] provides a method for constructing symplectic maps.
Theorem 1.
For any smooth function $S(P, q)$, the equations
$$p = \frac{\partial S}{\partial q}(P, q), \qquad Q = \frac{\partial S}{\partial P}(P, q)$$
locally define a symplectic map $(p, q) \mapsto (P, Q)$ if the matrix $\frac{\partial^2 S}{\partial P \, \partial q}$ is invertible.

Proof.
The proof can be found in [11, p. 197]. ∎
Now we are able to generate symplectic maps by Theorem 1. Consider a smooth activation function $\sigma$, such as the sigmoid, and let
$$S(P, q) = P^T q + \sum_{i=1}^{d} a_i \hat{\sigma}(q_i),$$
where $\hat{\sigma}$ is the antiderivative of $\sigma$ and $q_i$ is the $i$-th component of $q$. The matrix $\partial^2 S / \partial P \, \partial q = I$ is invertible, and we subsequently derive that
$$\begin{pmatrix} p \\ q \end{pmatrix} \mapsto \begin{pmatrix} p - \mathrm{diag}(a)\,\sigma(q) \\ q \end{pmatrix}$$
is a symplectic map, where $a \in \mathbb{R}^d$ and $\sigma$ acts componentwise. For notational convenience, we denote this map by
$$\begin{pmatrix} I & \sigma_a \\ 0 & I \end{pmatrix},$$
where $I$ and $0$ are viewed as the identity and zero maps, respectively. This is our nonlinear symplectic layer. Similar to (6), we define the lower counterpart
$$\begin{pmatrix} I & 0 \\ \sigma_a & I \end{pmatrix},$$
which instead updates $q$ by a shear in $p$.
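The nonlinear layer is a shear: it perturbs $p$ by a function of $q$ only (or vice versa), so its Jacobian is unit triangular with a symmetric diagonal block, hence symplectic. A minimal NumPy sketch (names are ours; we write the update with a plus sign, absorbing the sign into the trainable $a$):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shear_up(y, a):
    """(p, q) -> (p + a * sigma(q), q); Jacobian [[I, diag(a*sigma'(q))], [0, I]]."""
    d = y.size // 2
    p, q = y[:d], y[d:]
    return np.concatenate([p + a * sigmoid(q), q])

def shear_low(y, a):
    """(p, q) -> (p, q + a * sigma(p))."""
    d = y.size // 2
    p, q = y[:d], y[d:]
    return np.concatenate([p, q + a * sigmoid(p)])
```

Setting $a = 0$ recovers the identity map, and the symplecticity of the Jacobian can be verified numerically.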
With the linear units $\mathcal{L}$ and the nonlinear units $\mathcal{N}$, we construct the multilayer feedforward neural network with symplectic structure:
$$\Psi = \mathcal{L} \circ \mathcal{N} \circ \cdots \circ \mathcal{N} \circ \mathcal{L}.$$
Note that $\mathcal{L}$ and $\mathcal{N}$ appear alternately, and units in different layers carry different parameters to be optimized. Furthermore, taking into account the fact that for a small time step the numerical integrator should be close to the identity map, we arrange the units so that setting all parameters to zero makes each sublayer, and hence the whole network, reduce exactly to the identity; consequently
$$\Psi = \mathrm{id}$$
holds at zero parameters in the sense of maps, consistent with the behavior at tiny time steps. Here $\Psi$ is regarded as a multilayer symplectic network (SympNet), each layer consisting of several sublayers.
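Putting the pieces together, a forward pass of such a network can be sketched as follows (a NumPy illustration under our own notational assumptions: each layer applies two linear shears and two nonlinear shears; with all parameters zero the network is exactly the identity, matching the small-time-step requirement):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sympnet_forward(y, params):
    """params is a list of layers, each a tuple (A, B, a, b):
    A, B parameterize symmetric matrices for the linear shears,
    a, b scale the nonlinear (activation) shears.
    Every sub-map is symplectic, so the composition is symplectic."""
    d = y.size // 2
    p, q = y[:d].astype(float), y[d:].astype(float)
    for A, B, a, b in params:
        p = p + ((A + A.T) / 2) @ q   # linear upper-triangular unit
        q = q + ((B + B.T) / 2) @ p   # linear lower-triangular unit
        p = p + a * sigmoid(q)        # nonlinear shear acting on p
        q = q + b * sigmoid(p)        # nonlinear shear acting on q
    return np.concatenate([p, q])
```

The zero-parameter network reduces to the identity, and the Jacobian of the full composition remains symplectic for arbitrary parameters.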
For the same training set defined in (3), the loss of the symplectic network (denoted $\Psi$) is defined by the data term alone,
$$L = \frac{1}{N} \sum_{i=1}^N \big\| \Psi(y^{(i)}) - \tilde{y}^{(i)} \big\|^2.$$
Compared to (4), this approach embeds the geometrical information in the network structure rather than in the loss. What makes it much stronger than the loss-controlled structure-preserving network above is that the symplectic network is exactly symplectic everywhere, even where there are no data points. Therefore symplectic networks have the potential to predict phase flows in regions not covered by the training data. We will show some numerical results in Section 5.
3.3 Extension of SympNets
The SympNets described in the previous subsection will be used in our numerical experiments. Nevertheless, some of the design choices are not necessary for preserving the structure or guaranteeing the expressivity, such as the strict alternation of the linear and nonlinear units. Here we extend the aforementioned symplectic networks to a more general form.
Consider the family of linear symplectic units and the family of nonlinear symplectic units, where $\sigma$ is a fixed activation function. In fact, Section 4 points out that the linear family is equivalent to the set consisting of all linear symplectic maps. Based on these two families, we have the following definition.
Definition 3.
For , , let
is called a general symplectic network. Furthermore, we define the collection of all general symplectic networks accordingly.
We now require neither that the network start with a linear unit nor that the linear and nonlinear units appear in strict alternation. Under this setup, the collection surprisingly forms a group under composition.
Theorem 2 (Algebraic Structure).
The collection of the general symplectic networks is a group.
Proof.
We have shown that the network with all parameters set to zero is the identity map, which belongs to the collection of general symplectic networks. Moreover, the associative law and closure obviously hold by definition. What remains is to confirm that every element has an inverse in the collection, i.e., for any $\Psi$ there exists $\Psi^{-1}$ with $\Psi^{-1} \circ \Psi = \mathrm{id}$. According to the observation that
$$\begin{pmatrix} I & \sigma_a \\ 0 & I \end{pmatrix}^{-1} = \begin{pmatrix} I & \sigma_{-a} \\ 0 & I \end{pmatrix}, \tag{8}$$
as well as
$$\begin{pmatrix} I & 0 \\ \sigma_a & I \end{pmatrix}^{-1} = \begin{pmatrix} I & 0 \\ \sigma_{-a} & I \end{pmatrix}, \tag{9}$$
we derive that the inverse of every unit layer is again a unit layer of the same type, and hence the reversed composition of these inverses yields $\Psi^{-1}$ within the collection. Therefore the collection of general symplectic networks is a group. ∎
In Section 3.2, a detailed structure of SympNets was provided for identifying Hamiltonian systems. Going a step further, one may wish the network to be a symmetric integrator, i.e., to satisfy the symmetry relation of a one-step method. By the extension of SympNets, we can construct such a targeted integrator by composing a given network with its inverse-related counterpart.
In practice, we may first obtain the expression of the inverse network via (8) and (9), then train on the data, and finally evaluate. It is noteworthy that the network and its counterpart share the same parameters during training. A study of the effectiveness of this technique and of its impact on accuracy is left to future work; in this paper we only apply the structure shown in Section 3.2.
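Each unit layer is invertible in closed form, which is what (8) and (9) exploit: a shear is undone by subtracting the same update, since the coordinate it reads from is unchanged. A minimal sketch (names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shear_up(y, a):
    """(p, q) -> (p + a * sigma(q), q)."""
    d = y.size // 2
    return np.concatenate([y[:d] + a * sigmoid(y[d:]), y[d:]])

def shear_up_inv(y, a):
    """Inverse of shear_up: q is untouched, so subtract the same term."""
    d = y.size // 2
    return np.concatenate([y[:d] - a * sigmoid(y[d:]), y[d:]])
```

The inverse of a whole network is then the composition of the unit inverses in reverse order, which is again a network of the same family; this is the group property of Theorem 2.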
4 Parametrization of the matrix symplectic group
Since current optimization in deep learning focuses on unconstrained problems, it is necessary to find a representation of symplectic matrices that can be freely parameterized. [1, 3, 9, 20, 21] provide several methods for the parametrization of the matrix symplectic group; however, these representations involve permutation matrices and nonsingular matrices, which are hard to use inside neural networks, so we search for a more effective approach. Observing that the unit triangular symplectic matrices
$$\begin{pmatrix} I & S \\ 0 & I \end{pmatrix}, \qquad \begin{pmatrix} I & 0 \\ S & I \end{pmatrix}, \qquad S = S^T,$$
can be parameterized as $S = (A + A^T)/2$, where $A$ is a matrix without any constraint, we aim to decompose a general symplectic matrix into several simple symplectic matrices of the above types.
Denote by $\mathcal{T}_n$ the set of products of $n$ unit triangular symplectic matrices in which the unit upper triangular symplectic matrices and the unit lower triangular symplectic matrices appear alternately. It is clear that $\mathcal{T}_n \subseteq \mathcal{T}_{n+1}$ for all integers $n$, since any factor may be taken to be the identity. Now the main theorem is given as follows.
Theorem 3 (Unit Triangular Factorization).
Every real symplectic matrix can be factored into a product of a small fixed number of alternating unit triangular symplectic matrices; see [15] for the precise count.
Proof.
The proof can be found in our previous work [15]. ∎
In [15], we systematically survey several existing factorizations of the matrix symplectic group and propose the unit triangular factorization described in Theorem 3. This factorization induces an unconstrained parametrization of the matrix symplectic group by replacing $S$ with $(A + A^T)/2$. It enables us to use a symplectic matrix as a module in a deep neural network, just as we do here. Furthermore, [15] provides further unconstrained parametrizations for structured subsets of the matrix symplectic group, such as the positive definite symplectic matrices and the singular symplectic matrices, which may be applied to problems with these constraints.
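This gives a gradient-friendly recipe: map a flat, unconstrained parameter vector to a symplectic matrix through the alternating unit triangular product (a NumPy sketch with our own function name; the number of factors is a free choice here, not the count established in [15]):

```python
import numpy as np

def symplectic_from_params(theta, d, n_factors):
    """theta has length n_factors * d * d; each chunk A is symmetrized to
    S = (A + A^T)/2 and placed in an alternating unit triangular factor."""
    I, Z = np.eye(d), np.zeros((d, d))
    M = np.eye(2 * d)
    for k in range(n_factors):
        A = theta[k * d * d:(k + 1) * d * d].reshape(d, d)
        S = (A + A.T) / 2
        F = np.block([[I, S], [Z, I]]) if k % 2 == 0 else np.block([[I, Z], [S, I]])
        M = F @ M
    return M
```

Any real parameter vector yields an exactly symplectic matrix (with determinant one), so gradient descent can run directly on `theta` without constraints.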
5 Numerical results
In this section, we test the above two types of networks on: (1) solving a Hamiltonian system by learning abundant data over the phase space; (2) predicting a phase flow given a series of phase points in time.
5.1 Solving systems
To solve the systems by learning the phase flow, we need to sample many data points from the phase space to provide sufficient information about how the flow acts on the whole space. After learning, we apply the trained network to generate the phase flow step by step and compare it with the true flow. We use two types of networks: general fully-connected networks (FNNs) and the proposed symplectic networks (SympNets). Table 1 shows the default hyperparameters we set.
Type  Structure  Activation  Optimizer  Learning Rate  Epochs
FNN  —  Sigmoid  Adam  0.1/0.01  1e6
SympNet  8 layers with 5 sublayers  Sigmoid  Adam  0.1/0.01  1e6
5.1.1 Pendulum
Here we solve the mathematical pendulum (unit mass, massless rod of unit length, unit gravitational acceleration), which is a system with one degree of freedom having the Hamiltonian
$$H(p, q) = \frac{1}{2} p^2 - \cos q.$$
We set the time step, generate training/test data points randomly from a compact set, and, after training, use the trained network to compute three flows from three different starting points for a large number of steps.
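The training pairs can be produced by any symplectic one-step scheme; the paper uses a high-order symplectic Runge-Kutta method, but for illustration here is a first-order symplectic Euler sketch for the pendulum Hamiltonian $H(p,q) = p^2/2 - \cos q$ (the sampling box and step size below are placeholders, not the paper's actual settings):

```python
import numpy as np

def pendulum_step(p, q, h):
    """Symplectic Euler for H(p, q) = p^2/2 - cos(q): update p first, then q."""
    p_new = p - h * np.sin(q)
    q_new = q + h * p_new
    return p_new, q_new

def make_dataset(n_points, h, rng):
    """Sample start points from a compact box and pair each with its image
    one step later, forming the training set as in (3)."""
    ys = rng.uniform(-1.5, 1.5, size=(n_points, 2))
    ys_next = np.array([pendulum_step(p, q, h) for p, q in ys])
    return ys, ys_next
```

Because the scheme is symplectic, the energy $H$ stays bounded along long trajectories instead of drifting.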
The training MSE and the test MSE are shown in Table 2. Figure 1 shows the numerical flows computed by FNN and SympNet: owing to the structural information built into SympNet, its flows stay on the true trajectories for all time, while the flows from FNN deviate from the true trajectories over time. The coordinate trajectories are given in Figure 2. We can see that SympNet provides a more reliable long-time solution than FNN. Furthermore, SympNet nearly preserves the energy and hence keeps the peak/valley values of the trajectories invariant.
Type  Pendulum (Training MSE / Test MSE)  Lotka–Volterra (Training MSE / Test MSE)
FNN  3.14e-7 / 3.22e-7  5.94e-7 / 6.23e-7
SympNet  2.88e-7 / 2.74e-7  1.97e-5 / 1.82e-5
5.1.2 Lotka–Volterra
Here we solve the Lotka–Volterra system in logarithmic scale, which is a system with one degree of freedom with a canonical Hamiltonian. We set the time step, generate training/test data points randomly from a compact set, and, after training, use the trained network to compute three flows from three different starting points for a large number of steps.
The training MSE and the test MSE are shown in Table 2. Figure 3 shows the numerical flows computed by FNN and SympNet; it again reflects the superiority of SympNet, similarly to the pendulum case, i.e., the SympNet preserves the geometrical structure of the numerical phase flows. However, Table 2 shows that the training MSE of SympNet is much larger than that of FNN in this case, which indicates that SympNets are not easy to train. It is worth noting that the SympNet used has only 63 parameters while the FNN has thousands; moreover, its 8 activation layers increase the difficulty of learning due to vanishing gradients. Given the wide difference in training MSE, it is not meaningful to compare the coordinate trajectories of FNN and SympNet. How to enhance the approximation capability of SympNets is still an open problem.
5.2 Predicting systems
Now we consider a new task that conventional numerical methods are unable to achieve. For an unknown Hamiltonian system, i.e., one whose Hamiltonian is unknown, we try to predict the flow from a gathered series of phase points with a fixed time step; the consecutive pairs of these points are viewed as the training data. We apply neural networks to learn the one-step map and to produce the predicted phase flow starting at the last observed point. Table 1 shows the default hyperparameters we set.
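In code, the prediction task is simply: split one observed orbit into one-step pairs for training, then iterate the learned map from the last observed point (a minimal sketch; `net` stands for any trained one-step map):

```python
import numpy as np

def trajectory_pairs(traj):
    """Turn an observed orbit y_0, ..., y_T (fixed step h) into supervised
    one-step pairs {(y_n, y_{n+1})}."""
    traj = np.asarray(traj)
    return traj[:-1], traj[1:]

def predict(net, y0, n_steps):
    """Roll the one-step map forward from the last observed point."""
    ys = [np.asarray(y0, dtype=float)]
    for _ in range(n_steps):
        ys.append(net(ys[-1]))
    return np.array(ys)
```

A symplectic one-step map keeps the iterates on an invariant curve, whereas an unstructured map may drift off it; this is the qualitative difference observed in the experiments below.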
5.2.1 Pendulum
For the pendulum system, which has the Hamiltonian
$$H(p, q) = \frac{1}{2} p^2 - \cos q,$$
we first obtain a flow of 40 points with a fixed time step as the training data, i.e., the consecutive pairs along this single trajectory. After training on these data, we use the trained network to compute the flow starting at the last observed point for 1000 steps.
Figure 4 shows the predicted flows by FNN and SympNet. We find that SympNet discovers the unknown trajectory successfully while FNN completely fails to do so. Figure 5 provides the detailed coordinate trajectories by FNN and SympNet: SympNet gives a comparatively accurate numerical result over the first cycles, while the values predicted by FNN remain invariant during prediction.
5.2.2 Lotka–Volterra
For the Lotka–Volterra system, with its Hamiltonian as in Section 5.1.2, we first obtain a flow of 25 points with a fixed time step as the training data, i.e., the consecutive pairs along this single trajectory. After training on these data, we use the trained network to compute the flow starting at the last observed point for 1000 steps.
5.2.3 Kepler problem
Now we consider a four-dimensional system, the Kepler problem (with masses and gravitational constant normalized to 1), which has the Hamiltonian
$$H(p, q) = \frac{1}{2}\left(p_1^2 + p_2^2\right) - \frac{1}{\sqrt{q_1^2 + q_2^2}}.$$
We first obtain a flow of 40 points with a fixed time step as the training data, i.e., the consecutive pairs along this single trajectory. After training on these data, we use the trained network to compute the flow starting at the last observed point for 1000 steps.
In this case, we only study the behavior of SympNet in prediction, since the previous cases have already demonstrated the ineffectiveness of FNN. Figure 8 shows the numerical flows predicted by SympNet, where the trajectories of both velocity and position are almost consistent with the true trajectories. Figure 9 provides the detailed coordinate trajectories. Unlike solving, the prediction task requires only a few data points, so the size of the required training set remains acceptable even for high-dimensional problems.
6 Conclusions
This work presents a framework for constructing neural networks that preserve the symplectic structure. The key to the construction is the unit triangular factorization of the matrix symplectic group, which was proposed in our previous work [15]. Furthermore, the general symplectic networks of Definition 3 are endowed with an algebraic structure: they in fact form a group. This algebraic structure suggests the possibility of building more efficient symplectic networks, such as symmetric symplectic networks.
With the symplectic networks, we show numerical results on solving Hamiltonian systems by learning abundant data points over the phase space, and on predicting phase flows by learning a series of points in time. All the experiments indicate that the symplectic networks perform much better than fully-connected networks without any prior information, especially on the prediction task, which conventional numerical methods are unable to accomplish.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (Grant No. 11771438), and Major Project on New Generation of Artificial Intelligence from MOST of China (Grant No. 2018AAA0101002).
References
 [1] (1986) Matrix factorizations for symplectic QR-like methods. Linear Algebra and its Applications 83, pp. 49–77. Cited by: §4.
 [2] (1989) Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2 (4), pp. 303–314. Cited by: §1.
 [3] (2009) Parametrization of the matrix symplectic group and applications. SIAM Journal on Matrix Analysis and Applications 31 (2), pp. 650–673. Cited by: §4.
 [4] (2018) The Deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 6 (1), pp. 1–12. Cited by: §1.
 [5] (2017) A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics 5 (1), pp. 1–11. Cited by: §1, §1.
 [6] (2009) Computing semiclassical quantum dynamics with Hagedorn wavepackets. SIAM Journal on Scientific Computing 31 (4), pp. 3027–3041. Cited by: §1.
 [7] (1984) On difference schemes and symplectic geometry. In Proceedings of the 5th international symposium on differential geometry and differential equations, Cited by: §1.
 [8] (1995) Collected works of Feng Kang (II). National Defense Industry Press. Cited by: §1.

 [9] (1991) An analysis of structure preserving methods for symplectic eigenvalue problems. RAIRO Automat. Prod. Inform. Ind. 25. Cited by: §4.
 [10] (2019) Machine learning of space-fractional differential equations. SIAM Journal on Scientific Computing 41 (4), pp. A2485–A2509. Cited by: §1.

 [11] (2006) Geometric numerical integration: structure-preserving algorithms for ordinary differential equations. Vol. 31, Springer Science & Business Media. Cited by: §1, §3.1, §3.2.
 [12] (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), pp. 359–366. Cited by: §1.
 [13] (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural networks 3 (5), pp. 551–560. Cited by: §3.1.
 [14] (2019) Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. arXiv preprint arXiv:1905.11427. Cited by: §1.
 [15] (2019) Unit triangular factorization of the matrix symplectic group. arXiv preprint arXiv:1912.10926. Cited by: §4, §4, §6.
 [16] (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105. Cited by: §1.
 [17] (2019) DeepONet: learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193. Cited by: §1.
 [18] (2008) From quantum to classical molecular dynamics: reduced models and numerical analysis. European Mathematical Society. Cited by: §1.
 [19] (2013) Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, Vol. 30, pp. 3. Cited by: §1.
 [20] (2003) On the determinant of symplectic matrices. Manchester Centre for Computational Mathematics. Cited by: §4.
 [21] (1988) A symplectic orthogonal method for single input or single output discrete time optimal quadratic control problems. SIAM Journal on Matrix Analysis and Applications 9 (2), pp. 221–247. Cited by: §4.
 [22] (2003) Symplectic analytically integrable decomposition algorithms: classification, derivation, and application to molecular dynamics, quantum and celestial mechanics simulations. Computer Physics Communications 151 (3), pp. 272–314. Cited by: §1.
 [23] (2019) fPINNs: fractional physics-informed neural networks. SIAM Journal on Scientific Computing 41 (4), pp. A2603–A2626. Cited by: §1.
 [24] (2019) Neural-net-induced Gaussian process regression for function approximation and PDE solution. Journal of Computational Physics 384, pp. 270–288. Cited by: §1.

 [25] (2013) On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pp. 1310–1318. Cited by: §1.
 [26] (2015) Canonical symplectic particle-in-cell method for long-term large-scale simulations of the Vlasov–Maxwell equations. Nuclear Fusion 56 (1), pp. 014001. Cited by: §1.
 [27] (2018) Hidden physics models: machine learning of nonlinear partial differential equations. Journal of Computational Physics 357, pp. 125–141. Cited by: §1.
 [28] (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.
 [29] (2017) Inferring solutions of differential equations using noisy multifidelity data. Journal of Computational Physics 335, pp. 736–746. Cited by: §1.
 [30] (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529 (7587), pp. 484. Cited by: §1.
 [31] (2019) Quantifying total uncertainty in physicsinformed neural networks for solving forward and inverse stochastic problems. Journal of Computational Physics 397, pp. 108850. Cited by: §1.
 [32] (2014) Canonicalization and symplectic simulation of the gyrocenter dynamics in time-independent magnetic fields. Physics of Plasmas 21 (3), pp. 032504. Cited by: §1.