Deep Hamiltonian networks based on symplectic integrators

04/23/2020
by   Aiqing Zhu, et al.

HNets are a class of neural networks built on a physical prior for learning Hamiltonian systems. This paper explains the influence of different integrators, regarded as hyper-parameters of HNets, through error analysis. If we define the network target as the map with zero empirical loss on arbitrary training data, then non-symplectic integrators cannot guarantee the existence of network targets for HNets. We introduce inverse modified equations for HNets and prove that HNets based on symplectic integrators possess network targets, and that the difference between the network target and the original Hamiltonian depends on the accuracy order of the integrator. Our numerical experiments show that the phase flows obtained by symplectic HNets do not exactly preserve the original Hamiltonian but do preserve the calculated network target; that the loss of the network target on both training and test data is much smaller than the loss of the original Hamiltonian; and that symplectic HNets have stronger generalization ability and higher accuracy than non-symplectic HNets in prediction tasks. Thus symplectic integrators are of critical importance for HNets.


1 Introduction

Dynamical systems play a critical role in shaping our understanding of the physical world, and a recent line of work has bridged dynamical systems and deep neural networks. Analyzing neural networks from the perspective of dynamical systems has been widely studied [11, 28, 37], and researchers have recently made efforts to apply deep learning to dynamical systems [25, 31, 33]. In particular, neural networks have been applied to solve differential equations [10, 27, 32, 42]. With the explosive growth of available data and computing resources, current work focuses on discovering sufficiently accurate models of dynamical systems directly from data.

A good physics model should predict how a system changes over time. Our goal here is the discovery of dynamical systems, exploiting the remarkable generalization ability of neural networks. For this task, multistep neural networks introduce a novel approach to nonlinear system identification that combines classical multistep methods with deep neural networks [34]. ODENets, in contrast, are built on general ODE solvers and use the adjoint equation instead of back-propagating through the solver [7].

The problem with existing methods is that they tend not to learn conservation laws, which often causes them to drift away from the true dynamics of the system as errors accumulate [17]. The Hamiltonian formulation is one of the expressions of classical mechanics and has been applied to a wide range of physical fields, from celestial mechanics to quantum field theory [1, 35, 38]; it also has important applications in machine learning [4, 23, 36, 39, 41]. A Hamiltonian system is of the form

(1)   \dot{y} = J^{-1}\nabla H(y), \qquad J = \begin{pmatrix} 0 & I_d \\ -I_d & 0 \end{pmatrix},

where y \in \mathbb{R}^{2d} and I_d is the d-by-d identity matrix. The scalar function H : \mathbb{R}^{2d} \to \mathbb{R} is called the Hamiltonian [2]. In order to learn Hamiltonian systems, [17] proposes the HNet, which learns a parametric function H_\theta for H. [8] improves the HNet for separable Hamiltonians as the SRNN and numerically confirms that HNets based on symplectic integrators perform better than those based on non-symplectic integrators.
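To make the construction concrete, the following is a minimal PyTorch sketch (an assumed framework, not the authors' released code) of an HNet in the spirit of [17]: a scalar network H_\theta(y) whose gradient, composed with J^{-1}, yields the learned vector field. The architecture and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HNet(nn.Module):
    """Learn a scalar H_theta(y); the induced vector field is J^{-1} grad H_theta(y)."""

    def __init__(self, dim=2, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )
        d = dim // 2
        J = torch.zeros(dim, dim)
        J[:d, d:] = torch.eye(d)      # J = [[0, I], [-I, 0]]
        J[d:, :d] = -torch.eye(d)
        self.register_buffer("J_inv", torch.linalg.inv(J))

    def hamiltonian(self, y):
        return self.net(y)

    def vector_field(self, y):
        """Evaluate J^{-1} grad H_theta(y) for each row of y via automatic differentiation."""
        y = y.clone().detach().requires_grad_(True)
        grad_H = torch.autograd.grad(self.net(y).sum(), y, create_graph=True)[0]
        return grad_H @ self.J_inv.T
```

Any numerical integrator can then be applied to this learned vector field, both to define the training loss and to predict trajectories; the choice of that integrator is precisely the hyper-parameter studied below.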

For the numerical solution of Hamiltonian systems, symplectic integrators have a unique and irreplaceable advantage, especially for long-term tracking of the system and conservation of invariants. Pioneering work on symplectic integration is due to Kang Feng [14], and this direction has been extensively studied and has achieved extremely fruitful results [15, 16, 13, 18, 21, 24, 40]. Symplectic integrators address the long-term simulation of dynamical systems and have been successfully applied in diverse fields of science and engineering [12, 26, 30, 43, 44].

Following are the definitions of symplectic map and symplectic integrator.

Definition 1.

A differentiable map g : U \to \mathbb{R}^{2d} (where U \subset \mathbb{R}^{2d} is an open set) is called symplectic if

\left(\frac{\partial g}{\partial y}\right)^{T} J \left(\frac{\partial g}{\partial y}\right) = J,

where \partial g / \partial y is the Jacobian of g.

In 1899, Poincaré proved that the flow of a Hamiltonian system is a symplectic map [18, Chapter VI.2], i.e.,

\left(\frac{\partial \varphi_t}{\partial y_0}\right)^{T} J \left(\frac{\partial \varphi_t}{\partial y_0}\right) = J,

where \varphi_t(y_0) is the flow of (1) starting from y_0 at time t.

Definition 2.

A numerical integrator y_{n+1} = \Phi_h(y_n) is called symplectic if the one-step map y \mapsto \Phi_h(y) is symplectic whenever the integrator is applied to a smooth Hamiltonian system.

Of the integrators considered in this paper, the symplectic Euler method and the implicit midpoint rule are symplectic, while the explicit Euler method and the implicit trapezoidal rule are non-symplectic. For more information about symplectic integrators we refer to [13].
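As a quick numerical illustration (our own check, not part of the paper), the sketch below applies one step of the explicit Euler and symplectic Euler methods to the pendulum Hamiltonian used later in Section 3 and tests the symplecticity condition A^T J A = J for the finite-difference Jacobian A of the one-step map; the step size and evaluation point are arbitrary.

```python
import numpy as np

# Pendulum (used later in the paper): H(p, q) = p^2/2 - cos(q),
# so dp/dt = -dH/dq = -sin(q) and dq/dt = dH/dp = p.
def dHdp(p, q): return p
def dHdq(p, q): return np.sin(q)

def explicit_euler(p, q, h):      # non-symplectic
    return p - h * dHdq(p, q), q + h * dHdp(p, q)

def symplectic_euler(p, q, h):    # symplectic (explicit here since H is separable)
    p_new = p - h * dHdq(p, q)
    return p_new, q + h * dHdp(p_new, q)

def jacobian(step, p, q, h, eps=1e-6):
    """Finite-difference Jacobian of the one-step map (p, q) -> step(p, q, h)."""
    y, A = np.array([p, q]), np.zeros((2, 2))
    for j in range(2):
        yp, ym = y.copy(), y.copy()
        yp[j] += eps
        ym[j] -= eps
        A[:, j] = (np.array(step(*yp, h)) - np.array(step(*ym, h))) / (2 * eps)
    return A

J = np.array([[0.0, 1.0], [-1.0, 0.0]])
for name, step in [("explicit Euler", explicit_euler), ("symplectic Euler", symplectic_euler)]:
    A = jacobian(step, 0.5, 1.0, 0.1)
    print(name, "||A^T J A - J|| =", np.linalg.norm(A.T @ J @ A - J))
```

The symplectic Euler step satisfies the condition to rounding accuracy, while the explicit Euler step violates it by an O(h^2) amount.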

A recent study has verified the importance of symplectic integrators in HNets through numerical experiments [8], but the theoretical understanding still lags behind. The core of this work is to build a backward error analysis for HNets. We introduce the target error and the network target: the target error is the difference between the network target and the true target, where the network target is defined as the map with zero empirical loss on arbitrary training data. In addition, the inverse modified equation is proposed to calculate the network target. It is proved that HNets based on symplectic integrators possess network targets, while non-symplectic integrators cannot guarantee the existence of network targets. We also perform experiments to confirm the theoretical results.

The paper is organized as follows. Section 2 introduces the concepts of the target error and the network target and proposes the inverse modified equations for backward analysis. Section 3 presents numerical results on the target error and on the prediction of the phase flows of Hamiltonian systems by HNets. Conclusions are given in the last section.

2 Network targets and inverse modified equations

2.1 Target error

Neural networks, as universal approximators [3, 9, 19], can approximate essentially any function. We first give the definition of the target error.

Definition 3.

The network target (NT) is the map with zero empirical loss on arbitrary training data. The true target (TT) is the map expected to be approached. The difference between them is called the target error (TE).

To approximate a function f, the loss function is generally defined as

(2)   Loss = \sum_{i=1}^{N} \| net(x_i) - f(x_i) \|^2,

where \{(x_i, f(x_i))\}_{i=1}^{N} is the training dataset and net denotes the neural network. It is clear that in this case the network target and the true target are both f, i.e., the network is an approximation of f and of no other function. However, for some networks with priors the network target is not the same as the true target, and a network target may not even exist.

The multistep neural network (MNN) [34] and the Hamiltonian neural network (HNet) [17] are two examples with non-zero target errors. Consider the ordinary differential equation

(3)   \dot{y} = f(y),

where y \in \mathbb{R}^{d}. The MNN, whose true target is f, proceeds by applying a linear multistep method to (3) and obtains its loss function from the resulting discrete relation. For instance, the loss function of the MNN based on the explicit Euler method is

Loss = \sum_{i} \left\| \frac{\varphi_h(y_i) - y_i}{h} - f_\theta(y_i) \right\|^2,

where \varphi_h is the exact flow of equation (3), \{y_i\} is the training data and f_\theta is the network. The network target \tilde{f} therefore satisfies

\tilde{f}(y) = \frac{\varphi_h(y) - y}{h} = f(y) + \frac{h}{2}(f'f)(y) + O(h^2),

where f is a vector-valued function, f' is its Jacobian and its higher-order derivatives are tensors. Thus the target error of the MNN based on the explicit Euler method can be expressed as

\tilde{f}(y) - f(y) = \frac{h}{2}(f'f)(y) + O(h^2).
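This leading term is easy to check numerically. The sketch below (our own illustration, not from the paper) computes the network target (\varphi_h(y) - y)/h for the pendulum system used in Section 3 and compares the resulting target error with (h/2) f'(y) f(y).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Pendulum vector field f(y) = (dp/dt, dq/dt) = (-sin q, p) with y = (p, q).
def f(y):
    p, q = y
    return np.array([-np.sin(q), p])

def f_jac(y):
    p, q = y
    return np.array([[0.0, -np.cos(q)],
                     [1.0,  0.0]])

def exact_flow(y0, h):
    """High-accuracy reference solution standing in for the exact flow."""
    sol = solve_ivp(lambda t, y: f(y), (0.0, h), y0, rtol=1e-12, atol=1e-12)
    return sol.y[:, -1]

y0, h = np.array([0.5, 1.0]), 0.1
network_target = (exact_flow(y0, h) - y0) / h     # what the Euler-based MNN must learn
target_error = network_target - f(y0)             # deviation from the true target f
leading_term = 0.5 * h * f_jac(y0) @ f(y0)        # (h/2) f'(y) f(y)
print(target_error)
print(leading_term)                               # agrees with target_error up to O(h^2)
```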

Figure 1: Illustration of the expected error. The diagram relates the function obtained by training the neural network, the network whose loss is at a global minimum, the function closest to the target in the hypothesis space, the network target and the true target. The expected error consists of four parts, of which the optimization error, the generalization error and the approximation error are the main objects of classical neural network error analysis; the remaining target error is usually zero and is therefore often ignored. When the network target differs from the true target, sufficient training, a large quantity of data and a large network size can effectively reduce the first three errors, so that the target error becomes the main part of the expected error.

The expected error mainly depends on the optimization error, the generalization error and the approximation error, while the target error is usually zero and is therefore often ignored, as shown in Fig. 1. There have been numerous studies of the optimization, generalization and approximation errors [6, 5, 9, 19, 20, 22, 29], but the study of the target error lags behind. When neural networks are used to learn dynamical systems, abundant data, well-developed optimization techniques and powerful approximation capabilities make the target error a major part of the expected error. This is what we focus on.

Non-symplectic integrators cannot guarantee the existence of network targets for HNets. For instance, if the chosen numerical integrator is the explicit Euler method, the loss function is

Loss = \sum_{i} \left\| \frac{\varphi_h(y_i) - y_i}{h} - J^{-1}\nabla H_\theta(y_i) \right\|^2,

with exact flow \varphi_h and training data \{y_i\}. The network target \tilde{H} is then subject to

\nabla \tilde{H}(y) = J\,\frac{\varphi_h(y) - y}{h}.

However, not every vector-valued function is the gradient of a scalar function, which means the network target may not exist. As shown in Fig. 1, the absence of a network target makes the classical error analysis no longer applicable. This work proves the existence of network targets for HNets based on symplectic integrators.
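The failure can be observed directly: a gradient field must have a symmetric Jacobian, but for the pendulum the field J(\varphi_h(y) - y)/h does not. The sketch below (our own check, not from the paper) measures the asymmetry by finite differences.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Pendulum vector field and a high-accuracy stand-in for its exact flow.
def f(y):
    p, q = y
    return np.array([-np.sin(q), p])

def exact_flow(y0, h):
    return solve_ivp(lambda t, y: f(y), (0.0, h), y0, rtol=1e-12, atol=1e-12).y[:, -1]

J = np.array([[0.0, 1.0], [-1.0, 0.0]])
h = 0.1

def g(y):
    """The field that would have to equal grad H_tilde(y) for the explicit-Euler HNet."""
    return J @ (exact_flow(y, h) - y) / h

def jac(fun, y, eps=1e-6):
    """Finite-difference Jacobian of fun at y."""
    A = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = eps
        A[:, j] = (fun(y + e) - fun(y - e)) / (2 * eps)
    return A

A = jac(g, np.array([0.5, 1.0]))
print("Jacobian asymmetry:", np.linalg.norm(A - A.T))   # O(h), clearly non-zero
```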

2.2 Inverse modified equation

Consider an ordinary differential equation

(4)   \dot{y} = f(y)

and a numerical integrator which produces the numerical approximations

y_{n+1} = \Phi_h(y_n).

The idea of the modified differential equation is to search for an equation of the form

\dot{\tilde{y}} = f(\tilde{y}) + h \hat{f}_1(\tilde{y}) + h^2 \hat{f}_2(\tilde{y}) + \cdots

such that \tilde{y}(nh) = y_n, i.e., its exact flow interpolates the numerical solution. In contrast, we now aim to search for an inverse modified differential equation of the form

(5)   \dot{\tilde{y}} = f_h(\tilde{y}) = f(\tilde{y}) + h f_1(\tilde{y}) + h^2 f_2(\tilde{y}) + \cdots

such that \Phi_h(y(t_0)) = y(t_0 + h) for the integrator applied to (5) and the exact solution y(t) of (4), i.e., the numerical solution of (5) reproduces the exact flow of (4). Consequently, the inverse modified differential equation is precisely the network target of the multistep network.

For the computation of (5), we expand the solution of (4) into a Taylor series with respect to the time step h:

(6)   y(t_0 + h) = y_0 + h f(y_0) + \frac{h^2}{2}(f'f)(y_0) + \frac{h^3}{6}\big(f''(f,f) + f'f'f\big)(y_0) + \cdots.

Moreover, assume that the numerical integrator applied to (5) can be expanded as

(7)   \Phi_h(y_0) = y_0 + h\, d_1(y_0) + h^2 d_2(y_0) + h^3 d_3(y_0) + \cdots,

where the functions d_j are given and are typically composed of f_h and its derivatives. In order to achieve \Phi_h(y_0) = y(t_0 + h), the two expansions must coincide. Now, plugging (5) into (7), we can easily obtain the expressions of the f_j in (5) by comparing like powers of h in (6) and (7).

Example 1.

The implicit midpoint rule

y_1 = y_0 + h\, f_h\!\left(\frac{y_0 + y_1}{2}\right)

could be expanded as

y_1 = y_0 + h\, f_h(y_0) + \frac{h^2}{2}(f_h'f_h)(y_0) + h^3\left(\frac{1}{4} f_h'f_h'f_h + \frac{1}{8} f_h''(f_h, f_h)\right)(y_0) + \cdots.

Comparing like powers of h in the expression (6) and the expansion above yields recurrence relations for the functions f_j, the first of which read

f_1 = 0, \qquad f_2 = -\frac{1}{12} f'f'f + \frac{1}{24} f''(f, f).

We only do formal analysis without taking care of convergence issues in this work.
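The matching of expansions can also be checked symbolically. The following sympy sketch (our own toy example, not part of the paper) takes the scalar ODE \dot{y} = y^2, for which the relations above give f_1 = 0 and f_2(y) = -y^4/4, and verifies that the implicit midpoint rule applied to the truncated field f + h^2 f_2 reproduces the exact flow y_0/(1 - h y_0) up to O(h^4).

```python
import sympy as sp

y0, h = sp.symbols('y0 h')

# Toy scalar ODE y' = f(y) = y^2, whose exact flow is y(h) = y0 / (1 - h*y0).
f = lambda x: x**2

# Truncated inverse modified field for the implicit midpoint rule:
# f_h = f + h^2 * (-1/12 f'^2 f + 1/24 f'' f^2) = y^2 - h^2 y^4 / 4  (f_1 = 0).
fh = lambda x: f(x) - h**2 * x**4 / 4

# One implicit-midpoint step applied to f_h, resolved by fixed-point iteration
# (four iterations are enough to be exact up to O(h^4)).
y1 = y0
for _ in range(4):
    y1 = y0 + h * fh((y0 + y1) / 2)

numerical = sp.series(y1, h, 0, 4).removeO()
exact = sp.series(y0 / (1 - h * y0), h, 0, 4).removeO()
print(sp.simplify(numerical - exact))   # prints 0: the flows agree up to O(h^4)
```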

Theorem 1.

Suppose that the integrator \Phi_h is of order p, more precisely,

\Phi_h(y) = \varphi_h(y) + h^{p+1}\delta_{p+1}(y) + O(h^{p+2}),

where \varphi_h denotes the exact flow of \dot{y} = f(y) and h^{p+1}\delta_{p+1}(y) is the leading term of the local truncation error. Then the inverse modified equation (5) satisfies

f_h(y) = f(y) + h^{p} f_{p}(y) + h^{p+1} f_{p+1}(y) + \cdots,

where f_p(y) = -\delta_{p+1}(y).

Proof.

Since the method is of order p, applying \Phi_h to the field f_h gives \Phi_h(y_0) = \varphi_h^{f_h}(y_0) + h^{p+1}\delta_{p+1}(y_0) + O(h^{p+2}), where \varphi_h^{f_h} denotes the exact flow of (5). Requiring \Phi_h(y_0) = y(t_0 + h), inserting (5) and (6) and comparing the coefficients of h^2, \ldots, h^{p} yields f_1 = \cdots = f_{p-1} = 0. Furthermore, comparing the coefficients of h^{p+1} yields f_p(y) = -\delta_{p+1}(y). ∎

The above theorem shows that a high-order integrator can effectively reduce the target error. The network target of an HNet is the Hamiltonian of the inverse modified equation; however, non-symplectic integrators cannot guarantee that the inverse modified equation is a Hamiltonian system. We now show that the inverse modified equation based on a symplectic integrator is still a Hamiltonian system.
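The order dependence of the target error is easy to observe numerically. In the sketch below (our own illustration) the pendulum is used again: for the explicit Euler method (order 1) the network target is f_h(y) = (\varphi_h(y) - y)/h, while for the implicit midpoint rule (order 2) the network target satisfies f_h((y + \varphi_h(y))/2) = (\varphi_h(y) - y)/h, so the corresponding target errors can be evaluated pointwise and shrink like O(h) and O(h^2), respectively.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Pendulum field and a high-accuracy stand-in for its exact flow.
def f(y):
    p, q = y
    return np.array([-np.sin(q), p])

def exact_flow(y0, h):
    return solve_ivp(lambda t, y: f(y), (0.0, h), y0, rtol=1e-13, atol=1e-13).y[:, -1]

y = np.array([0.5, 1.0])
for h in [0.2, 0.1, 0.05, 0.025]:
    y1 = exact_flow(y, h)
    increment = (y1 - y) / h
    err_euler = np.linalg.norm(increment - f(y))                 # explicit Euler target error (order 1)
    err_midpoint = np.linalg.norm(increment - f((y + y1) / 2))   # implicit midpoint target error (order 2)
    print(f"h={h:6.3f}   Euler: {err_euler:.2e}   midpoint: {err_midpoint:.2e}")
```

Halving h roughly halves the first column and quarters the second, consistent with Theorem 1.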

Theorem 2.

If a symplectic integrator \Phi_h is applied to a Hamiltonian system \dot{y} = J^{-1}\nabla H(y) with a smooth Hamiltonian H, then the inverse modified equation (5) is also a Hamiltonian system. More precisely, there exist smooth functions H_j : \mathbb{R}^{2d} \to \mathbb{R}, j = 1, 2, \ldots, such that f_j(y) = J^{-1}\nabla H_j(y).

Proof.

According to Theorem 1, f_1 = \cdots = f_{p-1} = 0, so H_j = 0 for j < p. Assume that H_j exists for j = 1, \ldots, r-1; we need to prove the existence of H_r satisfying f_r(y) = J^{-1}\nabla H_r(y).

Consider the truncated inverse modified equation

\dot{\tilde{y}} = f(\tilde{y}) + h f_1(\tilde{y}) + \cdots + h^{r-1} f_{r-1}(\tilde{y}),

which has the Hamiltonian H + h H_1 + \cdots + h^{r-1} H_{r-1} by the induction hypothesis. Its numerical flow \Phi_h (the symplectic integrator applied to the truncated system) satisfies

\Phi_h(y_0) = \varphi_h(y_0) - h^{r+1} f_r(y_0) + O(h^{r+2}),

where \varphi_h is the exact flow of the original system. And

\frac{\partial \Phi_h}{\partial y_0}(y_0) = \frac{\partial \varphi_h}{\partial y_0}(y_0) - h^{r+1} f_r'(y_0) + O(h^{r+2}),

where \partial \Phi_h / \partial y_0 and \partial \varphi_h / \partial y_0 are the Jacobians of symplectic maps, and \partial \varphi_h / \partial y_0 = I + O(h). Therefore

J = \left(\frac{\partial \Phi_h}{\partial y_0}\right)^{T} J \left(\frac{\partial \Phi_h}{\partial y_0}\right) = J - h^{r+1}\big(f_r'(y_0)^{T} J + J f_r'(y_0)\big) + O(h^{r+2}).

Consequently, f_r'(y_0)^{T} J + J f_r'(y_0) = 0; in other words, J f_r'(y_0) is symmetric. The existence of H_r satisfying

\nabla H_r(y) = J f_r(y)

then follows from the Integrability Lemma [18, Lemma VI.2.7]. ∎

3 Numerical results

3.1 Target error

In this subsection we check the target error of the HNet. For an HNet based on a symplectic integrator, let H_\theta be the trained network, H the true target and \tilde{H} the network target. Then

H_\theta - H = (H_\theta - \tilde{H}) + (\tilde{H} - H),

where the first term depends on the performance of the trained network and the second term is the target error, which becomes the main factor of the expected error.

The mathematical pendulum (mass m = 1, massless rod of length l = 1, gravitational acceleration g = 1) is a system with Hamiltonian

H(p, q) = \frac{1}{2}p^2 - \cos q,

and its differential equations are

\dot{p} = -\sin q, \qquad \dot{q} = p.

The training data of the HNet are pairs \{(y_i, \varphi_h(y_i))\}, where the y_i are randomly generated from a compact set and \varphi_h is the exact flow with time step h. The test data are generated in the same way. The chosen integrator is the symplectic Euler method

p_{n+1} = p_n - h\,\frac{\partial H}{\partial q}(p_{n+1}, q_n), \qquad q_{n+1} = q_n + h\,\frac{\partial H}{\partial p}(p_{n+1}, q_n),

which is of order 1. We compute the truncations of the inverse modified equation of order 1 and of order 2 and denote their Hamiltonians by \tilde{H}_1 and \tilde{H}_2. The loss function of the HNet is obtained from the symplectic Euler discretization,

Loss = \sum_i \left\| \frac{p_i' - p_i}{h} + \frac{\partial H_\theta}{\partial q}(p_i', q_i) \right\|^2 + \left\| \frac{q_i' - q_i}{h} - \frac{\partial H_\theta}{\partial p}(p_i', q_i) \right\|^2,

where (p_i', q_i') = \varphi_h(p_i, q_i).
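For concreteness, here is a minimal PyTorch sketch of how such a symplectic-Euler training loss could be assembled and minimized. The network size, data ranges, step size, reference-flow generator (a fine Störmer-Verlet integration standing in for the exact flow) and optimizer settings are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class HNet(nn.Module):
    """Scalar network H_theta(p, q)."""
    def __init__(self, width=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, width), nn.Tanh(),
                                 nn.Linear(width, width), nn.Tanh(),
                                 nn.Linear(width, 1))

    def grads(self, p, q):
        """Return (dH/dp, dH/dq) of the network Hamiltonian."""
        p = p.detach().requires_grad_(True)
        q = q.detach().requires_grad_(True)
        H = self.net(torch.cat([p, q], dim=-1)).sum()
        return torch.autograd.grad(H, (p, q), create_graph=True)

def symplectic_euler_loss(model, p0, q0, p1, q1, h):
    # Symplectic Euler: p1 = p0 - h*dH/dq(p1, q0), q1 = q0 + h*dH/dp(p1, q0).
    # Both endpoints come from the data, so the implicit stage (p1, q0) is known
    # and the scheme reduces to an explicit residual.
    dHdp, dHdq = model.grads(p1, q0)
    res_p = (p1 - p0) / h + dHdq
    res_q = (q1 - q0) / h - dHdp
    return (res_p ** 2 + res_q ** 2).mean()

# Synthetic pendulum data: the flow over one step h is approximated by a fine
# Stoermer-Verlet integration of the true system (a stand-in for the exact flow).
h, substeps = 0.1, 100
p0 = torch.empty(200, 1).uniform_(-1.5, 1.5)
q0 = torch.empty(200, 1).uniform_(-1.5, 1.5)
p1, q1 = p0.clone(), q0.clone()
d = h / substeps
for _ in range(substeps):
    p1 = p1 - (d / 2) * torch.sin(q1)
    q1 = q1 + d * p1
    p1 = p1 - (d / 2) * torch.sin(q1)

model = HNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = symplectic_euler_loss(model, p0, q0, p1, q1, h)
    loss.backward()
    opt.step()
```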

Table 1: The training loss and test loss of the HNet and of three different Hamiltonians (the original Hamiltonian H and the truncated inverse modified Hamiltonians \tilde{H}_1 and \tilde{H}_2).
Figure 2: Pendulum. (A) Three flows: the original pendulum system, the learned HNet and the order-1 modified system. The HNet correctly captures the flow of the modified system rather than that of the original pendulum system. (B) Conservation of the original Hamiltonian of the pendulum compared with the two corresponding truncated Hamiltonians of the inverse modified equation (\tilde{H}_1 and \tilde{H}_2). The HNet nearly conserves the Hamiltonian of the modified system rather than that of the original pendulum system.

Let H_\theta be the trained HNet. The training and test losses of H_\theta, the original Hamiltonian H and the truncated inverse modified Hamiltonians \tilde{H}_1 and \tilde{H}_2 are given in Table 1. The loss of the original Hamiltonian H is much larger than the others, and the loss of the modified Hamiltonian decreases markedly as the truncation order increases. Fig. 2 presents the phase flows starting from the same initial point for the original system, the HNet and the order-1 modified system, and also shows the conservation of the three Hamiltonians. These results show that the network target is indeed the calculated Hamiltonian of the inverse modified equation rather than the original Hamiltonian.

3.2 Symplectic HNets

We call an HNet based on a symplectic (non-symplectic) integrator a symplectic (non-symplectic) HNet. In this subsection we confirm that symplectic HNets have better generalization ability and higher accuracy than non-symplectic HNets in prediction tasks. In the experiments we use a series of phase points with time step h as the training data, i.e., points y_0, y_1, y_2, \ldots subject to y_{n+1} = \varphi_h(y_n). Here the symplectic HNets use the implicit midpoint rule [18, Chapter II.1]

y_{n+1} = y_n + h\, f\!\left(\frac{y_n + y_{n+1}}{2}\right),

while the non-symplectic HNets use the implicit trapezoidal rule [18, Chapter II.1]

y_{n+1} = y_n + \frac{h}{2}\big(f(y_n) + f(y_{n+1})\big).

Note that both of them are of order 2.
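At prediction time the trained HNet is rolled out by repeatedly applying the chosen one-step map to the learned vector field. The sketch below (our own illustration of the workflow, not the authors' code) resolves the implicit midpoint step by simple fixed-point iteration; for self-containedness it uses the true pendulum field, but a trained HNet's map y \mapsto J^{-1}\nabla H_\theta(y) would be plugged in the same way.

```python
import torch

def midpoint_step(vector_field, y, h, iters=10):
    """One implicit-midpoint step y_new = y + h*f((y + y_new)/2),
    resolved by fixed-point iteration (adequate for small h)."""
    y_next = y.clone()
    for _ in range(iters):
        y_next = y + h * vector_field((y + y_next) / 2)
    return y_next

def rollout(vector_field, y0, h, steps):
    """Predict a trajectory by repeatedly applying the one-step map."""
    traj = [y0]
    for _ in range(steps):
        traj.append(midpoint_step(vector_field, traj[-1], h))
    return torch.stack(traj)

# Demonstration with the true pendulum field f(p, q) = (-sin q, p);
# replace `pendulum` with the learned field of a trained HNet in practice.
pendulum = lambda y: torch.stack([-torch.sin(y[1]), y[0]])
trajectory = rollout(pendulum, torch.tensor([0.0, 1.0]), h=0.1, steps=100)
```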

3.2.1 Pendulum

Figure 3: Pendulum. Comparison between the predicted flows of the symplectic HNet and the non-symplectic HNet. (A) The flow obtained by the symplectic HNet, which successfully discovers the unknown trajectory. (B) The flow obtained by the non-symplectic HNet, which deviates from the true trajectory.
Figure 4: Pendulum. (A, B) Positions obtained by the symplectic HNet and the non-symplectic HNet. (C, D) Global error and conservation of the Hamiltonian for the HNets. The symplectic HNet gives comparatively accurate results.

For the pendulum, we use a flow of 40 points starting from a fixed initial point with time step h as the training data, i.e., successive points satisfy y_{n+1} = \varphi_h(y_n). As shown in Figs. 3 and 4, the symplectic HNet reproduces the phase flow more accurately, with a lower global error and better conservation of the Hamiltonian.

3.2.2 Kepler problem

Figure 5: Kepler problem. Comparison between the predicted flows of the symplectic HNet and the non-symplectic HNet. (A) The flow obtained by the symplectic HNet, which successfully discovers the unknown trajectory. (B) The flow obtained by the non-symplectic HNet, which deviates from the true trajectory over time.
Figure 6: Kepler problem. (A, B) Positions obtained by the symplectic HNet and the non-symplectic HNet. Both HNets reproduce the phase portrait, while the symplectic HNet is more accurate over time. (C, D) Global error and conservation of the Hamiltonian for the HNets. The symplectic HNet gives comparatively accurate results.

Now we consider a four-dimensional system, the Kepler problem (masses m_1 = m_2 = 1, gravitational constant G = 1), which has the Hamiltonian

H(p_1, p_2, q_1, q_2) = \frac{1}{2}(p_1^2 + p_2^2) - \frac{1}{\sqrt{q_1^2 + q_2^2}}.

We use a flow of 55 points starting from a fixed initial point with time step h as the training data, i.e., successive points satisfy y_{n+1} = \varphi_h(y_n). As shown in Figs. 5 and 6, the symplectic HNet reproduces the phase flow and captures the dynamics more accurately, with a lower global error and better conservation of the Hamiltonian.

4 Conclusion

This work explains the influence of different integrators, regarded as hyper-parameters, on HNets through error analysis. The target error is introduced to describe the gap between the network target and the true target, and the inverse modified equation is proposed to calculate the network target. The target error depends on the accuracy order of the integrator. Theoretical analysis shows that HNets based on symplectic integrators possess network targets, while non-symplectic integrators cannot guarantee the existence of network targets. Numerical results confirm the theoretical analysis: HNets based on symplectic integrators learn the network targets rather than the Hamiltonian of the original system, and in prediction tasks symplectic HNets have better generalization ability and higher accuracy.

References

  • [1] V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt (2007) Mathematical aspects of classical and celestial mechanics. Vol. 3, Springer Science & Business Media. Cited by: §1.
  • [2] V. I. Arnold (2013) Mathematical methods of classical mechanics. Vol. 60, Springer Science & Business Media. Cited by: §1.
  • [3] A. R. Barron (1993) Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory 39 (3), pp. 930–945. Cited by: §2.1.
  • [4] T. Bertalan, F. Dietrich, I. Mezić, and I. G. Kevrekidis (2019) On learning hamiltonian systems from data. Chaos: An Interdisciplinary Journal of Nonlinear Science 29 (12), pp. 121107. Cited by: §1.
  • [5] L. Bottou and O. Bousquet (2008) The tradeoffs of large scale learning. In Advances in neural information processing systems, pp. 161–168. Cited by: §2.1.
  • [6] L. Bottou (2010) Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, pp. 177–186. Cited by: §2.1.
  • [7] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018) Neural ordinary differential equations. In Advances in neural information processing systems, pp. 6571–6583. Cited by: §1.
  • [8] Z. Chen, J. Zhang, M. Arjovsky, and L. Bottou (2019) Symplectic recurrent neural networks. arXiv preprint arXiv:1909.13334. Cited by: §1, §1.
  • [9] G. Cybenko (1989) Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems 2 (4), pp. 303–314. Cited by: §2.1, §2.1.
  • [10] W. E and B. Yu (2018) The deep ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 6 (1), pp. 1–12. Cited by: §1.
  • [11] W. E (2017) A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics 5 (1), pp. 1–11. Cited by: §1.
  • [12] E. Faou, V. Gradinaru, and C. Lubich (2009) Computing semiclassical quantum dynamics with hagedorn wavepackets. SIAM Journal on Scientific Computing 31 (4), pp. 3027–3041. Cited by: §1.
  • [13] K. Feng and M. Qin (2010) Symplectic geometric algorithms for hamiltonian systems. Springer. Cited by: §1, §1.
  • [14] K. Feng (1984) On difference schemes and symplectic geometry. In Proceedings of the 5th international symposium on differential geometry and differential equations, Cited by: §1.
  • [15] K. Feng (1986) Difference schemes for hamiltonian formalism and symplectic geometry. Journal of Computational Mathematics 4 (3), pp. 279–289. Cited by: §1.
  • [16] K. Feng (1995) Collected works of Feng Kang: II. National Defense Industry Press. Cited by: §1.
  • [17] S. Greydanus, M. Dzamba, and J. Yosinski (2019) Hamiltonian neural networks. In Advances in Neural Information Processing Systems, pp. 15353–15363. Cited by: §1, §2.1.
  • [18] E. Hairer, C. Lubich, and G. Wanner (2006) Geometric numerical integration: structure-preserving algorithms for ordinary differential equations. Vol. 31, Springer Science & Business Media. Cited by: §1, §1, §2.2, §3.2.
  • [19] K. Hornik, M. Stinchcombe, H. White, et al. (1989) Multilayer feedforward networks are universal approximators.. Neural networks 2 (5), pp. 359–366. Cited by: §2.1, §2.1.
  • [20] P. Jin, L. Lu, Y. Tang, and G. E. Karniadakis (2019) Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. arXiv preprint arXiv:1905.11427. Cited by: §2.1.
  • [21] O. Koch and C. Lubich (2007) Dynamical low-rank approximation. SIAM Journal on Matrix Analysis and Applications 29 (2), pp. 434–454. Cited by: §1.
  • [22] J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht (2016) Gradient descent converges to minimizers. arXiv preprint arXiv:1602.04915. Cited by: §2.1.
  • [23] S. Li, C. Dong, L. Zhang, and L. Wang (2019) Neural canonical transformation with symplectic flows. arXiv preprint arXiv:1910.00024. Cited by: §1.
  • [24] C. Lubich (2008) From quantum to classical molecular dynamics: reduced models and numerical analysis. European Mathematical Society. Cited by: §1.
  • [25] M. Lutter, C. Ritter, and J. Peters (2019) Deep lagrangian networks: using physics as model prior for deep learning. arXiv preprint arXiv:1907.04490. Cited by: §1.
  • [26] I. Omelyan, I. Mryglod, and R. Folk (2003) Symplectic analytically integrable decomposition algorithms: classification, derivation, and application to molecular dynamics, quantum and celestial mechanics simulations. Computer Physics Communications 151 (3), pp. 272–314. Cited by: §1.
  • [27] G. Pang, L. Lu, and G. E. Karniadakis (2019) Fpinns: fractional physics-informed neural networks. SIAM Journal on Scientific Computing 41 (4), pp. A2603–A2626. Cited by: §1.
  • [28] R. Pascanu, T. Mikolov, and Y. Bengio (2013) On the difficulty of training recurrent neural networks. In International conference on machine learning, pp. 1310–1318. Cited by: §1.
  • [29] T. Poggio and Q. Liao (2017) Theory ii: landscape of the empirical risk in deep learning. Ph.D. Thesis, Center for Brains, Minds and Machines (CBMM), arXiv. Cited by: §2.1.
  • [30] H. Qin, J. Liu, J. Xiao, R. Zhang, Y. He, Y. Wang, Y. Sun, J. W. Burby, L. Ellison, and Y. Zhou (2015) Canonical symplectic particle-in-cell method for long-term large-scale simulations of the vlasov–maxwell equations. Nuclear Fusion 56 (1), pp. 014001. Cited by: §1.
  • [31] M. Raissi and G. E. Karniadakis (2018) Hidden physics models: machine learning of nonlinear partial differential equations. Journal of Computational Physics 357, pp. 125–141. Cited by: §1.
  • [32] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.
  • [33] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2017) Inferring solutions of differential equations using noisy multi-fidelity data. Journal of Computational Physics 335, pp. 736–746. Cited by: §1.
  • [34] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2018) Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv preprint arXiv:1801.01236. Cited by: §1, §2.1.
  • [35] L. E. Reichl (1999) A modern course in statistical physics. American Association of Physics Teachers. Cited by: §1.
  • [36] D. J. Rezende, S. Racanière, I. Higgins, and P. Toth (2019) Equivariant hamiltonian flows. arXiv preprint arXiv:1909.13739. Cited by: §1.
  • [37] L. Ruthotto and E. Haber (2019) Deep neural networks motivated by partial differential equations. Journal of Mathematical Imaging and Vision, pp. 1–13. Cited by: §1.
  • [38] J. J. Sakurai and E. D. Commins (1995) Modern quantum mechanics, revised edition. American Association of Physics Teachers. Cited by: §1.
  • [39] A. Sanchez-Gonzalez, V. Bapst, K. Cranmer, and P. Battaglia (2019) Hamiltonian graph networks with ode integrators. arXiv preprint arXiv:1909.12790. Cited by: §1.
  • [40] J. Sanz-Serna and M. Calvo (2018) Numerical hamiltonian problems. Courier Dover Publications. Cited by: §1.
  • [41] P. Toth, D. J. Rezende, A. Jaegle, S. Racanière, A. Botev, and I. Higgins (2019) Hamiltonian generative networks. arXiv preprint arXiv:1909.13789. Cited by: §1.
  • [42] D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis (2019) Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. Journal of Computational Physics 397, pp. 108850. Cited by: §1.
  • [43] R. Zhang, J. Liu, Y. Tang, H. Qin, J. Xiao, and B. Zhu (2014) Canonicalization and symplectic simulation of the gyrocenter dynamics in time-independent magnetic fields. Physics of Plasmas 21 (3), pp. 032504. Cited by: §1.
  • [44] B. Zhu, R. Zhang, Y. Tang, X. Tu, and Y. Zhao (2016) Splitting k-symplectic methods for non-canonical separable hamiltonian problems. Journal of Computational Physics 322, pp. 387–399. Cited by: §1.