Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control

09/26/2019 ∙ by Yaofeng Desmond Zhong, et al. ∙ Siemens AG ∙ Princeton University

In this paper, we introduce Symplectic ODE-Net (SymODEN), a deep learning framework which can infer the dynamics of a physical system from observed state trajectories. To achieve better generalization with fewer training samples, SymODEN incorporates appropriate inductive bias by designing the associated computation graph in a physics-informed manner. In particular, we enforce Hamiltonian dynamics with control to learn the underlying dynamics in a transparent way which can then be leveraged to draw insight about relevant physical aspects of the system, such as mass and potential energy. In addition, we propose a parametrization which can enforce this Hamiltonian formalism even when the generalized coordinate data is embedded in a high-dimensional space or we can only access velocity data instead of generalized momentum. This framework, by offering interpretable, physically-consistent models for physical systems, opens up new possibilities for synthesizing model-based control strategies.


1 Introduction

In recent years, deep neural networks (Goodfellow et al., 2016) have become very accurate and widely used in many application domains, such as image recognition (He et al., 2016), language comprehension (Devlin et al., 2019), and sequential decision making (Silver et al., 2017). To learn underlying patterns from data and enable generalization beyond the training set, the learning approach incorporates appropriate inductive bias (Haussler, 1988; Baxter, 2000) by promoting representations which are simple in some sense. This bias typically manifests itself via a set of assumptions, which in turn guide a learning algorithm to pick one hypothesis over another. The success in predicting an outcome for previously unseen data then depends on how well the inductive bias captures the ground reality. Inductive bias can be introduced as the prior in a Bayesian model, or via the choice of computation graphs in a neural network.

In a variety of settings, especially in physical systems, wherein laws of physics are primarily responsible for shaping the outcome, generalization in neural networks can be improved by leveraging underlying physics for designing the computation graphs. Here, by leveraging a generalization of the Hamiltonian dynamics, we develop a learning framework which captures the underlying physics in the associated computation graph. Our results show that incorporation of such physics-based inductive bias can provide knowledge about relevant physical properties (mass, potential energy) and laws (conservation of energy) of the system. These insights, in turn, enable more accurate prediction of future behavior and improvements in out-of-sample behavior. Furthermore, learning a physically-consistent model of the underlying dynamics can subsequently enable usage of model-based controllers which can provide performance guarantees for complex, nonlinear systems. In particular, insight about kinetic and potential energy of a physical system can be leveraged to design appropriate control strategies, such as the method of controlled Lagrangian (Bloch et al., 2001) and interconnection & damping assignment (Ortega et al., 2002) , which can reshape the closed-loop energy landscape to achieve a broad range of control objectives (regulation, tracking, etc.).

Related Work

Physics-based Priors for Learning in Dynamical Systems:

The last few years have witnessed a significant interest in incorporating physics-based priors into deep learning frameworks. Such approaches, in contrast to more rigid parametric system identification techniques (Söderström and Stoica, 1988), use neural networks to approximate the state-transition dynamics and therefore are more expressive. Sanchez-Gonzalez et al. (2018), by representing the causal relationships in a physical system as a directed graph, use a recurrent graph network to infer latent space dynamics of robotic systems. Lutter et al. (2019) and Gupta et al. (2019) leverage Lagrangian mechanics to learn dynamics of kinematic structures from time-series data of position, velocity and acceleration. A more recent (concurrent) work by Greydanus et al. (2019) uses Hamiltonian mechanics to learn dynamics of autonomous, energy-conserved mechanical systems from time-series data of position, momentum and their derivatives. A key difference between these approaches and the proposed one is that our framework does not require any information about higher order derivatives (e.g. acceleration) and can incorporate external control into the Hamiltonian formalism.

Neural Networks for Dynamics and Control

Inferring underlying dynamics from time-series data plays a critical role in controlling the closed-loop response of dynamical systems, such as robotic manipulators (Lillicrap et al., 2015) and building HVAC systems (Wei et al., 2017). Although the use of neural networks for identification and control of dynamical systems dates back more than three decades (Narendra and Parthasarathy, 1990), recent advances in deep neural networks have led to renewed interest in this domain. Watter et al. (2015) learn dynamics with control from high-dimensional observations (raw image sequences) using a variational approach and design an iterative LQR controller to control physical systems by imposing a locally linear constraint. Karl et al. (2016) and Krishnan et al. (2017) adopt a variational approach and use recurrent architectures to learn state-space models from noisy observations. SE3-Nets (Byravan and Fox, 2017) learn transformations of rigid bodies from point cloud data. Ayed et al. (2019) use partial information about the system state to learn a nonlinear state-space model. However, this body of work, while attempting to learn state-space models, does not take physics-based priors into consideration.

Contribution

The main contribution of this work is two-fold. First, we introduce a learning framework called Symplectic ODE-Net (SymODEN) which encodes a generalization of the Hamiltonian dynamics. This generalization, by adding an external control term to the standard Hamiltonian dynamics, allows us to learn system dynamics which conform to Hamiltonian dynamics with control. With the learnt structured dynamics, we are able to design controllers that drive the system to track a reference point. Moreover, by encoding the structure, we can achieve better predictions with smaller network sizes. Second, we take one step forward in combining the physics-based prior and the data-driven approach. Previous approaches (Lutter et al., 2019; Greydanus et al., 2019) require data in the form of generalized coordinates and their derivatives up to the second order. However, a large number of physical systems accommodate generalized coordinates which are non-Euclidean (e.g. angles), and such angle data is often obtained in the embedded form, i.e., $(\cos q, \sin q)$ instead of the coordinate ($q$) itself. The underlying reason is that an angular coordinate lies on $\mathbb{S}^1$ instead of $\mathbb{R}^1$. In contrast to previous approaches which do not address this aspect, SymODEN has been designed to work with angle data in the embedded form. Additionally, we leverage differentiable ODE solvers to avoid the need for estimating second-order derivatives of generalized coordinates.

2 Preliminary Concepts

2.1 Hamiltonian Dynamics

Lagrangian dynamics and Hamiltonian dynamics are both reformulations of Newtonian dynamics, and they provide new insights into the laws of mechanics. In these formulations, the configuration of a system is described by generalized coordinates $\mathbf{q} = (q_1, q_2, \ldots, q_n)$. With time, the configuration point of the system moves in the configuration space, tracing out a trajectory. Lagrangian dynamics describe the evolution of this trajectory, i.e. the equations of motion, in the configuration space. Hamiltonian dynamics, however, track the change of system states in the phase space, which consists of generalized coordinates $\mathbf{q}$ and generalized momenta $\mathbf{p} = (p_1, p_2, \ldots, p_n)$. In other words, Hamiltonian dynamics treats $\mathbf{q}$ and $\mathbf{p}$ on an equal footing, and this leads to not only symmetric equations of motion but a whole new approach to classical mechanics as well (Goldstein et al., 2002). Beyond classical mechanics, Hamiltonian dynamics is also widely used in statistical and quantum mechanics.

In Hamiltonian dynamics, the time-evolution of a system is described by the Hamiltonian $H(\mathbf{q}, \mathbf{p})$, a scalar function of generalized coordinates and momenta. Moreover, in almost all physical systems, the Hamiltonian is the same as the total energy and hence can be expressed as

$$H(\mathbf{q}, \mathbf{p}) = \frac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}(\mathbf{q})\,\mathbf{p} + V(\mathbf{q}) \tag{1}$$

where the mass matrix $\mathbf{M}(\mathbf{q})$ is positive definite and $V(\mathbf{q})$ represents the potential energy of the system. Correspondingly, the time-evolution of the system is governed by

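$$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}}, \qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{q}} \tag{2}$$

where we have dropped the explicit dependence on $\mathbf{q}$ and $\mathbf{p}$ for brevity of notation. Moreover, since

$$\dot{H} = \left(\frac{\partial H}{\partial \mathbf{q}}\right)^T \dot{\mathbf{q}} + \left(\frac{\partial H}{\partial \mathbf{p}}\right)^T \dot{\mathbf{p}} = 0 \tag{3}$$

the total energy is conserved along a trajectory of the system. The RHS of Equation (2) is called the symplectic gradient (Rowe et al., 1980) of $H$. Equation (3) shows that moving along the symplectic gradient keeps the Hamiltonian constant.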
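The conservation argument can be checked numerically for a single pendulum. The sketch below is only an illustration: the constants `M_INV` and `V_SCALE` are assumed values, and the derivatives are written by hand rather than obtained from a learned model. It evaluates the rate of change of the Hamiltonian along the symplectic gradient at a few random phase-space points.

```python
import math
import random

# Illustrative pendulum constants (assumptions, not values stated in this section).
M_INV = 3.0      # inverse of the (scalar) mass matrix
V_SCALE = 5.0    # potential energy V(q) = V_SCALE * (1 - cos q)


def hamiltonian(q, p):
    """H(q, p) = 1/2 * p * M^{-1} * p + V(q) for one degree of freedom."""
    return 0.5 * M_INV * p * p + V_SCALE * (1.0 - math.cos(q))


def grad_hamiltonian(q, p):
    """Analytic partial derivatives (dH/dq, dH/dp)."""
    return V_SCALE * math.sin(q), M_INV * p


def symplectic_gradient(q, p):
    """(dq/dt, dp/dt) = (dH/dp, -dH/dq)."""
    dH_dq, dH_dp = grad_hamiltonian(q, p)
    return dH_dp, -dH_dq


def energy_rate(q, p):
    """dH/dt along the flow: dH/dq * dq/dt + dH/dp * dp/dt."""
    dH_dq, dH_dp = grad_hamiltonian(q, p)
    q_dot, p_dot = symplectic_gradient(q, p)
    return dH_dq * q_dot + dH_dp * p_dot


random.seed(0)
samples = [(random.uniform(-math.pi, math.pi), random.uniform(-2.0, 2.0)) for _ in range(5)]
max_rate = max(abs(energy_rate(q, p)) for q, p in samples)
print(max_rate)
```

The two terms of `energy_rate` cancel by construction, which is the content of Equation (3); any discretized trajectory will still show some energy error that depends on the integrator.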

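In this work, we consider a generalization of the Hamiltonian dynamics which provides a means to incorporate external control ($\mathbf{u}$), such as force and torque. As external control is usually affine and influences the change of generalized momenta, we consider the following dynamics

$$\begin{bmatrix} \dot{\mathbf{q}} \\ \dot{\mathbf{p}} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial H}{\partial \mathbf{p}} \\ -\dfrac{\partial H}{\partial \mathbf{q}} \end{bmatrix} + \begin{bmatrix} \mathbf{0} \\ \mathbf{g}(\mathbf{q}) \end{bmatrix} \mathbf{u} \tag{4}$$

When $\mathbf{u} = \mathbf{0}$, the generalized dynamics reduce to the classical Hamiltonian dynamics (2) and the total energy is conserved; however, when $\mathbf{u} \neq \mathbf{0}$, the system has a dissipation-free energy exchange with the environment.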

2.2 Control via Energy Shaping

Once the dynamics of a system have been learned, the learnt model can be used to synthesize a controller that maneuvers the system to a reference configuration $\mathbf{q}^\star$. As the proposed approach offers insight about the energy associated with a system, it is a natural choice to exploit this information for designing controllers via energy shaping (Ortega et al., 2001). As energy is a fundamental aspect of physical systems, reshaping the associated energy landscape enables us to specify a broad range of control objectives and design nonlinear controllers with provable performance guarantees.

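If $\operatorname{rank}(\mathbf{g}(\mathbf{q})) = n$, the number of generalized coordinates, the system is fully-actuated and we have control over any dimension of “acceleration” in $\dot{\mathbf{p}}$. For this class of systems, a controller $\mathbf{u}(\mathbf{q}, \mathbf{p}) = \boldsymbol{\beta}(\mathbf{q}) + \mathbf{v}(\mathbf{p})$ can be designed via potential energy shaping $\boldsymbol{\beta}(\mathbf{q})$ and damping injection $\mathbf{v}(\mathbf{p})$. We restate the procedure from Ortega et al. (2001) using our notation for completeness. As the name suggests, the goal of potential energy shaping is to design $\boldsymbol{\beta}(\mathbf{q})$ such that the closed-loop system behaves as if its time-evolution is governed by a desired Hamiltonian $H_d$, i.e.

$$\begin{bmatrix} \dot{\mathbf{q}} \\ \dot{\mathbf{p}} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial H}{\partial \mathbf{p}} \\ -\dfrac{\partial H}{\partial \mathbf{q}} \end{bmatrix} + \begin{bmatrix} \mathbf{0} \\ \mathbf{g}(\mathbf{q}) \end{bmatrix} \boldsymbol{\beta}(\mathbf{q}) = \begin{bmatrix} \dfrac{\partial H_d}{\partial \mathbf{p}} \\ -\dfrac{\partial H_d}{\partial \mathbf{q}} \end{bmatrix} \tag{5}$$

where the desired Hamiltonian differs from the original Hamiltonian only in the potential energy

$$H_d(\mathbf{q}, \mathbf{p}) = \frac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}(\mathbf{q})\,\mathbf{p} + V_d(\mathbf{q}) \tag{6}$$

In other words, $\boldsymbol{\beta}(\mathbf{q})$ shapes the potential energy such that the desired Hamiltonian $H_d$ has a minimum at $(\mathbf{q}^\star, \mathbf{0})$. Then, by substituting (1) and (6) into (5), we get

$$\boldsymbol{\beta}(\mathbf{q}) = \mathbf{g}^T(\mathbf{g}\mathbf{g}^T)^{-1}\left(\frac{\partial V}{\partial \mathbf{q}} - \frac{\partial V_d}{\partial \mathbf{q}}\right) \tag{7}$$

With potential energy shaping, we ensure that the system has the lowest energy at the desired reference point, and in general the system would oscillate around this point. To ensure that the trajectory actually converges to this point, we add some damping (if we have access to $\dot{\mathbf{q}}$ instead of $\mathbf{p}$, we use $\dot{\mathbf{q}}$ instead in Equation (8))

$$\mathbf{v}(\mathbf{p}) = -\mathbf{g}^T(\mathbf{g}\mathbf{g}^T)^{-1}\,\mathbf{K}_d\,\mathbf{p} \tag{8}$$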

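Remark. If the desired potential energy is chosen to be a quadratic of the form

$$V_d(\mathbf{q}) = \frac{1}{2}(\mathbf{q} - \mathbf{q}^\star)^T \mathbf{K}_p (\mathbf{q} - \mathbf{q}^\star) \tag{9}$$

the external forcing term can be expressed as

$$\mathbf{u} = \mathbf{g}^T(\mathbf{g}\mathbf{g}^T)^{-1}\left(\frac{\partial V}{\partial \mathbf{q}} - \mathbf{K}_p(\mathbf{q} - \mathbf{q}^\star) - \mathbf{K}_d\,\mathbf{p}\right) \tag{10}$$

This is the familiar PD controller with an additional energy compensation term.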
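To make the remark concrete, the following sketch applies a PD law with energy compensation to a fully actuated single pendulum with a scalar control coefficient. All numerical values (the pendulum model, the gains `K_P` and `K_D`, the step size) are illustrative assumptions, and the potential gradient is written analytically rather than taken from a learned model.

```python
import math

# Illustrative single-pendulum model (assumed values, not from the text above).
M_INV = 3.0          # inverse mass
G_COEF = 1.0         # control coefficient g(q), constant here
Q_STAR = math.pi     # reference configuration q*
K_P, K_D = 2.0, 1.0  # proportional and damping gains (hypothetical)


def dV_dq(q):
    """Gradient of the assumed potential V(q) = 5 * (1 - cos q)."""
    return 5.0 * math.sin(q)


def controller(q, p):
    """Scalar PD law with energy compensation: u = (dV/dq - K_p (q - q*) - K_d p) / g."""
    return (dV_dq(q) - K_P * (q - Q_STAR) - K_D * p) / G_COEF


def closed_loop(q, p):
    """Hamiltonian dynamics with control, with u supplied by the controller."""
    u = controller(q, p)
    return M_INV * p, -dV_dq(q) + G_COEF * u


def rk4_step(q, p, dt):
    """One classical Runge-Kutta step of the closed-loop dynamics."""
    k1 = closed_loop(q, p)
    k2 = closed_loop(q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
    k3 = closed_loop(q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
    k4 = closed_loop(q + dt * k3[0], p + dt * k3[1])
    q_next = q + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    p_next = p + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q_next, p_next


q, p = 0.0, 0.0          # start hanging down, at rest
for _ in range(3000):    # 30 s of simulated time with dt = 0.01
    q, p = rk4_step(q, p, 0.01)
print(q, p)
```

With these choices the compensation term cancels the original potential gradient, so the closed loop is a damped linear oscillator around `Q_STAR`; different gains change only the convergence rate.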

However, for under-actuated systems, potential energy shaping alone is not sufficient to maneuver the system to a desired configuration. Kinetic energy shaping (Chang et al., 2002) is also needed to design the controller.

3 Symplectic ODE-Net

In this section, we introduce the network architecture of Symplectic ODE-Net. In Subsection 3.1, we show how to learn an ordinary differential equation with a constant control term. In Subsection 3.2, we assume we have access to generalized coordinate and momentum data and derive the network architecture. In Subsection 3.3, we take one step further to propose a data-driven approach to deal with data of embedded angle coordinates. In Subsection 3.4, we put together the line of reasoning introduced in the previous two subsections to propose SymODEN for learning dynamics on the hybrid space $\mathbb{R}^n \times \mathbb{T}^m$.

3.1 Neural ODE with Constant Forcing

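Now we focus on the problem of learning an ordinary differential equation (ODE) from time series data. Consider an ODE: $\dot{\mathbf{x}} = f(\mathbf{x})$. Assume we don’t know the analytical expression of the right hand side (RHS) and we approximate it with a neural network. If we have time series data $\mathbf{X} = (\mathbf{x}_{t_0}, \mathbf{x}_{t_1}, \ldots, \mathbf{x}_{t_n})$, how could we learn $f$ from the data?

Chen et al. (2018) introduced Neural ODE, differentiable ODE solvers with O(1)-memory backpropagation. With Neural ODE, we make predictions by approximating the RHS function using a neural network $f_\theta$ and putting it into an ODE solver

$$\hat{\mathbf{x}}_{t_1}, \hat{\mathbf{x}}_{t_2}, \ldots, \hat{\mathbf{x}}_{t_n} = \mathrm{ODESolve}(\mathbf{x}_{t_0}, f_\theta, t_1, t_2, \ldots, t_n)$$

We can then construct the loss function $L = \|\mathbf{X} - \hat{\mathbf{X}}\|_2^2$ and update the weights $\theta$ by backpropagating through the ODE solver.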

In theory, we can learn $f_\theta$ in this way. In practice, however, the neural net is hard to train if $n$ is large. If we have a bad initial estimate of $f_\theta$, the prediction error would in general be large. Although $|\mathbf{x}_{t_1} - \hat{\mathbf{x}}_{t_1}|$ might be small, $\hat{\mathbf{x}}_{t_n}$ would be far from $\mathbf{x}_{t_n}$ as error accumulates, which makes the neural network hard to train. In fact, the prediction error of $\hat{\mathbf{x}}_{t_n}$ is not as important as that of $\hat{\mathbf{x}}_{t_1}$. In other words, we should weight data points in a short time horizon more than the rest of the data points. In order to address this and better utilize the data, we introduce the time horizon $\tau$ as a hyperparameter and predict $\mathbf{x}_{t_{i+1}}, \mathbf{x}_{t_{i+2}}, \ldots, \mathbf{x}_{t_{i+\tau}}$ from initial condition $\mathbf{x}_{t_i}$, where $i = 0, \ldots, n - \tau$.
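The short-horizon idea can be sketched as a data-preparation step. The helper names `make_windows` and `window_loss` are hypothetical, the trajectory is toy data, and the sketch assumes that a window starts at every index for which a full horizon of later samples exists.

```python
def make_windows(trajectory, tau):
    """Split one trajectory [x_t0, ..., x_tn] into (initial state, next tau states) pairs."""
    n = len(trajectory) - 1
    windows = []
    for i in range(n - tau + 1):
        initial = trajectory[i]
        targets = trajectory[i + 1 : i + tau + 1]
        windows.append((initial, targets))
    return windows


def window_loss(predicted, targets):
    """Mean squared error between predicted and observed states over one window."""
    total, count = 0.0, 0
    for x_hat, x in zip(predicted, targets):
        for a, b in zip(x_hat, x):
            total += (a - b) ** 2
            count += 1
    return total / count


# Toy trajectory with 6 two-dimensional states (hypothetical data).
trajectory = [(float(k), float(-k)) for k in range(6)]
windows = make_windows(trajectory, tau=3)
print(len(windows))
```

Each window would then be passed to the ODE solver with its own initial state, so that no prediction has to be rolled out for more than the chosen horizon during training.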

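One challenge of leveraging Neural ODE to learn state-space models is how to learn the dynamics with the control term. Equation (4) has the form $\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u})$ with $\mathbf{x} = (\mathbf{q}, \mathbf{p})$. A function like this cannot be put into Neural ODE directly. In general, if our data consists of trajectories of $(\mathbf{x}, \mathbf{u})_{t_0, \ldots, t_n}$ where $\mathbf{u}$ remains the same within a trajectory, we can approximate the augmented dynamics

$$\begin{bmatrix} \dot{\mathbf{x}} \\ \dot{\mathbf{u}} \end{bmatrix} = \begin{bmatrix} f_\theta(\mathbf{x}, \mathbf{u}) \\ \mathbf{0} \end{bmatrix} = \tilde{f}_\theta(\mathbf{x}, \mathbf{u}) \tag{11}$$

Here, the input and output of $\tilde{f}_\theta$ have the same dimension, so it can be put into Neural ODE. The problem is then how to design the network architecture of $\tilde{f}_\theta$, or equivalently $f_\theta$, such that we can learn the dynamics in an efficient way.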
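A minimal sketch of the augmentation trick follows. The forced-pendulum function `f` is a hand-written stand-in for a learned network, and its constants are assumptions; the point is only that appending a zero derivative for the control yields an autonomous system that a generic fixed-step solver can integrate.

```python
import math


def f(x, u):
    """Hypothetical controlled dynamics dx/dt = f(x, u): a forced pendulum with x = (q, p)."""
    q, p = x
    return (3.0 * p, -5.0 * math.sin(q) + u)


def f_augmented(z):
    """Augmented dynamics: z = (x, u), with du/dt = 0 appended."""
    x, u = z[:-1], z[-1]
    return f(x, u) + (0.0,)


def rk4_step(func, z, dt):
    """One classical Runge-Kutta step for an autonomous system dz/dt = func(z)."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = func(z)
    k2 = func(shifted(z, k1, 0.5 * dt))
    k3 = func(shifted(z, k2, 0.5 * dt))
    k4 = func(shifted(z, k3, dt))
    return tuple(
        zi + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    )


z = (0.1, 0.0, 2.0)      # (q, p, u) with constant control u = 2
for _ in range(20):
    z = rk4_step(f_augmented, z, 0.05)
print(z)
```

Because the last component has zero derivative, the control value is carried through the solver unchanged while still influencing the evolution of the other components.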

3.2 Learning from Generalized Coordinate and Momentum

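Suppose we have trajectory data consisting of $(\mathbf{q}, \mathbf{p}, \mathbf{u})_{t_0, \ldots, t_n}$, where $\mathbf{u}$ remains constant within a trajectory. If we have the prior knowledge that the unforced dynamics of $\mathbf{q}$ and $\mathbf{p}$ are governed by Hamiltonian dynamics, we can use three neural nets, $\mathbf{M}^{-1}_{\theta_1}(\mathbf{q})$, $V_{\theta_2}(\mathbf{q})$ and $\mathbf{g}_{\theta_3}(\mathbf{q})$, as function approximators to represent the inverse of the mass matrix, the potential energy and the control coefficient. Thus,

$$f_\theta(\mathbf{q}, \mathbf{p}, \mathbf{u}) = \begin{bmatrix} \dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{p}} \\ -\dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{q}} \end{bmatrix} + \begin{bmatrix} \mathbf{0} \\ \mathbf{g}_{\theta_3}(\mathbf{q}) \end{bmatrix} \mathbf{u} \tag{12}$$

where

$$H_{\theta_1, \theta_2}(\mathbf{q}, \mathbf{p}) = \frac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}_{\theta_1}(\mathbf{q})\,\mathbf{p} + V_{\theta_2}(\mathbf{q}) \tag{13}$$

The partial derivatives in the expression can be taken care of by automatic differentiation. By putting the designed $f_\theta(\mathbf{q}, \mathbf{p}, \mathbf{u})$ into Neural ODE, we obtain a systematic way of adding the prior knowledge of Hamiltonian dynamics into end-to-end learning.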
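The structure described above can be sketched for one degree of freedom as follows. This is not the authors' implementation: plain callables stand in for the three neural networks, and central finite differences stand in for automatic differentiation so that the example needs no deep learning library. The function name `make_dynamics` and all constants are assumptions.

```python
import math


def make_dynamics(m_inv, potential, g, eps=1e-5):
    """Build f(q, p, u) from scalar callables M^{-1}(q), V(q) and g(q)."""

    def hamiltonian(q, p):
        return 0.5 * p * m_inv(q) * p + potential(q)

    def f(q, p, u):
        # Central differences stand in for automatic differentiation here.
        dH_dq = (hamiltonian(q + eps, p) - hamiltonian(q - eps, p)) / (2 * eps)
        dH_dp = (hamiltonian(q, p + eps) - hamiltonian(q, p - eps)) / (2 * eps)
        return dH_dp, -dH_dq + g(q) * u

    return f


# Hand-written stand-ins for the three networks (hypothetical pendulum).
f = make_dynamics(
    m_inv=lambda q: 3.0,
    potential=lambda q: 5.0 * (1.0 - math.cos(q)),
    g=lambda q: 1.0,
)
q_dot, p_dot = f(0.5, 0.2, 1.0)
print(q_dot, p_dot)
```

In a learned model the three callables would be trainable networks, and the two derivatives would come from the framework's autodiff, which also lets gradients flow back to the network weights.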

3.3 Learning from Embedded Angle Data

In the previous subsection, we assume $(\mathbf{q}, \mathbf{p}) \in \mathbb{R}^{2n}$. In many physical system models, the state variables involve angles which reside in the interval $[-\pi, \pi)$. In other words, each angle resides on the manifold $\mathbb{S}^1$. From a data-driven perspective, the data that respects the geometry is a 2-dimensional embedding $(\cos q, \sin q)$. Furthermore, the generalized momentum data is usually not available. Instead, the velocity is often available. For example, in the OpenAI Gym (Brockman et al., 2016) Pendulum-v0 task, the observation is $(\cos q, \sin q, \dot{q})$.

From a theoretical perspective, however, the angle itself is often used instead of the 2D embedding. The reason is that both the Lagrangian and Hamiltonian formulations are derived using generalized coordinates. Using a set of independent generalized coordinates makes the equations of motion easier to solve.

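In this subsection, we take the data-driven standpoint. We assume all the generalized coordinates are angles and the data comes in the form of $(\mathbf{x}_1(\mathbf{q}), \mathbf{x}_2(\mathbf{q}), \mathbf{x}_3(\dot{\mathbf{q}}), \mathbf{u})_{t_0, \ldots, t_n} = (\cos \mathbf{q}, \sin \mathbf{q}, \dot{\mathbf{q}}, \mathbf{u})_{t_0, \ldots, t_n}$. We aim to put our theoretical prior, Hamiltonian dynamics, into the data-driven approach. The goal is to learn the dynamics of $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{x}_3$. Noticing $\mathbf{p} = \mathbf{M}(\mathbf{x}_1, \mathbf{x}_2)\,\dot{\mathbf{q}}$, we can write down the derivatives of $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{x}_3$,

$$\begin{aligned} \dot{\mathbf{x}}_1 &= -\sin \mathbf{q} \circ \dot{\mathbf{q}} = -\mathbf{x}_2 \circ \dot{\mathbf{q}} \\ \dot{\mathbf{x}}_2 &= \cos \mathbf{q} \circ \dot{\mathbf{q}} = \mathbf{x}_1 \circ \dot{\mathbf{q}} \\ \dot{\mathbf{x}}_3 &= \frac{d}{dt}\left(\mathbf{M}^{-1}(\mathbf{x}_1, \mathbf{x}_2)\,\mathbf{p}\right) = \frac{d}{dt}\left(\mathbf{M}^{-1}(\mathbf{x}_1, \mathbf{x}_2)\right)\mathbf{p} + \mathbf{M}^{-1}(\mathbf{x}_1, \mathbf{x}_2)\,\dot{\mathbf{p}} \end{aligned} \tag{14}$$

where “$\circ$” represents the elementwise product (Hadamard product). We assume $\mathbf{q}$ and $\mathbf{p}$ evolve with the generalized Hamiltonian dynamics of Equation (4). Here the Hamiltonian $H(\mathbf{x}_1, \mathbf{x}_2, \mathbf{p})$ is a function of $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{p}$ instead of $\mathbf{q}$ and $\mathbf{p}$.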

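$$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}} \tag{15}$$

$$\dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{q}} + \mathbf{g}(\mathbf{x}_1, \mathbf{x}_2)\,\mathbf{u} = \mathbf{x}_2 \circ \frac{\partial H}{\partial \mathbf{x}_1} - \mathbf{x}_1 \circ \frac{\partial H}{\partial \mathbf{x}_2} + \mathbf{g}(\mathbf{x}_1, \mathbf{x}_2)\,\mathbf{u} \tag{16}$$

Then the right hand side of Equation (14) can be expressed as a function of the state variables and control $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{u})$. Thus, it can be put into the Neural ODE. We use three neural nets, $\mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2)$, $V_{\theta_2}(\mathbf{x}_1, \mathbf{x}_2)$ and $\mathbf{g}_{\theta_3}(\mathbf{x}_1, \mathbf{x}_2)$, as function approximators. Substituting Equations (15) and (16) into (14), the RHS serves as $f_\theta(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{u})$. (In Equation (17), the derivative of $\mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2)$ can be expanded using the chain rule and expressed as a function of the states.)

$$f_\theta(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{u}) = \begin{bmatrix} -\mathbf{x}_2 \circ \dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{p}} \\ \mathbf{x}_1 \circ \dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{p}} \\ \dfrac{d}{dt}\left(\mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2)\right)\mathbf{p} + \mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2)\left(\mathbf{x}_2 \circ \dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{x}_1} - \mathbf{x}_1 \circ \dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{x}_2} + \mathbf{g}_{\theta_3}(\mathbf{x}_1, \mathbf{x}_2)\,\mathbf{u}\right) \end{bmatrix} \tag{17}$$

where

$$H_{\theta_1, \theta_2}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{p}) = \frac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2)\,\mathbf{p} + V_{\theta_2}(\mathbf{x}_1, \mathbf{x}_2) \tag{18}$$

$$\mathbf{p} = \mathbf{M}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2)\,\mathbf{x}_3 \tag{19}$$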
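The embedded-angle construction can be sketched for a single angle as below. The callables passed as `m_inv`, `potential` and `g` are hand-written stand-ins for the three networks, the constants are assumptions, and finite differences again replace automatic differentiation. The momentum is recovered from the velocity, the Hamiltonian is differentiated with respect to the embedded coordinates, and the result is mapped back to derivatives of the observed quantities.

```python
import math


def embedded_dynamics(x1, x2, x3, u, m_inv, potential, g, eps=1e-6):
    """Time derivative of (x1, x2, x3) = (cos q, sin q, dq/dt) for one angle."""

    def partial(func, i):
        # Central-difference partial derivative w.r.t. argument i (0 for x1, 1 for x2).
        hi = [x1, x2]
        lo = [x1, x2]
        hi[i] += eps
        lo[i] -= eps
        return (func(*hi) - func(*lo)) / (2 * eps)

    minv = m_inv(x1, x2)
    p = x3 / minv                                    # p = M * x3
    q_dot = minv * p                                 # dH/dp
    dH_dx1 = 0.5 * p * p * partial(m_inv, 0) + partial(potential, 0)
    dH_dx2 = 0.5 * p * p * partial(m_inv, 1) + partial(potential, 1)
    p_dot = x2 * dH_dx1 - x1 * dH_dx2 + g(x1, x2) * u
    x1_dot = -x2 * q_dot
    x2_dot = x1 * q_dot
    # Chain rule for d/dt of M^{-1}(x1, x2).
    minv_dot = partial(m_inv, 0) * x1_dot + partial(m_inv, 1) * x2_dot
    x3_dot = minv_dot * p + minv * p_dot
    return x1_dot, x2_dot, x3_dot


q, q_dot, u = 0.7, 0.4, 1.0
derivs = embedded_dynamics(
    math.cos(q), math.sin(q), q_dot, u,
    m_inv=lambda x1, x2: 3.0,
    potential=lambda x1, x2: 5.0 * (1.0 - x1),
    g=lambda x1, x2: 1.0,
)
print(derivs)
```

Because the derivatives of the first two components are orthogonal to the point itself, the embedded pair stays on the unit circle in continuous time, which is the geometric property the embedding is meant to respect.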

3.4 Learning on Hybrid Spaces

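In Subsection 3.2, we treated the generalized coordinates as translational coordinates. In Subsection 3.3, we developed a method to better deal with embedded angle data. In most physical systems, these two types of coordinates coexist. For example, robotic systems are usually modelled as interconnected rigid bodies. The positions of joints or centers of mass are translational coordinates, and the orientations of each rigid body are angular coordinates. In other words, the generalized coordinates lie on $\mathbb{R}^n \times \mathbb{T}^m$, where $\mathbb{T}^m$ denotes the $m$-torus, with $\mathbb{T}^1 = \mathbb{S}^1$ and $\mathbb{T}^2 = \mathbb{S}^1 \times \mathbb{S}^1$. In this subsection, we put together the architectures of the previous two subsections. We assume the generalized coordinates are $\mathbf{q} = (\mathbf{r}, \boldsymbol{\phi}) \in \mathbb{R}^n \times \mathbb{T}^m$ and the data comes in the form of $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4, \mathbf{x}_5, \mathbf{u})_{t_0, \ldots, t_n} = (\mathbf{r}, \cos \boldsymbol{\phi}, \sin \boldsymbol{\phi}, \dot{\mathbf{r}}, \dot{\boldsymbol{\phi}}, \mathbf{u})_{t_0, \ldots, t_n}$. With a similar line of reasoning, we use three neural nets, $\mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$, $V_{\theta_2}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ and $\mathbf{g}_{\theta_3}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$, as function approximators. We have

$$\mathbf{p} = \mathbf{M}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \begin{bmatrix} \mathbf{x}_4 \\ \mathbf{x}_5 \end{bmatrix} \tag{20}$$

$$H_{\theta_1, \theta_2}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{p}) = \frac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)\,\mathbf{p} + V_{\theta_2}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \tag{21}$$

With Hamiltonian dynamics, we have

$$\dot{\mathbf{q}} = \begin{bmatrix} \dot{\mathbf{r}} \\ \dot{\boldsymbol{\phi}} \end{bmatrix} = \frac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{p}} \tag{22}$$

$$\dot{\mathbf{p}} = \begin{bmatrix} -\dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{x}_1} \\ \mathbf{x}_3 \circ \dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{x}_2} - \mathbf{x}_2 \circ \dfrac{\partial H_{\theta_1, \theta_2}}{\partial \mathbf{x}_3} \end{bmatrix} + \mathbf{g}_{\theta_3}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)\,\mathbf{u} \tag{23}$$

Then

$$\begin{bmatrix} \dot{\mathbf{x}}_1 \\ \dot{\mathbf{x}}_2 \\ \dot{\mathbf{x}}_3 \\ \begin{bmatrix} \dot{\mathbf{x}}_4 \\ \dot{\mathbf{x}}_5 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \dot{\mathbf{r}} \\ -\mathbf{x}_3 \circ \dot{\boldsymbol{\phi}} \\ \mathbf{x}_2 \circ \dot{\boldsymbol{\phi}} \\ \dfrac{d}{dt}\left(\mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)\right)\mathbf{p} + \mathbf{M}^{-1}_{\theta_1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)\,\dot{\mathbf{p}} \end{bmatrix} = f_\theta(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4, \mathbf{x}_5, \mathbf{u}) \tag{24}$$

where $\dot{\mathbf{r}}$ and $\dot{\boldsymbol{\phi}}$ come from Equation (22). Now we obtain an $f_\theta$ which can be put into Neural ODE. Figure 1 shows the flow of the computation graph based on Equations (20)-(24).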

Figure 1: The computation graph of SymODEN. Blue arrows indicate neural network parametrization. Red arrows indicate automatic differentiation. For a given $(\mathbf{x}, \mathbf{u})$, the computation graph outputs an $f_\theta(\mathbf{x}, \mathbf{u})$ which follows Hamiltonian dynamics with control. The function $f_\theta$ itself is an input to the Neural ODE, which generates estimates of the states at each time step. Since all the operations are differentiable, the weights of the neural networks can be updated by backpropagation.

3.5 Positive Definiteness of Mass matrix

In real physical systems, the mass matrix $\mathbf{M}$ is positive definite, which ensures a positive kinetic energy with a non-zero velocity. The positive definiteness of $\mathbf{M}$ implies the positive definiteness of $\mathbf{M}^{-1}_{\theta_1}$. Thus, we impose this constraint in the network architecture by $\mathbf{M}^{-1}_{\theta_1} = \mathbf{L}_{\theta_1} \mathbf{L}_{\theta_1}^T$, where $\mathbf{L}_{\theta_1}$ is a lower-triangular matrix. The positive definiteness is ensured if the diagonal elements of $\mathbf{M}^{-1}_{\theta_1}$ are positive. In practice, this can be done by adding a small constant $\epsilon$ to the diagonal elements of $\mathbf{M}^{-1}_{\theta_1}$. This not only makes $\mathbf{M}_{\theta_1}$ invertible, but also stabilizes the training.
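The parametrization can be illustrated for a 2x2 case. In this sketch, three unconstrained numbers (which would be network outputs in practice) fill a lower-triangular factor, and a small constant is added to the diagonal of the product. The function names and the value of `eps` are assumptions chosen for illustration.

```python
def inverse_mass_matrix(params, eps=0.1):
    """Map 3 unconstrained numbers to a 2x2 matrix L L^T + eps * I with L lower-triangular."""
    a, b, c = params
    lower = [[a, 0.0], [b, c]]
    # (L L^T)[i][j] = sum_k L[i][k] * L[j][k]
    product = [
        [sum(lower[i][k] * lower[j][k] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]
    product[0][0] += eps
    product[1][1] += eps
    return product


def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return m[0][0] > 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0


# Even an all-zero raw output yields a valid (positive definite) matrix.
print(inverse_mass_matrix([0.0, 0.0, 0.0]))
print(is_positive_definite_2x2(inverse_mass_matrix([0.5, -1.2, 0.0])))
```

Without the diagonal offset, a factor with a zero on its diagonal would give a singular product; the offset keeps the matrix invertible for any raw network output.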

4 Experiments

4.1 Experimental Setup

We evaluate our model on four tasks: Task 1: a pendulum with generalized coordinate and momentum data (learning on $\mathbb{R}^1$); Task 2: a pendulum with embedded angle data (learning on $\mathbb{S}^1$); Task 3: a cart-pole system (learning on $\mathbb{R}^1 \times \mathbb{S}^1$); and Task 4: an acrobot (learning on $\mathbb{T}^2$).

Model Variants. Besides the Symplectic ODE-Net model derived above, we consider a variant that approximates the Hamiltonian using a fully connected neural net $H_{\theta_1, \theta_2}$. We call it Unstructured Symplectic ODE-Net (Unstructured SymODEN) since here we are not exploiting the structure of the Hamiltonian.

Baseline Models. In order to show that we can learn the dynamics better with fewer parameters by leveraging prior knowledge, we set up baseline models for all four experiments. For the pendulum with generalized coordinate and momentum data, the naive baseline model approximates Equation (12), $f_\theta(\mathbf{x}, \mathbf{u})$, by a fully connected neural net. For all the other experiments, which involve embedded angle data, we set up two different baseline models. The naive baseline approximates $f_\theta(\mathbf{x}, \mathbf{u})$ by a fully connected neural net. It doesn’t respect the fact that the coordinate pair, $\cos \boldsymbol{\phi}$ and $\sin \boldsymbol{\phi}$, lies on $\mathbb{T}^m$. Thus, we set up the geometric baseline model, which approximates $\dot{\mathbf{q}}$ and $\dot{\mathbf{p}}$ with a fully connected neural net. This ensures that the angle data evolves on $\mathbb{T}^m$. For more information on model details, please refer to Appendix A.

Data Generation. For all tasks, we randomly generate initial conditions of states and combine them with 5 values of constant control $\mathbf{u}$ to construct the initial conditions of simulation. The initial conditions are then put into simulators and integrated for 20 time steps to generate trajectory data. These trajectory data serve as the training set. The simulators differ across tasks. For Task 1, we integrate the true generalized Hamiltonian dynamics with a time interval of 0.05 seconds to generate trajectories of $(\mathbf{q}, \mathbf{p}, \mathbf{u})$. All the other tasks deal with embedded angle data and velocity directly, so we use OpenAI Gym (Brockman et al., 2016) simulators to generate trajectory data. One caveat of using OpenAI Gym is that not all environments use the Runge-Kutta method (RK4) for simulation. Gym favors other numerical schemes over RK4 because of speed, but it is harder to learn the dynamics with inaccurate data. For example, if we plot the total energy as a function of time from data generated by the Pendulum-v0 environment with zero action, we see that the total energy oscillates around a constant by a significant amount, even though the total energy should be conserved. Thus, for Task 2 and Task 3, we use Pendulum-v0 and CartPole-v1 and replace the numerical integrator of the environments with RK4. For Task 4, we use the Acrobot-v1 environment, which already uses RK4. We also change the action space of Pendulum-v0, CartPole-v1 and Acrobot-v1 to a continuous space with a large enough bound.
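The effect of the integrator on energy conservation can be illustrated with a small experiment. The sketch below compares a semi-implicit Euler update with RK4 on an unforced pendulum; the pendulum constants, step size and number of steps are assumptions, and the Euler variant is used only as a generic example of a cheaper scheme, not as a claim about the integrator inside any particular Gym environment.

```python
import math


def energy(q, p):
    """Total energy of an assumed unforced pendulum: H = 1.5 p^2 + 5 (1 - cos q)."""
    return 1.5 * p * p + 5.0 * (1.0 - math.cos(q))


def rhs(q, p):
    """Unforced Hamiltonian dynamics (dq/dt, dp/dt) for the pendulum above."""
    return 3.0 * p, -5.0 * math.sin(q)


def semi_implicit_euler_step(q, p, dt):
    """Update the momentum first, then the position with the new momentum."""
    p_next = p + dt * rhs(q, p)[1]
    q_next = q + dt * rhs(q, p_next)[0]
    return q_next, p_next


def rk4_step(q, p, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(q, p)
    k2 = rhs(q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
    k3 = rhs(q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
    k4 = rhs(q + dt * k3[0], p + dt * k3[1])
    q_next = q + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    p_next = p + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q_next, p_next


def max_energy_deviation(step, q0, p0, dt, n_steps):
    """Largest absolute deviation of the energy from its initial value."""
    e0 = energy(q0, p0)
    q, p, worst = q0, p0, 0.0
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst


euler_dev = max_energy_deviation(semi_implicit_euler_step, 1.0, 0.0, 0.05, 200)
rk4_dev = max_energy_deviation(rk4_step, 1.0, 0.0, 0.05, 200)
print(euler_dev, rk4_dev)
```

A model trained on data from the lower-order scheme would have to fit the integrator's energy oscillation in addition to the physics, which is one way to read the motivation for switching the data generator to RK4.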

Model training. In all the tasks, we train our model using the Adam optimizer (Kingma and Ba, 2014) for 1000 epochs. We set a time horizon $\tau$ (Subsection 3.1) and choose “RK4” as the numerical integration scheme in Neural ODE. We vary the size of the training set by doubling from 16 state initial conditions to 1024 state initial conditions. Each state initial condition is combined with five constant controls to construct initial conditions for simulation. Each trajectory is generated by putting the initial condition into the simulator and integrating 20 time steps forward. We set the size of mini-batches to be the number of state initial conditions. We log the training error per trajectory and the prediction error per trajectory in each case for all the tasks. The training loss per trajectory is the mean squared error (MSE) between the estimate and the ground truth over 20 time steps. To evaluate the performance of each model in terms of long-time prediction, we construct the metric of prediction error per trajectory by using the same state initial conditions as in the training set with a constant control of $\mathbf{u} = \mathbf{0}$, integrating 40 time steps forward, and calculating the MSE over 40 time steps. The reason for using only the unforced trajectories is that a constant nonzero control might cause the velocity to keep increasing or decreasing over time, and large absolute values of velocity are of little interest in designing controllers.

4.2 Results

Figure 2: Training error per trajectory and prediction error per trajectory for all 4 tasks with different numbers of training trajectories. The horizontal axis shows the number of state initial conditions (16, 32, 64, 128, 256, 512, 1024) in the training set. Both the horizontal axis and the vertical axis are in log scale.
Figure 3: Mean squared error and total energy of test trajectories. SymODEN works the best in terms of both MSE and total energy. SymODEN predicts trajectories that match the ground truth since it has learnt the Hamiltonian and discovered the conservation law from data. The ground truth of energy in all four tasks stays constant.

Figure 2 shows the variation in training error and prediction error with changes in the number of state initial conditions in the training set. We can see that SymODEN yields better generalization in all the tasks. In Task 3, although the Geometric Baseline Model beats the other ones in terms of training error, SymODEN generates more accurate predictions, indicating overfitting in the Geometric Baseline Model. By incorporating the physics-based prior of Hamiltonian dynamics, SymODEN learns dynamics that obey physical law and thus performs better in prediction. In most cases, SymODEN trained with less training data beats other models with more training data in terms of training error and prediction error, indicating that better generalization can be achieved with fewer training samples.

Figure 3 shows how the MSE and the total energy evolve along a trajectory with a previously unseen initial condition. For all the tasks, the MSE of the baseline models diverges faster than that of SymODEN. The Unstructured SymODEN works well in Task 1, Task 2 and Task 4 but not so well in Task 3. As for the total energy, in the two pendulum tasks, SymODEN and Unstructured SymODEN conserve total energy by oscillating around a constant value. In these models, the Hamiltonian itself is learnt and the predictions of the future states stay around a level set of the Hamiltonian. Baseline models, however, fail to find the conservation law, and their estimates of future states drift away from the initial Hamiltonian level set.

4.3 Task 1: Pendulum with Generalized Coordinate and Momentum Data

In this task, the dynamics has the following form

(25)

with Hamiltonian . In other words , and .

Figure 4: Sample trajectories and learnt functions of Task 1.

In Figure 4, the ground truth is an unforced trajectory, which conserves energy. The predicted trajectory of the baseline model does not conserve energy, while both SymODEN and its unstructured variant predict energy-conserving trajectories. For SymODEN, the learnt $g_{\theta_3}(q)$ and $M^{-1}_{\theta_1}(q)$ match the ground truth well. $V_{\theta_2}(q)$ differs from the ground truth by a constant. This is acceptable since the potential energy is a relative notion. Only the derivative of $V_{\theta_2}(q)$ plays a role in the dynamics.

In this task, we are treating as a variable in and our training set contains initial condition of . The learnt functions do not extrapolate well outside this range, as we can see from the left part in the figures of and . We address this issue by working directly with embedded angle data, which lead to the next subsection.

4.4 Task 2: Pendulum with Embedded Data

Figure 5: Without true generalized momentum data, the learnt functions match the ground truth with a scaling. Here

The dynamics of this task are the same as Equation (25) but the training data are generated by the OpenAI Gym simulator. Here we do not have access to the true generalized momentum data, and the learnt function matches the ground truth with a scaling , as shown in Figure 5. To explain the scaling, let us look at the following dynamics

(26)

with Hamiltonian . If we only look at the dynamics of , we have , which is independent of . If we don’t have access to the generalized momentum , our trained neural network may converge to a Hamiltonian with a which is different from the true value, , in this task. By a scaling , the learnt functions match the ground truth. Even we are not learning the true , we can still perform prediction and control since we are learning the dynamics of correctly. We let , then the desired Hamiltonian has minimum energy when the pendulum rests at the upward position. For the damping injection, we let . Then from Equation (7) and (8), the controller we design is

Figure 6: Time-evolution of the state variables when the closed-loop control input is governed by Equation (27).
(27)

Of all the models we consider, only SymODEN provides the learnt potential energy, which is required to construct the controller. Figure 6 shows how the states evolve when the controller is fed into the OpenAI Gym simulator. We can successfully drive the pendulum to the inverted position using the controller based on the learnt model, even though the absolute maximum control $u$, 7.5, is more than three times larger than the absolute maximum $u$ in the training set, which is 2.0. This shows that SymODEN extrapolates well.

4.5 Task 3: CartPole System

The CartPole system is an underactuated system, and designing a controller to balance the pole from an arbitrary initial condition requires trajectory optimization or kinetic energy shaping. Here we follow the setup in the OpenAI Gym CartPole-v1 environment: “In CartPole-v1, the pendulum starts upright and the goal is to prevent it from falling over. The episode ends when the pole is more than 15 degrees from vertical or the cart moves more than 2.4 units from the center” [6]. Since the initial condition is close to the goal, after learning the dynamics, we are able to design a PD controller based on the learnt dynamics and feed the controller back to the OpenAI Gym simulator.

(28)

Figure 7 shows the results of control with and . In 8 seconds, the pole remains within 15 degrees from vertical and cart remains within 0.3 units from the center.

Figure 7: Time-evolution of the state variables when the closed-loop control input is governed by Equation (28).

5 Conclusion

Here we have introduced Symplectic ODE-Net, which provides a systematic way to incorporate the prior knowledge of Hamiltonian dynamics with control into a deep learning framework. We show that SymODEN achieves better extrapolation with fewer training samples by learning an interpretable, physically-consistent state-space model. SymODEN can work with embedded angle data or when we only have access to velocity instead of generalized momentum. In future work, a broader class of physics-based priors, such as port-Hamiltonian systems, can be introduced to model a larger class of physical systems. Future work could also explore other types of embedding, such as embedded 3D orientations.

References

  • I. Ayed, E. de Bézenac, A. Pajot, J. Brajard, and P. Gallinari (2019) Learning dynamical systems from partial observations. arXiv:1902.11136. Cited by: §1.
  • J. Baxter (2000) A model of inductive bias learning. Journal of Artificial Intelligence Research 12, pp. 149–198. Cited by: §1.
  • A. M. Bloch, N. E. Leonard, and J. E. Marsden (2001) Controlled lagrangians and the stabilization of euler–poincaré mechanical systems. International Journal of Robust and Nonlinear Control 11 (3), pp. 191–214. Cited by: §1.
  • G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba (2016) OpenAI Gym. arXiv:1606.01540. Cited by: §3.3, §4.1.
  • A. Byravan and D. Fox (2017) Se3-nets: learning rigid body motion using deep neural networks. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 173–180. Cited by: §1.
  • [6] CartPole-v1. Note: https://gym.openai.com/envs/CartPole-v1/ Accessed: 2019-09-24. Cited by: §4.5.
  • D. E. Chang, A. M. Bloch, N. E. Leonard, J. E. Marsden, and C. A. Woolsey (2002) The equivalence of controlled lagrangian and controlled hamiltonian systems. ESAIM: Control, Optimisation and Calculus of Variations 8, pp. 393–422. Cited by: §2.2.
  • T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018) Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31, pp. 6571–6583. Cited by: §3.1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Cited by: §1.
  • H. Goldstein, C. Poole, and J. Safko (2002) Classical mechanics. AAPT. Cited by: §2.1.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. Vol. 1, MIT Press. Cited by: §1.
  • S. Greydanus, M. Dzamba, and J. Yosinski (2019) Hamiltonian Neural Networks. arXiv:1906.01563. Cited by: §1, §1.
  • J. K. Gupta, K. Menda, Z. Manchester, and M. J. Kochenderfer (2019) A general framework for structured learning of mechanical systems. arXiv:1902.08705. Cited by: §1.
  • D. Haussler (1988) Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artificial Intelligence 36 (2), pp. 177–221. Cited by: §1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. Cited by: §1.
  • M. Karl, M. Soelch, J. Bayer, and P. van der Smagt (2016) Deep variational Bayes filters: unsupervised learning of state space models from raw data. arXiv:1605.06432. Cited by: §1.
  • D. P. Kingma and J. Ba (2014) Adam: A Method for Stochastic Optimization. arXiv:1412.6980. Cited by: §4.1.
  • R. G. Krishnan, U. Shalit, and D. Sontag (2017) Structured inference networks for nonlinear state space models. In Thirty-First AAAI Conference on Artificial Intelligence, Cited by: §1.
  • T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971. Cited by: §1.
  • M. Lutter, C. Ritter, and J. Peters (2019) Deep Lagrangian networks: using physics as model prior for deep learning. In 7th International Conference on Learning Representations (ICLR), Cited by: §1, §1.
  • K. S. Narendra and K. Parthasarathy (1990) Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks 1 (1), pp. 4–27. Cited by: §1.
  • R. Ortega, A. J. Van Der Schaft, I. Mareels, and B. Maschke (2001) Putting energy back in control. IEEE Control Systems Magazine 21 (2), pp. 18–33. Cited by: §2.2, §2.2.
  • R. Ortega, A. Van Der Schaft, B. Maschke, and G. Escobar (2002) Interconnection and damping assignment passivity-based control of port-controlled Hamiltonian systems. Automatica 38 (4), pp. 585–596. Cited by: §1.
  • D. Rowe, A. Ryman, and G. Rosensteel (1980) Many-body quantum mechanics as a symplectic dynamical system. Physical Review A 22 (6), pp. 2362. Cited by: §2.1.
  • A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and P. Battaglia (2018) Graph networks as learnable physics engines for inference and control. In International Conference on Machine Learning (ICML), pp. 4467–4476. Cited by: §1.
  • D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al. (2017) Mastering the game of Go without human knowledge. Nature 550 (7676), pp. 354. Cited by: §1.
  • T. Söderström and P. Stoica (1988) System identification. Prentice-Hall, Inc.. Cited by: §1.
  • M. Watter, J. Springenberg, J. Boedecker, and M. Riedmiller (2015) Embed to control: a locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems 29, pp. 2746–2754. Cited by: §1.
  • T. Wei, Y. Wang, and Q. Zhu (2017) Deep Reinforcement Learning for Building HVAC Control. In Proceedings of the 54th Annual Design Automation Conference (DAC), pp. 22:1–22:6. Cited by: §1.

Appendix A Experiment Implementation Details

The architectures used for our experiments are shown below. For all the tasks, SymODEN has the lowest total number of parameters. To ensure that the learnt function is smooth, we use the Tanh activation function instead of ReLU. Because the computation graph involves differentiation, non-smooth activation functions would lead to discontinuities in the derivatives. This, in turn, would result in an ODE with a discontinuous right-hand side, which is not desirable. All the architectures shown below are fully-connected neural networks. The first number indicates the dimension of the input layer and the last number indicates the dimension of the output layer. The dimensions of the hidden layers are shown in the middle, together with their activation functions.
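The smoothness argument above can be checked numerically. The following is a minimal sketch, not part of the original implementation: it compares one-sided finite-difference slopes of Tanh and ReLU at the origin. The helper name `one_sided_slopes` and the step size `h` are illustrative assumptions.

```python
import math


def relu(x):
    """Rectified linear unit, which has a kink at x = 0."""
    return max(0.0, x)


def one_sided_slopes(f, x, h=1e-6):
    """Return (left, right) finite-difference slopes of f at x.

    Hypothetical helper for illustration: if the two slopes disagree,
    the derivative of f jumps at x.
    """
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


# Tanh: both one-sided slopes are close to 1, so the derivative is continuous.
tanh_left, tanh_right = one_sided_slopes(math.tanh, 0.0)

# ReLU: the slope is 0 on the left and 1 on the right, so the derivative jumps.
relu_left, relu_right = one_sided_slopes(relu, 0.0)

print("tanh slopes at 0:", tanh_left, tanh_right)
print("relu slopes at 0:", relu_left, relu_right)
```

When a network output is differentiated inside the computation graph, a jump like the one ReLU shows here is passed on to the right-hand side of the ODE, which is the situation the Tanh choice avoids.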

Task 1: Pendulum

  • Input: 2 state dimensions, 1 action dimension

  • Baseline Model (0.36M parameters): 2 - 600Tanh - 600Tanh - 2Linear

  • Unstructured SymODEN (0.20M parameters):

    • $H_{\theta_1,\theta_2}$: 2 - 400Tanh - 400Tanh - 1Linear

    • $g_{\theta_3}$: 1 - 200Tanh - 200Tanh - 1Linear

  • SymODEN (0.13M parameters):

    • $M^{-1}_{\theta_1}$: 1 - 300Tanh - 300Tanh - 1Linear

    • $V_{\theta_2}$: 1 - 50Tanh - 50Tanh - 1Linear

    • $g_{\theta_3}$: 1 - 200Tanh - 200Tanh - 1Linear
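The Task 1 parameter counts can be reproduced from the layer widths alone. The sketch below assumes that every fully-connected layer carries a weight matrix and a bias vector and that there are no other trainable parameters. The helper names `parse_architecture` and `mlp_param_count` are hypothetical and not taken from the original code.

```python
import re


def parse_architecture(spec):
    """Turn '2 - 600Tanh - 600Tanh - 2Linear' into the widths [2, 600, 600, 2]."""
    return [int(re.match(r"\d+", token.strip()).group()) for token in spec.split(" - ")]


def mlp_param_count(widths):
    """Count weights plus biases of a fully-connected network (bias assumed per layer)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


# Task 1 (Pendulum) architectures, copied from the list above.
pendulum_models = {
    "Baseline Model": ["2 - 600Tanh - 600Tanh - 2Linear"],
    "Unstructured SymODEN": [
        "2 - 400Tanh - 400Tanh - 1Linear",
        "1 - 200Tanh - 200Tanh - 1Linear",
    ],
    "SymODEN": [
        "1 - 300Tanh - 300Tanh - 1Linear",
        "1 - 50Tanh - 50Tanh - 1Linear",
        "1 - 200Tanh - 200Tanh - 1Linear",
    ],
}

# Sum the counts of all sub-networks that make up each model.
totals = {
    name: sum(mlp_param_count(parse_architecture(spec)) for spec in specs)
    for name, specs in pendulum_models.items()
}

for name, total in totals.items():
    print(f"{name}: {total} parameters ({total / 1e6:.2f}M)")
```

Under these assumptions the totals are 363,602, 202,802 and 134,703, which round to the 0.36M, 0.20M and 0.13M quoted for Task 1.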

Task 2: Pendulum with embedded data

  • Input: 3 state dimensions, 1 action dimension

  • Naive Baseline Model (0.65M parameters): 4 - 800Tanh - 800Tanh - 3Linear

  • Geometric Baseline Model (0.46M parameters):

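    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 1 - 300Tanh - 300Tanh - 300Tanh - 1Linear

    • approximate $(\dot{\mathbf{q}}, \dot{\mathbf{p}})$: 4 - 600Tanh - 600Tanh - 2Linear

  • Unstructured SymODEN (0.39M parameters):

    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 1 - 300Tanh - 300Tanh - 300Tanh - 1Linear

    • $H_{\theta_2}$: 3 - 500Tanh - 500Tanh - 1Linear

    • $g_{\theta_3}$: 2 - 200Tanh - 200Tanh - 1Linear

  • SymODEN (0.14M parameters):

    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 1 - 300Tanh - 300Tanh - 300Tanh - 1Linear

    • $V_{\theta_2}$: 2 - 50Tanh - 50Tanh - 1Linear

    • $g_{\theta_3}$: 2 - 200Tanh - 200Tanh - 1Linear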

Task 3: CartPole

  • Input: 5 state dimensions, 1 action dimension

  • Naive Baseline Model (1.01M parameters): 6 - 1000Tanh - 1000Tanh - 5Linear

  • Geometric Baseline Model (0.82M parameters):

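    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 3 - 400Tanh - 400Tanh - 400Tanh - 3Linear

    • approximate $(\dot{\mathbf{q}}, \dot{\mathbf{p}})$: 6 - 700Tanh - 700Tanh - 4Linear

  • Unstructured SymODEN (0.67M parameters):

    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 3 - 400Tanh - 400Tanh - 400Tanh - 3Linear

    • $H_{\theta_2}$: 5 - 500Tanh - 500Tanh - 1Linear

    • $g_{\theta_3}$: 3 - 300Tanh - 300Tanh - 2Linear

  • SymODEN (0.51M parameters):

    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 3 - 400Tanh - 400Tanh - 400Tanh - 3Linear

    • $V_{\theta_2}$: 3 - 300Tanh - 300Tanh - 1Linear

    • $g_{\theta_3}$: 3 - 300Tanh - 300Tanh - 2Linear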

Task 4: Acrobot

  • Input: 6 state dimensions, 1 action dimension

  • Naive Baseline Model (1.46M parameters): 7 - 1200Tanh - 1200Tanh - 6Linear

  • Geometric Baseline Model (0.97M parameters):

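    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 4 - 400Tanh - 400Tanh - 400Tanh - 3Linear

    • approximate $(\dot{\mathbf{q}}, \dot{\mathbf{p}})$: 7 - 800Tanh - 800Tanh - 4Linear

  • Unstructured SymODEN (0.78M parameters):

    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 4 - 400Tanh - 400Tanh - 400Tanh - 3Linear

    • $H_{\theta_2}$: 6 - 600Tanh - 600Tanh - 1Linear

    • $g_{\theta_3}$: 4 - 300Tanh - 300Tanh - 2Linear

  • SymODEN (0.51M parameters):

    • $M^{-1}_{\theta_1} = L_{\theta_1} L_{\theta_1}^T$, where $L_{\theta_1}$: 4 - 400Tanh - 400Tanh - 400Tanh - 3Linear

    • $V_{\theta_2}$: 4 - 300Tanh - 300Tanh - 1Linear

    • $g_{\theta_3}$: 4 - 300Tanh - 300Tanh - 2Linear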