Learning Contact Dynamics using Physically Structured Neural Networks

02/22/2021 ∙ by Andreas Hochlehnert, et al. ∙ 0

Learning physically structured representations of dynamical systems that include contact between different objects is an important problem for learning-based approaches in robotics. Black-box neural networks can learn to approximately represent discontinuous dynamics, but they typically require large quantities of data and often suffer from pathological behaviour when forecasting for longer time horizons. In this work, we use connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects. We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations in settings that are traditionally difficult for black-box approaches and recent physics inspired neural networks. Our results indicate that an idealised form of touch feedback – which is heavily relied upon by biological systems – is a key component of making this learning problem tractable. Together with the inductive biases introduced through the network architectures, our techniques enable accurate learning of contact dynamics from observations.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Deep-learning-based models have accomplished remarkable achievements in a myriad of fields in recent years, ranging from image processing to text generation to reinforcement learning. These models are increasingly being applied to model physical systems, in areas ranging from fluid dynamics [kutz2017deep] to robotics [viereck2018learning, lutter2020deep, alvarez2020dynode]. Though neural networks are good at approximating general classes of functions, they often struggle to learn invariant properties of physical systems, such as the conservation of energy or momentum [greydanus2019hamilton] and other qualitative properties. A rapidly-growing line of work [raissiDeepHiddenPhysics2018, greydanus2019hamilton, lutter2020deep, cranmer2020lagrangian, saemundsson2020variational] has thus focused on how to introduce inductive biases into these networks to enable them to learn more accurate models from less data.

Code available at: https://github.com/libeanim/contact-symplectic-integrator-network.

(a) Find the contact time .

(b) Calculate true trajectory.
Figure 1: Example integration scheme for contact dynamics that enforces constraints exactly, in the context of a bouncing ball. Initially, the ball is time-stepped until a contact with the floor is detected through interpenetration at time

. Then, the trajectory is (a) linearly interpolated to find the

contact time where contact occurs between the ball and floor. Finally, the contact state at time is calculated, a transfer of momentum between and is performed, and the ball is time-stepped as usual to time .

One particular kind of physical phenomenon of great interest to areas such as robotics is contact dynamics, which describes how collisions between different objects affect the evolution of the system. For example, many of the basic actions that could be relevant for a robot, e.g. walking, jumping or grasping, involve discontinuous contact events. Accurately modelling these dynamics, and related downstream phenomena, such as friction, is a crucial step towards enabling the creation of robots that can learn to interact with unknown objects in the real world. Additionally, stronger inductive biases can improve the data efficiency of such models [deisenrothGaussianProcessesDataEfficient2015, greydanus2019hamilton, lutter2020deep, cranmer2020lagrangian, saemundsson2020variational]. Due to the presence of non-linear and non-smooth behaviour, accurate multi-body contact dynamics are widely considered notoriously difficult to compute [cirak2005decomposition], due to challenges with numerically enforcing non-interpenetration constraints and other issues. A robust and mature literature in physics and numerical analysis for handling these issues has developed in recent decades [jean1999non, cirak2005decomposition, leyendecker2008variational].

By bringing these ideas together with the rapidly-developing literature on neural ordinary differential equations

[E2017, Haber2017, chen2018neural, Ruthotto2018] and physically-inspired neural networks [raissiDeepHiddenPhysics2018, greydanus2019hamilton, lutter2020deep, saemundsson2020variational], we study the problem of learning unknown contact dynamics from data. Our work is based on the approach of saemundsson2020variational, which derives neural network architectures by discretising the variational principle underlying the physical equations of motion under study. Specifically, we propose specialised neural network architectures for modelling contact dynamics. These architectures combine discretisation schemes designed for contact dynamics with flexible parameterised networks for efficiently and accurately learning system behaviour. We study these networks under different scenarios, develop schemes to ensure accurate learning of dynamics, and demonstrate empirically that the addition of an idealised touch feedback sensor—rarely explicitly considered in deep learning, but widely utilised by biological systems—significantly improves model performance.

This suggests that the precise details of what hardware sensors the robot has available and how the learning problem is formulated to utilise those sensors, are both likely to have a significant impact on machine learning performance and should be studied further. This includes understanding the effect of different forms of touch feedback, such as observed tactile sensors, and inferred feedback using proprioceptive sensors

[rotella2018unsupervised, ortenzi2016kinematics]. Our work provides a starting point for addressing these questions within a physically structured deep learning framework.

2 Contact Dynamics

Figure 2: A CD-Lagrange network. Here, we begin from initial states . We calculate the next position , and proceed to calculate the next velocity as a sum of smooth and contact terms. These terms are in turn calculated using the conservative forces and the impulse , which are calculated from the parameterised Lagrangian, whose potential energy is given by a fully connected network.

Here we briefly review the mathematical formulation of contact dynamics. The state of a physical system is defined through position-velocity pairs . From analytical mechanics, the trajectories of a physical dynamical system are assumed to be a stationary point of the action functional

where is the Lagrangian—typically, the difference between kinetic and potential energy. By the d’Alembert–Lagrange principle, at instances where there is no contact, these trajectories follow the Euler-Lagrange equations

These equations possess a number of important physical properties, such as conservation of energy, momentum, and phase volume. A long line of work in numerical analysis has built efficient numerical integration schemes for these equations by maintaining these properties under discretisation [marsden2001discrete]. We are particularly interested in symplectic integrators, which conserve phase volume exactly and conserve energy and momentum to a high order of accuracy [sanz1992symplectic].

At time points, where there is contact, the variational principle is augmented with contact constraints that depend on the precise physical setting. These constraints ensure that states which are assumed impossible, such as those with interpenetration or deformation, cannot occur. At these time points, the d’Alembert–Lagrange principle does not apply, and the Euler–Lagrange equations (2) do not hold. To handle this, one splits the action integral in equation (2) into contact-free intervals with non-smooth contact-driven transitions in between, which are analysed separately.

Applying the variational principle at the contact time points yields transfer of momentum equations, usually resulting from Newton’s restitution law and the law of conservation of momentum [fetecau2003nonsmooth, halliday2013fundamentals]. These equations relate the state directly before contact to the state directly after contact. To calculate this, one isolates the components of momentum, which are normal to the contact surface, and transfers them appropriately.

Implementing contact dynamics numerically yields a number of issues, which depend on the regime employed. For example, certain regimes involve calculating the precise time when contacts occur, which might happen between integration time steps. In such cases, transfer of momentum is applied at the contact time point. This yields good numerical accuracy, but often requires the solution of an optimisation problem to determine the precise contact time [fetecau2003nonsmooth]. Other regimes might instead relax the dynamics to allow object interpenetration in order to avoid expensive fixed-point iterations [fekak2017new]. This entails carefully considering how to handle interpenetration to ensure accuracy. Figure 1 illustrates a sample numerical trajectory of a bouncing ball, including numerical transfer of momentum at a contact time point.

3 Physically Structured Networks for Contact Dynamics

To define neural network models for contact dynamics, we begin with the perspective of neural ODEs [E2017, Haber2017, chen2018neural]

, which will be helpful for this purpose. Here, we specify a system of ODEs driven by a single-layer neural network, and discretise it to obtain a deep or recurrent neural network architecture. In the classical case, an Euler discretisation yields a deep residual network, where the depth is given by the number of discretisation steps.

Building on these ideas, a number of recent works have proposed replacing the black-box system of ODEs with other systems that are more structured [raissiDeepHiddenPhysics2018, greydanus2019hamilton, lutter2020deep, cranmer2020lagrangian, saemundsson2020variational]. By applying structure-preserving discretisation schemes to these ODEs, one obtains neural network architectures with built-in inductive biases that improve generalisation and allow the networks to learn with less data.

In physical settings, a number of recent works have paired structured equations from mechanics, such as the Euler–Lagrange equations or Hamilton’s equations, with structure-preserving integrators, such as the Störmer–Verlet method, giving rise to physically structured neural networks [greydanus2019hamilton, lutter2020deep, cranmer2020lagrangian, saemundsson2020variational]. These networks use the mathematical structure of classical mechanics as an inductive bias, ensuring the network mirrors important qualitative physical properties, such as conservation of momentum or conservation of energy. These inductive biases have been shown to improve data efficiency and facilitate accurate long-term forecasting [raissiDeepHiddenPhysics2018, greydanus2019hamilton, lutter2020deep, cranmer2020lagrangian, saemundsson2020variational].

3.1 Central-Difference Lagrange Networks

We now employ these techniques to design neural network architectures specifically suited for modelling contact dynamics. There are three main properties present in the true physics that we aim to encode into the neural network architecture.

[(a)] The network should be expressive enough to model a general Newtonian rigid body system. The network should be constrained to exclude non-physical dynamics that black-box networks allow, so that it learns in a data-efficient manner. The network should be able to handle periods without contact separately from contact events, as said phenomena differ in character. Following saemundsson2020variational, one can construct a network satisfying the first two properties by applying a symplectic integrator to the Euler–Lagrange equations induced by a free-form Lagrangian parameterised by a fully connected network, which are interpreted as a structured neural ODE. We extend these constructions to explicitly handle contact events. We begin by defining a parameterised Lagrangian for a free-form Newtonian potential system, given by

where is a symmetric positive semi-definite position-independent inertia matrix, and is modelled by a fully connected neural network.

To obtain a network architecture for contact dynamics, we apply the symplectic Central-Difference Lagrange integrator of fekak2017new, modified for rigid body impacts by di2019benchmark. We work with a simplified variant designed for modelling rigid non-deformable bodies—see Appendix A, as well as fekak2017new,di2019benchmark for the general case. This construction yields a recurrent network architecture specialised for modelling contact dynamics, which we now showcase. A schematic overview can be found in Figure 2.

CD-Lagrange uses separate time grids for the position and velocity , at times and . We combine the position of all bodies in the matrix

where is the total number of bodies in the system. Given a position-velocity pair , CD-Lagrange calculates the next position using the midpoint velocity, given by

where is the size of the time step. At the next time step, the change in velocity is calculated as a sum of changes due to smooth dynamics and due to contact, yielding

where is the conservative force and the impulse that occurs during contact, both defined below. The conservative forces are calculated from the parameterised Lagrangian as

which, in our setting, are the conservative forces arising from the potential . Rigid-body impacts are handled by Newton’s restitution law [fekak2017new, di2019benchmark]

where with defined below, and is the elasticity parameter with defining elastic impacts with no dissipation. Furthermore, for rigid body-body impacts, the law of conservation of momentum

needs to be considered in order to uniquely resolve collision events. Define the projection operator

which contains the normal vectors of the surface each body it is in contact with. For each body

, the corresponding impulse is given by

where is a discrete contact signal for body and

represents the impulse acing on body . The operator is defined as

and ensures that the mass ratios between the bodies in contact, resulting from the law of conservation of momentum, are applied to the correct bodies. Here, the operator selects the components of the bodies that are in contact with each other, and is given by

In total, these expressions define the general CD-Lagrange network. Further details, including derivation of these expressions, are provided in Appendix A.

Idealised touch feedback.

When using a CD-Lagrange network to predict future system states, one needs to determine whether or not objects are in contact at each time step in order to calculate whether or not the impulse component of the network comes into play. To do so, we introduce an additional contact network that learns to predict a discrete contact signal defined by

at time step . We consider two regimes: one in which is unobserved, and another where it is fully observed at training time. The latter case can be conceptually thought of as the addition of an idealised touch feedback sensor to the system, which determines whether or not contact is present. We explore this difference and its effect on performance in the sequel.

Figure 3: Learning the equations of motion of an ideal pendulum, which has no contacts. Given a set of initial conditions, we forecast a path in phase space and predict against ground truth. We display ground truth, training data, and model predictions, and energy over time for each of the models. For the CD-Lagrange network, we display the learned contact function, which is approximately zero everywhere. Root mean squared errors for each network are given as follows: CD-Lagrange RMSE: 0.538, ResNet RMSE: 1.156, VIN VV RMSE: 0.509.

3.2 Learning from noisy observations

Given a dataset of observed trajectories , , of length each with step size and . We train the network by minimising the mean squared error loss between the (noisy) state observations and predicted states with respect to the parameters of the potential network . We also jointly train the contact network by minimising cross-entropy loss between its output and the contact signal . For a single trajectory, this is given by

The corresponding contact network loss is given by

where the activation of the output of the contact network is the sigmoid (logistic) function, and is the contact network’s prediction for body at time step . We further add a regularisation term on the parameters of the model, which is described in the sequel. In the full dataset, the observed trajectories might contain many steps or have unequal lengths . To avoid vanishing gradients, instead of optimising with respect to the full trajectories, we instead split the data into batches of horizon . The optimal parameters are found by minimising

using mini-batch stochastic gradient descent.


Since the change in velocity in CD-Lagrange is additive over conservative and contact components, as part of training, a CD-Lagrange network needs to learn to distinguish which component should be used to predict a given trajectory in the training data. regularization affects this aspect of the learning problem in an subtle but outsized manner: as a consequence of shrinking the potential network’s weights to zero, the regularizer shrinks the function to the zero function . This encourages the network to explain contact events with contact forces rather than conservative forces where possible.

4 Experiments

In order to investigate the properties of the proposed CD-Lagrange networks, we run experiments on a number of reference rigid body systems. In Section 4.1, we study performance on an ideal pendulum system without contacts, focusing on the physical properties of the conserved dynamics defined in equations (3.1) and (3.1), thereby checking the network’s performance relative to baselines. In Section 4.2, we perform experiments on a rigid bouncing ball system, exploring the behaviour of the network when contact happens and the effect of including an idealised touch sensor on the ability of recovering the underlying dynamics. Finally, in Section 4.3, we study body-body impacts in an idealised Newton’s cradle and look at the network’s ability to recover the underlying dynamics and effects of numerical interpenetration during contact events.

We compare CD-Lagrange networks with residual networks (ResNets) and variational integrator networks (VINs) by evaluating them in terms of predictive performance on held-out test data as well as qualitatively evaluating the corresponding phase diagrams. For contact experiments, we additionally include a modified residual network baseline (ResNetContact), which takes as input both the state as well the contact signal, for comparison against the CD-Lagrange networks’ idealised touch feedback sensor. To generate data, we simulate trajectories of motion and add independent Gaussian noise to the positions and velocities. Full details for experiment hyperparameters and training are given in Appendix


4.1 Learning to predict motion without contacts: an ideal pendulum

We first examine whether the performance of CD-Lagrange networks matches previous work on physically structured networks in cases where there is no contact. To this end, we consider learning to predict the trajectory of a simple ideal pendulum from observed data. The ideal pendulum is a point-mass attached to a mass-less rigid rod, suspended from a pivot. The pendulum swings back and forth due to gravity, without friction, so that the system conserves energy. This task has been studied previously by a number of authors, such as greydanus2019hamilton and saemundsson2020variational, who propose to use a network constructed from a variational velocity Verlet integrator in order to learn in a data-efficient manner. The variational integrator network preserves more physical properties than the CD-Lagrange network, and therefore we expect it to perform better in settings where there are no contacts. We generate training trajectories consisting of points each by simulating trajectories and adding independent Gaussian noise to all position-velocity pairs.

Figure 4: Schematics of the bouncing ball. Here, the ball falls due to the gravitational pull , and experiences a contact force in the direction of the contact normal vector with respect to the floor.

Figure 3 plots the phase-space trajectories for the ground truth system, the observed training data and the predicted evolution for each model. The CD-Lagrange network and variational integrator network achieve comparable error, which significantly improves upon the baseline residual network, which incorrectly dissipates energy. The network also recovers the correct contact function, which is the zero function. This shows that relative to prior works on physically structured neural networks, accurate performance in contact-free scenarios is not compromised by the extra structure added to handle contacts.

4.2 Learning body-wall contacts: an ideal bouncing ball

Figure 5: Learning the equations of motion of the bouncing ball. Given a set of initial conditions, we forecast a path in phase space and predict against ground truth. We display ground truth, training data, and model predictions for each of the models. For the CD-Lagrange network, we display the gradient of the potential energy, and the learned contact function. Root mean squared errors for each network’s predictions are as follows: CD-Lagrange: 0.067, CD-Lagrange (no touch): 7.578, ResNet: 6.778, ResNetContact: 8.866.

Next, we study using CD-Lagrange networks to learn contact dynamics for an ideal rigid bouncing ball. This system, shown in Figure 4, is a commonly studied example in the non-smooth contact dynamics literature [di2019benchmark]. The ball accelerates towards the ground due to the force of gravity, hits the ground, and bounces back up. The effect of the impact is determined by Newton’s restitution law, given in equation (3.1), where the elasticity parameter controls the energy behaviour. We focus on the regime , where energy is conserved.

Table 1:

Average root-mean squared error and standard error of the bouncing ball experiment averaged over

runs with trajectories each consisting of data points.

We consider two different data regimes. In the first regime, only observed trajectories of motion are provided at training time. In the second regime, we add idealised touch sensor data that indicates at each time point whether or not contact has occurred, given in equation (3.1). This can be viewed as representing an ideal impact sensor located inside the ball, or along the floor. To learn this system’s equations of motion from data, we generate training trajectories of points each, and train a CD-Lagrange network, a residual network, and a modified residual network that also takes contact data as input. Results can be seen in Figure 5. Here, we see that the CD-Lagrange network’s performance depends critically on whether or not it has access to touch feedback data. With touch feedback, the CD-Lagrange network predicts the ball’s trajectory much more accurately than the residual network. Without touch feedback, the CD-Lagrange network fails to learn the correct dynamics—instead, the network attempts to incorrectly explain noise using contact events, and contact events using smooth dynamics, because all scenarios lead to similar-looking noisy data from the network’s perspective. The residual network struggles to approximate the non-smooth behaviour at impact time, replacing instantaneous contacts with fast movement. Adding contact information to the residual network’s inputs does not improve its performance.

From examining the potential and contact network in Figure 5, we see that the CD-Lagrange network with touch feedback determines the impact times near-exactly. The gradient of the potential energy remains very close to the ground truth value, even as the system evolves and contact events occur. This shows that the network correctly determines that contact-driven changes in system states are caused by contacts, and not by spurious potential energy within the smooth dynamics—so long as the network is provided with touch feedback that enables it to differentiate between contact events and noise. Root mean squared error, together with standard error, can be seen in Table 1.

4.3 Learning body-body impacts: Newton’s cradle



Figure 6: Schematics of the Newton’s cradle system with two balls. Here, the first ball swings along the suspended rope due to gravity, and experiences a contact force in the direction of the contact normal vector upon impact.

Finally, we consider learning body-body impacts using CD-Lagrange networks in a simple Newton’s cradle system consisting of two balls suspended by a string, shown in Figure 6. We assume both bodies have no volume, are suspended from a common mounting point, and parameterise their locations using angles relative to the vertical axis. This means that collisions will occur perpendicular to the contact surface. We train on trajectories consisting of data points each, and again consider CD-Lagrange networks with and without idealised touch sensor data, along with residual network baseline within both data regimes.

Figure 7: Learning the equations of motion of the Newton’s cradle. Given a set of initial conditions, we forecast a path in phase space and predict against ground truth. We display ground truth, training data, and model predictions for each of the models. For the CD-Lagrange network, we display the gradient of the potential energy, and the learned contact function. Root mean squared errors for each network are given as follows: CDL RMSE: 0.406, Resnet RMSE: 1.685, ResnetContact RMSE: 1.034.
Table 2: Average root-mean squared error with standard error of the Newton’s cradle experiment averaged over runs with trajectories each consisting of data points.

Results can be seen in Figure 7. Here, we see that the phase-space trajectory of the CD-Lagrange network is substantially more accurate than for a baseline residual network. Root mean squared error, together with standard error, can be seen in Table 2. As before, adding contact information does not improve the residual network’s performance. This replicates the results of the single-body bouncing ball experiment in Section 4.2 in a multi-body setting.

Next, we examine how accurately the potential energy and contact function are recovered—this can also be seen in Figure 7. The CD-Lagrange network learns the potential accurately, with some error around the contact points. In particular, the contact network accurately learns to determine when contact will occur based on the touch sensor data. Mirroring the results of Section 4.2, the residual network struggles to predict contact events even when it is given explicit touch sensor data on when they occur, particularly through mistiming contact events to occur earlier or later than is correct.

The CD-Lagrange integrator, and by proxy the CD-Lagrange network, allow some slight interpenetration to occur. In body-wall impact scenarios, this does not strongly affect performance. However, in body-body impacts, this qualitative behaviour difference becomes more significant. Specifically, as momentum gets transferred between bodies, the slight allowed interpenetration can cause the system to significantly violate conservation of energy. In the Newton’s cradle example, this causes the resting ball to gain potential energy during collision events, eventually causing both balls to accelerate. To mitigate this problem, we introduce a closest-point projection during impact events [fetecau2003nonsmooth], which aligns the resting ball with the closest point on the boundary where no penetration occurs. This improves accuracy and restores correct long-term system behaviour.

5 Discussion

CD-Lagrange networks are a flexible way for data-driven learning of equations of motion that include contact dynamics. These networks can learn to accurately predict trajectories and resolve collisions in a noise-robust and data-efficient manner. This contributes to a rapidly-expanding line of work on neural ODEs and physically structured learning, and shows that these ideas can work successfully in non-smooth settings.

Compared to black-box neural networks, the implementation of physically structured networks for learning contact dynamics presents a number of additional challenges to ensure their performance. In particular, we find that the presence or absence of touch feedback impacts the model’s performance dramatically. This is because without touch feedback the learning problem is ambiguous: abrupt shifts in the system state can be explained either by contacts, or by noise, and it is difficult for the network to tell which is which. This results in weak identifiability in the loss function, which could lead to difficulties with local optima or other issues. It is therefore important that the network is provided with an appropriate means to differentiate between the two, e.g. by means of a touch sensor.

The networks studied here are based on the CD-Lagrange integrator, which is an explicit scheme that fits conveniently within an automatic differentiation framework. This convenience comes at a cost: the integrator allows for some physically incorrect interpenetration, which can affect long-term prediction accuracy. Other schemes, such as variational integrators for contact dynamics [fetecau2003nonsmooth], avoid these issues and achieve higher accuracy, but they require the solution of fixed-point iterations or convex optimisation problems during time-stepping, which renders them more expensive and cumbersome to work with. These issues can potentially be mitigated through the use of differential physics engines as components of deep-learning-based models [de2018end]. Studying these trade-offs in the context of learning could pave the way toward better understanding on how to incorporate inductive biases to improve performance of neural networks to model the real world, thereby facilitating their use in applications such as robotics.

6 Conclusion

In this work, we introduce CD-Lagrange networks, which build on ideas from neural ODEs and physically structured learning to construct networks for learning contact dynamics in data-limited regimes. With the addition of an idealised touch feedback sensor, these networks can learn to accurately reconstruct non-smooth contact dynamics from data, in a way that disentangles contact-driven forces from conservative forces. The simple and explicit structure makes these networks interpretable, well-matched with the underlying physics, and straightforward to implement. A rapidly growing line of work has focused on adapting deep networks to various physical settings. We hope our contributions facilitate the inclusion of non-smooth contact dynamics within these settings.

Appendix A Appendix: extended CD-Lagrange networks

The CD-Lagrange scheme, in full generality, is designed for deformable body impacts for finite element calculations [fekak2017new]. In the non-deformable case, fekak2017new propose employing Newton’s restitution law for rigid body impacts. This variant has been benchmarked by di2019benchmark. We extend the framework to support three-dimensional rigid body-body collisions by ensuring that the overall momentum is conserved during the impact. Instead of multi-node bodies—as used, for example, in finite element simulations of deformation during multi-body collision—we restrict ourselves to single-node bodies in order to simplify the problem.

Figure 8: Visualising a deformable body before (left) and during (right) impact into a rigid wall . Here, denotes the surface normal of the wall and the set of contact points.

CD-Lagrange is based on an asynchronous version of the central difference method and uses Lagrange multipliers to enforce the contact constraints. For a given problem, must fulfill the Karush–Kuhn–Tucker (KKT) conditions, sometimes also referred to as Hertz–Signorini–Moreau (HSM) conditions [wriggers1999new]. These are given by

which hold for all , where is the outer pointing normal on the wall and the outer-pointing normal on the deformable body . In particular, when a contact occurs. represents all contact points on , the closest point to and is the Cauchy stress field.

Condition (A) enforces the impenetrability, (A) implies that no adhesion occurs upon contact which means the contact-stress is only compressive, and (A) reduces the contact-states to contact ( and ) and no contact (, ). The subscript is to emphasise dependence on the normal part of the velocity, i.e. the part perpendicular to the contact surface. This subscript will be omitted in the rest of this derivation to simplify notation.

We represent the position of all (single-node) bodies in a -dimensional space by

where represents the individual position of body .

To be able to calculate the velocity component each body has towards the impact surface a projection operator

is introduced, where is the surface normal of the object body is colliding with. The distance between each body and the closest point on the wall or another body is represented by

Finally, define to be the contact function. Note that the equations of motion depend only on through : we will subsequently replace this function with the contact network .

Discrete impact equations.

Given a single impact event at time of a hyperelastic solid, we may write the action integral [cirak2005decomposition]

where is the semi-discrete Lagrangian of the form

where represents the mass matrix and the external potential, e.g. the gravitational field [cirak2005decomposition]. If there are multiple impacts over the interval the action integral can be split up further to handle the contact events in a sequential manner. We apply small variation of the path

Applying the Fundamental Lemma of the Calculus of Variations [cirak2005decomposition] yields the equations

which fully describe the dynamics. Here (A) represents the smooth dynamics, (A) represents the non-smooth dynamics and (A) represents the kinetic energy balance upon impact.

Using the theory of non-smooth dynamics [moreau1988unilateral], (A), (A), and (A) can be combined into one equation and the velocity can be expressed as a sum of the smooth and non-smooth parts, given by

fekak2017new show that incorporating the KKT conditions into the above yields the non-smooth contact dynamics (NSCD) equations


By integrating (2) between and

and estimating the time integrals using a midpoint-rule, following fekak2017new, one obtains the semi-discrete equilibrium equation

which results in an asynchronous update rule for the velocity given by

Further, can be updated using the previous velocity

To calculate the impulse we combine Newton’s restitution law, as proposed by [fekak2017new], and the law of conservation of linear momentum. Consequently, the impulse between body and caused by a collision is defined by

Unpacking this equation, the velocity difference between the two bodies is projected onto the normal of the contact surface and scaled by the mass ratio and the elasticity .

We introduce the matrix which, at impact times, determines which bodies are in contact. This is given by

From this, the mass ratio can be expressed as

where the inverse is applied element-wise. This definition of works for multiple simultaneous two-body collisions. In case three (or more) bodies collide with each other simultaneously, contacts are resolved sequentially.

Solving equation (A) for and using equation (A) to allow two-body impacts yields

where . In equation (A), the operator incorporates the mass ratio, and is responsible for selecting the two bodies that are interacting in the collision when applied to . projects the velocity for every body onto the surface normal of the contact surface. Since we work with independent single-node bodies, we are only interested in the diagonal elements of . The maximum makes sure the velocity of the colliding object is not pointing away from the surface, indicating that the collision already happened in the previous time step. We replace with the contact network . Since the contact network receives as well as , it can learn whether the collision already happened or not, and thus we omit the .

Summarising, the integrator is governed by the equations

In our experiments we assume is known. represents the contact-network, which returns if a contact occurs and if not. The potential is approximated using the potential-network.

We conclude with two remarks on more general settings. Firsly, if we wish to allow multiple nodes per body the operator needs to be adapted accordingly, and the internal stress field has to be added to the force (A), to yield

Further, it is no longer possible to use only the diagonal elements of the impulse equation (A), as now the interactions between nodes play a role in contact resolution.

Secondly, the force can easily be extended to contain other forces. For example, we can add damping to obtain

where is a learned damping coefficient.

Appendix B Appendix: experimental details

Learning rate.

We use a learning rate of in combination with the ADAM optimiser in all experiments.


We use the set-up proposed by saemundsson2020variational, which consists of a two-layer fully connected neural network with activation function between the layers. The input layer has a size of 500 units and outputs a single real scalar. regularisation is used in both layers.

Baseline contact-aware residual network.

To ensure a fair comparison, we use a vanilla residual network (ResNet) as a baseline. It consists of two dense layers with activation function and an input layer size of units. We update the state according to

The residual network also receives the contact information and uses the same architecture as to predict contact by means of


The parameters used in the pendulum experiment are

We add centered Gaussian noise with standard deviation

to the positions and velocities. All models are trained for epochs.

Bouncing ball.

The projection operator is , and the other parameters are

All models are trained for epochs.

Newton’s cradle.

We will use two generalised angles to determine the position of both balls. The initial positions and velocities are

where represent body 1, and represent body 2. The restitution parameter is , and

For simplification, we assume both bodies have no volume and are connected with the same joint. This means collisions will always happen perpendicular to the contact surface. The projection operator is thus

Additional experiments.

Here we include additional experimental results for the Newton’s cradle without touch feedback in Figure 9, and the bouncing ball with elasticity in Figure 10.

Figure 9: Learning the equations of motion of the Newton’s cradle without touch feedback and Gaussian noise in the training data with . The model tries to approximate the collisions using the potential network and therefore struggles to predict the trajectory accurately. CD-Lagrange RMSE: 0.907.
Figure 10: Learning the equations of motion of the bouncing ball with elasticity . CD-Lagrange approximates the true trajectory in the most accurate manner, particularly for longer simulation times which correspond to phase space regions closer to the centre of the spiral. CDL RMSE:2.076, ResNet RMSE: 8.291, ResNetContact RMSE: 4.156.