Realistic simulators of complex physics are invaluable to many scientific and engineering disciplines, however traditional simulators can be very expensive to create and use. Building a simulator can entail years of engineering effort, and often must trade off generality for accuracy in a narrow range of settings. High-quality simulators require substantial computational resources, which makes scaling up prohibitive. Even the best are often inaccurate due to insufficient knowledge of, or difficulty in approximating, the underlying physics and parameters. An attractive alternative to traditional simulators is to use machine learning to train simulators directly from observed data, however the large state spaces and complex dynamics have been difficult for standard end-to-end learning approaches to overcome.
Here we present a general framework for learning simulation from data—“Graph Network-based Simulators” (GNS). Our framework imposes strong inductive biases, where rich physical states are represented by graphs of interacting particles, and complex dynamics are approximated by learned message-passing among nodes.
We implemented our GNS framework in a single deep learning architecture, and found it could learn to accurately simulate a wide range of physical systems in which fluids, rigid solids, and deformable materials interact with one another. Our model also generalized well to much larger systems and longer time scales than those on which it was trained. While previous learning simulation approachesLi et al. (2018); Ummenhofer et al. (2020) have been highly specialized for particular tasks, we found our single GNS model performed well across dozens of experiments and was generally robust to hyperparameter choices. Our analyses showed that performance was determined by a handful of key factors: its ability to compute long-range interactions, inductive biases for spatial invariance, and training procedures which mitigate the accumulation of error over long simulated trajectories.
2 Related Work
Our approach focuses on particle-based simulation, which is used widely across science and engineering, e.g., computational fluid dynamics, computer graphics. States are represented as a set of particles, which encode mass, material, movement, etc. within local regions of space. Dynamics are computed on the basis of particles’ interactions within their local neighborhoods. One popular particle-based method for simulating fluids is “smoothed particle hydrodynamics” (SPH) Monaghan (1992), which evaluates pressure and viscosity forces around each particle, and updates particles’ velocities and positions accordingly. Other techniques, such as “position-based dynamics” (PBD) Müller et al. (2007) and “material point method” (MPM) Sulsky et al. (1995), are more suitable for interacting, deformable materials. In PBD, incompressibility and collision dynamics involve resolving pairwise distance constraints between particles, and directly predicting their position changes. Several differentiable particle-based simulators have recently appeared, e.g., DiffTaichi Hu et al. (2019), PhiFlow Holl et al. (2020), and Jax-MD Schoenholz and Cubuk (2019)
, which can backpropagate gradients through the architecture.
Learning simulations from data Grzeszczuk et al. (1998) has been an important area of study with applications in physics and graphics. Compared to engineered simulators, a learned simulator can be far more efficient for predicting complex phenomenon He et al. (2019); e.g., Ladickỳ et al. (2015); Wiewel et al. (2019) learn parts of a fluid simulator for faster prediction.
Graph networks (GN) Battaglia et al. (2018)—a type of graph neural network Scarselli et al. (2008)—have recently proven effective at learning forward dynamics in various settings that involve interactions between many entities. A GN maps an input graph to an output graph with the same structure but potentially different node, edge, and graph-level attributes, and can be trained to learn a form of message-passing Gilmer et al. (2017), where latent information is propagated between nodes via the edges. GNs and their variants, e.g., “interaction networks”, can learn to simulate rigid body, mass-spring, n-body, and robotic control systems Battaglia et al. (2016); Chang et al. (2016); Sanchez-Gonzalez et al. (2018); Mrowca et al. (2018); Li et al. (2019); Sanchez-Gonzalez et al. (2019), as well as non-physical systems, such as multi-agent dynamics Tacchetti et al. (2018); Sun et al. (2019), algorithm execution Veličković et al. (2020), and other dynamic graph settings Trivedi et al. (2019, 2017); Yan et al. (2018); Manessi et al. (2020).
Our GNS framework builds on and generalizes several lines of work, especially Sanchez-Gonzalez et al. (2018)’s GN-based model which was applied to various robotic control systems, Li et al. (2018)’s DPI which was applied to fluid dynamics, and Ummenhofer et al. (2020)’s Continuous Convolution (CConv) which was presented as a non-graph-based method for simulating fluids. Crucially, our GNS framework is a general approach to learning simulation, is simpler to implement, and is more accurate across fluid, rigid, and deformable material systems.
3.1 General learnable simulation
We assume is the state of the world at time . Applying physical dynamics over timesteps yields a trajectory of states, . A simulator, , models the dynamics by mapping preceding states to causally consequent future states. We denote a simulated “rollout” trajectory as, , which is computed iteratively by, for each timestep. Simulators compute dynamics information that reflects how the current state is changing, and use it to update the current state to a predicted future state (see Figure 2(a)). An example is a numerical differential equation solver: the equations compute dynamics information, i.e., time derivatives, and the integrator is the update mechanism.
A learnable simulator, , computes the dynamics information with a parameterized function approximator, , whose parameters, , can be optimized for some training objective. The represents the dynamics information, whose semantics are determined by the update mechanism. The update mechanism can be seen as a function which takes the , and uses to predict the next state, . Here we assume a simple update mechanism—an Euler integrator—and that represents accelerations. However, more sophisticated update procedures which call more than once can also be used, such as higher-order integrators (e.g., Sanchez-Gonzalez et al. (2019)).
3.2 Simulation as message-passing on a graph
Our learnable simulation approach adopts a particle-based representation of the physical system (see Section 2), i.e., , where each of the particles’ represents its state. Physical dynamics are approximated by interactions among the particles, e.g., exchanging energy and momentum among their neighbors. The way particle-particle interactions are modeled determines the quality and generality of a simulation method—i.e., the types of effects and materials it can simulate, in which scenarios the method performs well or poorly, etc. We are interested in learning these interactions, which should, in principle, allow learning the dynamics of any system that can be expressed as particle dynamics. So it is crucial that different values allow to span a wide range of particle-particle interaction functions.
Particle-based simulation can be viewed as message-passing on a graph. The nodes correspond to particles, and the edges correspond to pairwise relations among particles, over which interactions are computed. We can understand methods like SPH in this framework—the messages passed between nodes could correspond to, e.g., evaluating pressure using the density kernel.
We capitalize on the correspondence between particle-based simulators and message-passing on graphs to define a general-purpose based on GNs. Our has three steps—Encoder, Processor, Decoder Battaglia et al. (2018) (see Figure 2(b)).
Encoder definition. The embeds the particle-based state representation, , as a latent graph, , where , , and (see Figure 2(b,c)). The node embeddings, , are learned functions of the particles’ states. Directed edges are added to create paths between particle nodes which have some potential interaction. The edge embeddings, , are learned functions of the pairwise properties of the corresponding particles, , e.g., displacement between their positions, spring constant, etc. The graph-level embedding, , can represent global properties such as gravity and magnetic fields (though in our implementation we simply appended those as input node features—see Section 4.2 below).
Processor definition. The computes interactions among nodes via steps of learned message-passing, to generate a sequence of updated latent graphs, , where (see Figure 2(b,d)). It returns the final graph, . Message-passing allows information to propagate and constraints to be respected: the number of message-passing steps required will likely scale with the complexity of the interactions.
Decoder definition. The extracts dynamics information from the nodes of the final latent graph, (see Figure 2(b,e)). Learning should cause the representations to reflect relevant dynamics information, such as acceleration, in order to be semantically meaningful to the update procedure.
4 Experimental methods
Model code and datasets will be released on publication.
4.1 Physical domains
We explored how our GNS learns to simulate in datasets which contained three diverse, complex physical materials: water as a barely damped fluid, chaotic in nature; sand as a granular material with complex frictional behavior; and “goop” as a viscous, plastically deformable material. These materials have very different behavior, and in most simulators, require implementing separate material models or even entirely different simulation algorithms.
We also created Water-3D, a high-resolution 3D water scenario with randomized water position, initial velocity and volume, comparable to Ummenhofer et al. (2020)’s containers of water. We used SPlisHSPlasH Bender and Koschier (2015), a SPH-based fluid simulator with strict volume preservation to generate this dataset.
For most of our domains, we use the Taichi-MPM engine Hu et al. (2018) to simulate a variety of challenging 2D and 3D scenarios. We chose MPM for the simulator because it can simulate a very wide range of materials, and also has some different properties than PBD and SPH, e.g., particles may become compressed over time.
Our datasets typically contained 1000 train, 100 validation and 100 test trajectories, each simulated for 300-2000 timesteps (tailored to the average duration for the various materials to come to a stable equilibrium). A detailed listing of all our datasets can be found in the Supplementary Materials B.
4.2 GNS implementation details
We implemented the above model using standard deep learning building blocks and nearest neighbor algorithms.
Input and output representations
. Each particle’s input state vector represents position, a sequence ofprevious velocities111 is a hyperparameter which we explore in our experiments., and features that capture static material properties (e.g., water, sand, goop, rigid, boundary particle), , respectively. The global properties of the system,
, include external forces and global material properties, when applicable. The prediction targets for supervised learning are the per-particle average acceleration,. Note that in our datasets, we only require vectors: the and are computed from using finite differences. For full details of these input and target features, see Supplementary Material Section B.
Encoder details. The Encoder constructs the graph structure by assigning a node to each particle and adding edges between particles within a “connectivity radius”, , which reflected local interactions of particles, and which was kept constant for all simulations of the same resolution. For generating rollouts, on each timestep the graph’s edges were recomputed by a nearest neighbor algorithm, to reflect the current particle positions.
The Encoder implements and
as multilayer perceptrons (MLP), which encode node features and edge features into the latent vectors,and , of size 128.
We tested two Encoder variants, distinguished by whether they use absolute versus relative positional information. For the absolute variant, the input to was the described above, with the globals features concatenated to it. The input to , i.e., , did not actually carry any information and was discarded, with the in
set to a trainable fixed bias vector. The relativeEncoder variant was designed to impose an inductive bias of invariance to absolute spatial location. The was forced to ignore information within by masking it out. The was provided with the relative positional displacement, and its magnitude222Similarly, relative velocities could be used to enforce invariance to inertial frames of reference., . Both variants concatenated the global properties onto each before passing it to .
Processor details. Our processor uses a stack of GNs (where is a hyperparameter) with identical structure, MLPs as internal edge and node update functions, and either shared or unshared parameters (as analyzed in Results Section 5.4). We use GNs without global features or global updates (similar to an interaction network)333In preliminary experiments we also attempted using a Processor with a full GN and a global latent state, for which the global features are encoded with a separate MLP.
, and with a residual connections between the input and output latent node and edge attributes.
Decoder details. Our decoder’s learned function, , is an MLP. After the Decoder, the future position and velocity are updated using an Euler integrator, so the corresponds to accelerations, , with 2D or 3D dimension, depending on the physical domain. As mentioned above, the supervised training targets were simply these, vectors.
Neural network parameterizations
. All MLPs have two hidden layers (with ReLU activations), followed by a non-activated output layer, each layer with size of 128. All MLPs (except the output decoder) are followed by a LayerNormBa et al. (2016) layer, which we generally found improved training stability.
Training noise. Modeling a complex and chaotic simulation system requires the model to mitigate error accumulation over long rollouts. Because we train our models on ground-truth one-step data, they are never presented with input data corrupted by this sort of accumulated noise. This means that when we generate a rollout by feeding the model with its own noisy, previous predictions as input, the fact that its inputs are outside the training distribution may lead it to make more substantial errors, and thus rapidly accumulate further error. We use a simple approach to make the model more robust to noisy inputs: during training we corrupt the input positions and velocities of the model with random-walk noise , so the training distribution is more similar to the distribution generated during rollouts. See Supplementary Materials B for full details.
. We normalize all input and target vectors elementwise to zero mean and unit variance, using statistics computed online during training. Preliminary experiments showed that normalization led to faster training, though converged performance was not noticeably improved.
Loss function and optimization procedures. We randomly sampled particle state pairs from our training trajectories, calculated target accelerations , and computed the loss on the predicted per-particle accelerations, i.e., . We optimized the model parameters over this loss with the Adam optimizer Kingma and Ba (2014), using a minibatch size of 2. We performed a maximum of 20M gradient update steps, with exponential learning rate decay from to . While models can train in significantly less steps, we avoid aggressive learning rates to reduce variance across datasets and make comparisons across settings more fair.
We evaluated our models regularly during training by producing full-length rollouts on 5 held-out validation trajectories, and recorded the associated model parameters for best rollout MSE. We stopped training when we observed negligible decrease in MSE, which, on GPU/TPU hardware, was typically within a few hours for smaller, simpler datasets, and up to a week for the larger, more complex datasets.
To report quantitative results, we evaluated our models after training converged by computing one-step and rollout metrics on held-out test trajectories, drawn from the same distribution of initial conditions used for training. We used particle-wise MSE as our main metric between ground truth and predicted data, both for rollout and one-step predictions, averaging across time, particle and spatial axes. We also investigated distributional metrics including optimal transport (OT) Villani (2003) (approximated by the Sinkhorn Algorithm Cuturi (2013)), and Maximum Mean Discrepancy (MMD) Gretton et al. (2012). For the generalization experiments we also evaluate our models on a number of initial conditions drawn from distributions different than those seen during training, including, different number of particles, different object shapes, different number of objects, different initial positions and velocities and longer trajectories. See Supplementary Materials B for full details on metrics and evaluation.
|Experimental domain||1-step ()||Rollout ()|
Our main findings are that our GNS model can learn accurate, high-resolution, long-term simulations of different fluids, deformables, and rigid solids, and it can generalize well beyond training to much longer, larger, and challenging settings. In Section 5.5 below, we compare our GNS model to two recent, related approaches, and find our approach was simpler, more generally applicable, and more accurate.
To challenge the robustness of our architecture, we used a single set of model hyperparameters for training across all of our experiments. Our GNS architecture used the relative Encoder variant, 10 steps of message-passing, with unshared GN parameters in the Processor. We applied noise with a scale of to the input states during training.
5.1 Simulating complex materials
Our GNS model was very effective at learning to simulate different complex materials. Table 1 shows the one-step and rollout accuracy, as MSE, for all experiments. For intuition about what these numbers mean, the edge length of the container was approximately , and Figure 3(a-c) shows rendered images of the rollouts of our model, compared to ground truth444All rollout videos can be found here: https://sites.google.com/view/learning-to-simulate
. Visually, the model’s rollouts are quite plausible. Though specific model-generated trajectories can be distinguished from ground truth when compared side-by-side, it is difficult to visually classify individual videos as generated from our model versus the ground truth simulator.
Our GNS model scales to large amounts of particles and very long rollouts. With up to 19k particles in our 3D domains—substantially greater than demonstrated in previous methods—GNS can operate at resolutions high enough for practical prediction tasks and high-quality 3D renderings (e.g., Figure 1). And although our models were trained to make one-step predictions, the long-term trajectories remain plausible even over thousands of rollout timesteps.
The GNS model could also learn how the materials respond to unpredictable external forces. In the FluidShake domain, a container filled with water is being moved side-to-side, causing splashes and irregular waves.
Our model could also simulate fluid interacting with complicated static obstacles, as demonstrated by our WaterRamps and SandRamps domains in which water or sand pour over 1-5 obstacles. Figure 3(d) depicts comparisons between our model and ground truth, and Table 1 shows quantitative performance measures.
We also trained our model on continuously varying material parameters. In the Continuous domain, we varied the friction angle of a granular material, to yield behavior similar to a liquid (0), sand (45), or gravel ( 60). Our results and videos
show that our model can account for these continuous variations, and even interpolate between them: a model trained with the regionheld out in training can accurately predict within that range.
5.2 Multiple interacting materials
So far we have reported results of training identical GNS architectures separately on different systems and materials. However, we found we could go a step further and train a single architecture with a single set of parameters to simulate all of our different materials, interacting with each other in a single system.
In our MultiMaterial domain, the different materials could interact with each other in complex ways, which means the model had to effectively learn the product space of different interactions (e.g., water-water, sand-sand, water-sand, etc.). The behavior of these systems was often much richer than the single-material domains: the stiffer materials, such as sand and goop, could form temporary semi-rigid obstacles, which the water would then flow around. Figure 3(e) and this video shows renderings of such rollouts. Visually, our model’s performance in MultiMaterial is comparable to its performance when trained on those materials individually.
test dataset. Error bars display lower and higher quartiles, and are shown for the default parameters. (right) Comparison of average performance of our GNS model to CConv. (k,l) Qualitative comparison between GNS (k) and CConv (l) inBoxBath after 50 rollout steps (video link
). (m) Quantitative comparison of our GNS model (red) to the CConv model (grey) across the test set . For our model, we trained one or more seeds using the same set of hyper-parameters and show results for all seeds. For the CConv model we ran several variations including different radius sizes, noise levels, and number of unroll steps during training, and show the result for the best seed. Errors bars show the standard error of the mean across all of the trajectories in the test set (95% confidence level).
We found that the GNS generalizes well even beyond its training distributions, which suggests it learns a more general-purpose understanding of the materials and physical processes experienced during training.
To examine its capacity for generalization, we trained a GNS architecture on WaterRamps, whose initial conditions involved a square region of water in a container, with 1-5 ramps of random orientation and location. After training, we tested the model on several very different settings. In one generalization condition, rather than all water being present in the initial timestep, we created an “inflow” that continuously added water particles to the scene during the rollout, as shown in Figure 3(f). When unrolled for 2500 time steps, the scene contained 28k particles—an order of magnitude more than the 2.5k particles used in training—and the model was able to predict complex, highly chaotic dynamics not experienced during training, as can be seen in this video. The predicted dynamics were visually similar to the ground truth sequence.
Because we used relative displacements between particles as input to our model, in principle the model should handle much scenes with much larger spatial extent at test time. We evaluated this on a much larger domain, with several inflows over a complicated arrangement of slides and ramps (see Figure 3(h), video here). The test domain’s spatial width height were , which was 32x larger than the training domain’s area; at the end of the rollout, the number of particles was 85k, which was 34x more than during training; we unrolled the model for 5000 steps, which was 8x longer than the training trajectories. We conducted a similar experiment with sand on the SandRamps domain, testing model generalization to hourglass-shaped ramps.
As a final, extreme test of generalization, we applied a model trained on MultiMaterial to a custom test domain with inflows of various materials and shapes (Figure 3(g)). The model learned about frictional behavior between different materials (sand on sticky goop, versus slippery floor), and that the model generalized well to unseen shapes, such as hippo-shaped chunks of goop and water, falling from mid-air, as can be observed in this video.
5.4 Key architectural choices
We performed a comprehensive analysis of our GNS’s architectural choices to discover what influenced performance most heavily. We analyzed a number of hyperparameter choices—e.g., number of MLP layers, linear encoder and decoder functions, global latent state in the Processor—but found these had minimal impact on performance (see Supplementary Materials C for details).
While our GNS model was generally robust to architectural and hyperparameter settings, we also identified several factors which had more substantial impact:
the number of message-passing steps,
shared vs. unshared Processor GN parameters,
the connectivity radius,
the scale of noise added to the inputs during training,
relative vs. absolute Encoder.
We varied these choices systematically for each axis, fixing all other axes with the default architecture’s choices, and report their impact on model performance in the Goop domain (Figure 4).
For (1), Figure 4(a,b) shows that a greater numbers of message-passing steps yielded improved performance in both one-step and rollout accuracy. This is likely because increasing allows computing longer-range, and more complex, interactions among particles. Because computation time scales linearly with , in practice it is advisable to use the smallest that still provides desired performance.
For (2), Figure 4(c,d) shows that models with unshared GN parameters in the Processor yield better accuracy, especially for rollouts. Shared parameters imposes a strong inductive bias that makes the Processor analogous to a recurrent model, while unshared parameters are more analogous to a deep architecture, which incurs times more parameters. In practice, we found marginal difference in computational costs or overfitting, so we conclude that using unshared parameters has little downside.
For (3), Figure 4(e,f) shows that greater connectivity values yield lower error. Similar to increasing , larger neighborhoods allow longer-range communication among nodes. Since the number of edges increases with , more computation and memory is required, so in practice the minimal that gives desired performance should be used.
For (4), we observed that rollout accuracy is best for an intermediate noise scale (see Figure 4(g,h)), consistent with our motivation for using it (see Section 4.3). We also note that one-step accuracy decreases with increasing noise scale. This is not surprising: adding noise makes the training distribution less similar to the uncorrupted distribution used for one-step evaluation.
For (5), Figure 4(i,j) shows that the relative Encoder is clearly better than the absolute version. This is likely because the underlying physical processes that are being learned are invariant to spatial position, and the relative Encoder’s inductive bias is consistent with this invariance.
5.5 Comparisons to previous models
We compared our approach to two recent papers which explored learned fluid simulators using particle-based approaches. Li et al. (2018)’s DPI studied four datasets of fluid, deformable, and solid simulations, and presented four different, distinct architectures, which were similar to Sanchez-Gonzalez et al. (2018)’s, with additional features such as as hierarchical latent nodes. When training our GNS model on DPI’s BoxBath domain, we found it could learn to simulate the rigid solid box floating in water, faithfully maintaining the stiff relative displacements among rigid particles, as shown Figure 4(k) and this video. Our GNS model did not require any modification—the box particles’ material type was simply a feature in the input vector—while DPI required a specialized hierarchical mechanism and forced all box particles to preserve their relative displacements with each other. Presumably the relative Encoder and training noise alleviated the need for such mechanisms.
Ummenhofer et al. (2020)’s CConv propagates information across particles555The authors state CConv does not use an explicit graph representation, however we believe their particle update scheme can be interpreted as a special type of message-passing on a graph. See Supplementary Materials D., and uses particle update functions and training procedures which are carefully tailored to modeling fluid dynamics (e.g., an SPH-like local kernel, different sub-networks for fluid and boundary particles, a loss function that weights slow particles with few neighbors more heavily). Ummenhofer et al. (2020) reported CConv outperformed DPI, so we quantitatively compared our GNS model to CConv. We implemented CConv as described in its paper, plus two additional versions which borrowed our noise and multiple input states, and performed hyperparameter sweeps over various CConv parameters. Figure 4(m) shows that across all six domains we tested, our GNS model with default hyperparameters has better rollout accuracy than the best CConv model (among the different versions and hyperparameters) for that domain. In this comparison video, we observe than CConv performs well for domains like water, which it was built for, but struggles with some of our more complex materials. Similarly, in a CConv rollout of the BoxBath domain the rigid box loses its shape (Figure 4(l)), while our method preserves it. See Supplementary Materials D for full details of our DPI and CConv comparisons.
We presented a powerful, general-purpose machine learning framework for learning to simulate complex systems, based on particle-based representations of physics and learned message-passing on graphs. Our experimental results show our single GNS architecture can learn to simulate the dynamics of fluids, rigid solids, and deformable materials, interacting with one another, using tens of thousands of particles over thousands time steps. We find our model is simpler, more accurate, and has better generalization than previous approaches.
While here we focus on mesh-free particle methods, our GNS approach may also be applicable to data represented using meshes, such as finite-element methods. There are also natural ways to incorporate stronger, generic physical knowledge into our framework, such as Hamiltonian mechanics Sanchez-Gonzalez et al. (2019) and rich, architecturally imposed symmetries. To realize advantages over traditional simulators, future work should explore how to parameterize and implement GNS computations more efficiently, and exploit the ever-improving parallel compute hardware. Learned, differentiable simulators will be valuable for solving inverse problems, by not strictly optimizing for forward prediction, but for inverse objectives as well.
More broadly, this work is a key advance toward more sophisticated generative models, and furnishes the modern AI toolkit with a greater capacity for physical reasoning.
We thank Victor Bapst and Jessica Hamrick for valuable feedback on the work and manuscript, and we thank Benjamin Ummenhofer for advice on implementing the continuous convolution baseline model.
- Layer normalization. arXiv preprint arXiv:1607.06450. Cited by: §4.2.
- Interaction networks for learning about objects, relations and physics. In Advances in neural information processing systems, pp. 4502–4510. Cited by: §2.
- Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261. Cited by: §D.1, §2, §3.2.
- Divergence-free smoothed particle hydrodynamics. In Proceedings of the 2015 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, External Links: Cited by: §4.1.
- A compositional object-based approach to learning physical dynamics. arXiv preprint arXiv:1612.00341. Cited by: §2.
- Sinkhorn distances: lightspeed computation of optimal transportation distances. External Links: Cited by: §B.4, §4.4.
- Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1263–1272. Cited by: §2.
-  (2018) Graph nets library. DeepMind. External Links: Cited by: §4.3.
- A kernel two-sample test. Journal of Machine Learning Research 13 (Mar), pp. 723–773. Cited by: §B.4, §4.4.
- Neuroanimator: fast neural network emulation and control of physics-based models. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pp. 9–20. Cited by: §2.
- Learning to predict the cosmological structure formation. Proceedings of the National Academy of Sciences 116 (28), pp. 13825–13832. Cited by: §2.
- Learning to control pdes with differentiable physics. arXiv preprint arXiv:2001.07457. Cited by: §2.
- DiffTaichi: differentiable programming for physical simulation. arXiv preprint arXiv:1910.00935. Cited by: §2.
- A moving least squares material point method with displacement discontinuity and two-way rigid body coupling. ACM Trans. Graph. 37 (4). Cited by: §4.1.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §B.3, §4.3.
- Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §D.1.
- Scale mlperf-0.6 models on google tpu-v3 pods. arXiv preprint arXiv:1909.09756. Cited by: §B.3.
- Data-driven fluid simulations using regression forests. ACM Transactions on Graphics (TOG) 34 (6), pp. 1–9. Cited by: §2.
- Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. arXiv preprint arXiv:1810.01566. Cited by: §D.2, §1, §2, §4.1, §5.5.
- Propagation networks for model-based control under partial observation. In 2019 International Conference on Robotics and Automation (ICRA), pp. 1205–1211. Cited by: §2.
- Unified particle physics for real-time applications. ACM Transactions on Graphics (TOG) 33 (4), pp. 1–12. Cited by: §4.1.
- Dynamic graph convolutional networks. Pattern Recognition 97, pp. 107000. Cited by: §2.
- Smoothed particle hydrodynamics. Annual review of astronomy and astrophysics 30 (1), pp. 543–574. Cited by: §2.
- Flexible neural representation for physics prediction. In Advances in Neural Information Processing Systems, pp. 8799–8810. Cited by: §2.
- Position based dynamics. Journal of Visual Communication and Image Representation 18 (2), pp. 109–118. Cited by: §2.
- Hamiltonian graph networks with ode integrators. arXiv preprint arXiv:1909.12790. Cited by: §2, §3.1, §6.
- Graph networks as learnable physics engines for inference and control. arXiv preprint arXiv:1806.01242. Cited by: §C.2, §2, §2, §5.5.
- The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §2.
- JAX, md: end-to-end differentiable, hardware accelerated, molecular dynamics in pure python. arXiv preprint arXiv:1912.04232. Cited by: §2.
- Application of a particle-in-cell method to solid mechanics. Computer physics communications 87 (1-2), pp. 236–252. Cited by: §2.
- Stochastic prediction of multi-agent interactions from partial observations. arXiv preprint arXiv:1902.09641. Cited by: §2.
- Relational forward models for multi-agent learning. arXiv preprint arXiv:1809.11044. Cited by: §2.
Know-evolve: deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3462–3471. Cited by: §2.
- DyRep: learning representations over dynamic graphs. In International Conference on Learning Representations, External Links: Cited by: §2.
- Lagrangian fluid simulation with continuous convolutions. In International Conference on Learning Representations, External Links: Cited by: §B.3, §C.2, §D.1, §D.1, §D.1, §D.1, §D.1, §1, §2, §4.1, §5.5.
- Neural execution of graph algorithms. In International Conference on Learning Representations, External Links: Cited by: §2.
- Topics in optimal transportation. American Mathematical Soc.. Cited by: §B.4, §4.4.
- Latent space physics: towards learning the temporal evolution of fluid flow. In Computer Graphics Forum, pp. 71–82. Cited by: §2.
Spatial temporal graph convolutional networks for skeleton-based action recognition.
Thirty-second AAAI conference on artificial intelligence, Cited by: §2.
Appendix A Supplementary GNS model details
Update mechanism. Our GNS implementation here uses semi-implicit Euler integration to update the next state based on the predicted accelerations:
where we assume for simplicity. We use this in contrast to forward Euler () so the acceleration predicted by the model can directly influence .
Optimizing parameters of learnable simulator. Learning a simulator , can in general be expressed as optimizing its parameters over some objective function,
represents a distribution over state trajectories, starting from the initial conditions , over timesteps. The indicates the simulated rollout generated by given . The objective function , considers the whole trajectory generated by . In this work, we specifically train our GNS model on a one-step loss function, , with
This imposes a stronger inductive bias that physical dynamics are Markovian, and should operate the same at any time during a trajectory.
In fact, we note that optimizing for whole trajectories may not actually not be ideal, as it can allow the simulator to learn biases which may not be hold generally. In particular, an which considers the whole trajectory means does not necessarily equal the that would optimize . This is because optimizing a capacity-limited simulator model for whole trajectories might benefit from producing greater one-step errors at certain times, in order to allow for better overall performance in the long term. For example, imagine simulating an undamped pendulum system, where the initial velocity of the bob is always zero. The physics dictate that in the future, whenever the bob returns to its initial position, it must always have zero velocity. If cannot learn to approximate this system exactly, and makes mistakes on intermediate timesteps, this means that when the bob returns to its initial position it might not have zero velocity. Such errors could accumulate over time, and causes large loss under an which considers whole trajectories. The training process could overcome this by selecting which, for example, subtly encodes the initial position in the small decimal places of its predictions, which the simulator could then exploit by snapping the bob back to zero velocity when it reaches that initial position. The resulting
may be more accurate over long trajectories, but not generalize as well to situations where the initial velocity is not zero. This corresponds to using the predictions, in part, as a sort of memory buffer, analogous to a recurrent neural network.
Of course, a simulator with a memory mechanism can potentially offer advantages, such as being better able to recognize and respect certain symmetries, e.g., conservation of energy and momentum. An interesting area for future work is exploring different approaches for training learnable simulators, and allowing them to store information over rollout timesteps, especially as a function for how the predictions will be used, which may favor different trade-offs between accuracy over time, what aspects of the predictions are most important to get right, generalization, etc.
Appendix B Supplementary experimental methods
b.1 Physical domains
|Domain||Simulator||Dim.||# particles (approx)||Trajectory length||# trajectories (Train/ Validation/ Test)||[ms]||Connectivity radius|
b.2 Implementation details
Input and output representations. We define the input “velocity” as average velocity between the current and previous timestep, which is calculated from the difference in position, (omitting constant for simplicity). Similarly, we use as target the average acceleration, calculated between the next and current timestep, . Target accelerations are thus calculated as, .
We express the material type (water, sand, goop, rigid, boundary particle) as a particle feature, , represented with a learned embedding vector of size 16. For datasets with fixed flat orthogonal walls, instead of adding boundary particles, we add a feature to each node indicating the vector distance to each wall. Crucially, to maintain spatial translation invariance, we clip this distance to the connectivity radius , achieving a similar effect to that of the boundary particles. In FluidShake, particle positions were provided in the coordinate frame of the container, and the container position, velocity and acceleration were provided as 6 global features. In Continuous a single global scalar was used to indicate the friction angle of the material.
Building the graph. We construct the graph by, for each particle, finding all neighboring particles within the connectivity radius. We use a standard - tree algorithm for this search. The connectivity radius was chosen, such that the number of neighbors in roughly in the range of . We however did not find it necessary to fine-tune this parameter: All 2D scenes of the same resolution share , only the high-res 2D and 3D scenes, which had substantially different particle densities, required choosing a different radius. Note that for these datasets, the radius was simply chosen once based on particle neighborhood size adn total number of edges, and was not fine-tuned as a hyperparameter.
Neural network parametrizations. We also trained models where we replaced the deep encoder and decoder MLPs by simple linear layers without activations, and observed similar performance.
Noise injection. Because our models take as input a sequence of states (positions and velocities), we draw independent samples , for each input state, particle and spatial dimension, before each training step. We accumulate them across time as a random walk, and use this to perturb the stack of input velocities. Based on the updated velocities, we then adjust the position features, such that is maintained, for consistency. We also experimented with other types of noise accumulation, as detailed in Section C.
Another way to address differences in training and test input distributions is to, during training, provide the model with its own predictions by rolling out short sequences. Ummenhofer et al. (2020), for example, train with two-step predictions. However computing additional model predictions are more expensive, and in our experience may not generalize to longer sequences as well as noise injection.
Normalization. To normalize our inputs and targets, we compute the dataset statistics of during training. Instead of using moving averages, which could shift in cycles during training, we instead build exact mean and variance for all of the input and target particle features up seen up to the current training step , by accumulating the sum, the sum of the squares and the total particle count. The statistics are computed after noise is applied to the inputs.
Loss function and optimization procedures. We load the training trajectories sequentially, and use them to generate input and target pairs (from a 1000-step long trajectory we generate 995 pairs, as we condition on 5 past states), and sample input-target pairs from a shuffle buffer of size 10k. Rigid obstacles, such as the ramps in Water-Ramps, are represented as boundary particles. Those particles are treated identical to regular particles, but they are masked out of the loss. Similarly, in the ground truth simulations, particles sometimes clip through the boundary and disappear; we also mask those particles out of the loss.
Due to normalization of predictions and targets, our prediction loss is normalized, too. This allows us to choose a scale-free learning rate, across all datasets. To optimize the loss, we use the Adam optimizer Kingma and Ba (2014)
(a form of stochastic gradient descent), with minibatches of size 2 averaging the loss for all particles in the batch. We performed a maximum of 20M gradient update steps, with an exponentially decaying learning rate,, where on the -th gradient update step, , with and . While models can train in significantly less steps, we avoid aggressive learning rates to reduce variance across datasets and make comparisons across settings more fair.
We train our models using second generation TPUs and V100 GPUs interchangeably. For our datasets, we found that training time per example with a single TPU chip or a single V100 GPU was about the same. TPUs allowed for faster training through fast batch parallelism (each element of a batch on a separate TPU) and model parallelism (a single training distributed over multiple chips)Kumar et al. (2019)
. Furthermore, since TPUs require fixed size tensors, instead of just padding each device’s training example to a maximum number of nodes/edges, we set the fixed size to correspond to the largest graph in the dataset, and build a larger minibatch using multiple training examples whenever they would fit within the fixed sizes. This yielded an effective batch size between 1 (large examples) and 3 examples (small examples) per device and is equivalent to setting a mini batch size in terms of total number of particles per batch.
b.4 Distributional evaluation metrics
An MSE metric of zero indicates that the model perfectly predicts where each particle has traveled. However, if the model’s predicted positions for particles A and B exactly match true positions of particles B and A, respectively, the MSE could be high, even though the predicted and true distributions of particles match. So we also explored two metrics that are invariant under particle permutations, by measuring differences between the distributions of particles: optimal transport (OT) Villani (2003) using 2D or 3D Wasserstein distance and approximated by the Sinkhorn Algorithm Cuturi (2013), and maximum mean discrepancy (MMD) Gretton et al. (2012) with a Gaussian kernel bandwidth of . These distributional metrics may be more appropriate when the goal is to predict what regions of the space will be occupied by the simulated material, or when the particles are sampled from some continuous representation of the state and there are no “true” particles to compare predictions to. We will analyze some of our results using those metrics in Section C.
Appendix C Supplementary results
c.1 Architectural choices with minor impact on performance
We provided additional ablation scans on the Goop dataset in Figure C.1, which show that the model is robust to typical hyperparameter changes.
Number of input steps . (Figure C.2a,b) We observe a significant improvement from conditioning the model on just the previous velocity () to the two most recent velocities (), but find similar performance for larger values of . All our models use .
Number of MLP hidden layers. (Figure C.2c,d) Except for the case of zero hidden layers (linear layer), the performance does not chance much as a function of the MLP depth.
Use MLP encoder & decoder. (Figure C.2e,f) We replaced the MLPs used in the encoder and decoder by linear layers (single matrix multiplication followed by a bias), and observed no significant changes in performance.
Include self-edges. (Figure C.2g,h) The model performs similarly regardless of whether self-edges are included in the message passing process or not.
Use global latent. (Figure C.2i,j) We enable the global mechanisms of the GNs. This includes both, explicit input global features (instead of appending them to the nodes) and a global latent state that updates after every message passing step using aggregated information from all edges and nodes. We do not find significant differences when doing this, however we speculate that generalization performance for systems with more particles than used during training would be affected.
Use edges latent. (Figure C.2k,m) We disabled the updates to the latent state of the edges that is performed at each edge message passing iteration, but found no significant differences in performance.
Add gravity acceleration. (Figure C.2m,n) We attempted adding gravity accelerations to the model outputs in the update procedure, so the model would not need to learn to predict a bias acceleration due to gravity, but found no significant performance differences.
c.2 Noise-related training parameters
We provide some variations related to how we add noise to the input data on the Goop dataset in Figure C.2.
Noise type. (Figure C.2a,e) We experimented with 4 different modes for adding noise to the inputs. only_last adds noise only to the velocity of the most recent state in the input sequence of states. correlated draws a single per-particle and per-dimension set of noise samples, and applies the same noise to the velocities of all input states in the sequence. uncorrelated draws independent noise for the velocity of each input state. random_walk draws noise for each input state, adding it to the noise of the previous state in sequence as in a random random walk, as an attempt to simulate accumulation of error in a rollout. In all cases the input states positions are adjusted to maintain . To facilitate the comparison, the variance of the generated noise is adjusted so the variance of the velocity noise at the last step is constant. We found the best rollout performance for random_walk noise type.
Noise Std. (Figure C.2b,f) This is described in the main text, included here for completeness.
Reconnect graph after noise. (Figure C.2c,g) We found that the performance did not change regardless of whether we recalculated the connectivity of the graph after applying noise to the positions or not.
Fraction of noise to correct. (Figure C.2d,h) In the process of corrupting the input position and velocity features with noise, we do not adjust the target accelerations, that is, we do not ask the model to predict the accelerations that would correct the noise in the input positions. For this variation we modify the target average accelerations to force the model to predict accelerations that would also correct for 10%, 30% or 100% of the the noise in the inputs. This it implicitly what happens when the loss is defined directly on the positions, regardless of whether the inputs are perturbed with noise Sanchez-Gonzalez et al. (2018) or the inputs have noise due to model error Ummenhofer et al. (2020). Figure C.2d,h show that asking the model to correct for a large fraction of the noise leads to worse performance. We speculate that this is because physically valid distributions are very complex and smooth in these datasets, and unlike in the work by Sanchez-Gonzalez et al. (2018), once noise is applied, it is not clear which is the route the model should take to bring the state back to a valid point in the distribution, resulting in large variance during training.
c.3 Distributional evaluation metrics
Generally we find that MSE and the distributional metrics lead to generally similar conclusions in our analyses (see Figure C.3), though we notice that differences in the distributional metrics’ values for qualitatively “good” and “bad” rollouts can be more prominent, and match more closely with our subjective visual impressions. Figure C.4 shows the rollout errors as a function of the key architectural choices from Figure 4 using these distributional metrics.
c.4 Quantitative results on all datasets
|Mean Squared Error||Optimal Transport||Maximum Mean Discrepancy ()|
c.5 Example failure cases
In this video, we show two of the failure cases we sometimes observe with the GNS model. In the BoxBath domain we found that our model could accurately predict the motion of a rigid block, and maintain its shape, without requiring explicit mechanisms to enforce solidity constraints or providing the rest shape to the network. However, we did observe limits to this capability in a harder version of BoxBath, which we called FluidShake-Box, where the container is vigorously shaken side to side, over a rollout of 1500 timesteps. Towards the end of the trajectory, we observe that the solid block starts to deform. We speculate the reason for this is that GNS has to keep track of the block’s original shape, which can be difficult to achieve over long trajectories given an input of only 5 initial frames.
In the second example, a bad seed of our model trained on the Goop domain predicts a blob of goop stuck to the wall instead of falling down. We note that in the training data, the blobs do sometimes stick to the wall, though it tends to be closer to the floor and with different velocities. We speculate that the intricacies of static friction and adhesion may be hard to learn—to learn this behaviour more robustly, the model may need more exposure to fall versus sticking phenomena.
Appendix D Supplementary baseline comparisons
d.1 Continuous convolution (CConv)
Recently Ummenhofer et al. (2020) presented Continuous Convolution (CConv) as a method for particle-based fluid simulation. We show that CConv can also be understood in our framework, and compare CConv to our approach on several tasks.
While Ummenhofer et al. (2020) state that “Unlike previous approaches, we do not build an explicit graph structure to connect the particles but use spatial convolutions as the main differentiable operation that relates particles to their neighbors.”, we find we can express CConv (which itself is a generalization of CNNs) as a GN Battaglia et al. (2018) with a specific type of edge update function.
CConv relates to CNNs (with stride of 1) and GNs in two ways. First, in CNNs, CConv, and GNs, each element (e.g., pixel, feature vector, particle) is updated as a function of its neighbors. In CNNs the neighborhood is fixed and defined by the kernel’s dimensions, while in CConv and GNs the neighborhood varies and is defined by connected edges (in CConv the edges connect to nearest neighbors).
Second, CNNs, CConv, and GNs all apply a function to element ’s neighbors, , pool the results from within the neighborhood, and update element ’s representation. In a CNN, this is computed as, , where is a matrix whose parameters depend on the displacement between the grid coordinates of and , (and is a bias vector, and is a non-linear activation). Because there are a finite set of values, one for each coordinate in the kernel’s grid, there are a finite set of parameterizations.
CConv uses a similar formula, except the particles’ continuous coordinates mean a choice must be made about how to parameterize . Like in CNNs, CConv uses a finite set of distinct weight matrices, , associated with the discrete coordinates, , on the kernel’s grid. For the continuous input , the nearest are interpolated by the fractional component of . In 1D this would be linear interpolation, , where . In 3D, this is trilinear interpolation.
A GN can implement CNN and CConv computations by representing using edge attributes, , and an edge update function which uses independent parameters for each , i.e., . Beyond their displacement-specific edge update function, CNNs and CConv are very similar to how graph convolutional networks (GCN) Kipf and Welling (2016) work. The full CConv update as described in Ummenhofer et al. (2020) is, . In particular, it indexes into the weight matrices via a polar-to-Cartesian coordinate transform, , to induce a more radially homogeneous parameterization. It also uses a weighted sum over the particles in a neighborhood, where the weights, , are proportional to the distance between particles. And it includes a normalization, , for neighborhood size, they set it to 1.
We implemented the CConv model, loss and training procedure as described by Ummenhofer et al. (2020). For simplicity, we only tested the CConv model on datasets with flat walls, rather than those with irregular geometry. This way we could omit the initial convolution with the boundary particles and instead give the fluid particles additional simple features indicating the vector distance to each wall, clipped by the radius of connectivity, as in our model. This has the same spatial constraints as CConv with boundary particles in the wall, and should be as or more informative than boundary particles for square containers. Also, for environments with multiple materials, we appended a particle type learned embedding to the input node features.
To be consistent with Ummenhofer et al. (2020), we used their batch size of 16, learning rate decay of to for 50k iterations, and connectivity radius of 4.5x the particle radius. We were able to replicate their results on PBD/FLeX and SPH simulator datasets similar to the datasets presented in their paper. To allow a fair comparison when evaluating on our MPM datasets, we performed additional hyperparameter sweeps over connectivity particle radius, learning rate, and number of training iterations using our Goop dataset. We used the best-fitting parameters on all datasets, analogous to how we selected hyperparameters for our GNS model.
We also implemented variations of CConv which used noise to corrupt the inputs during training (instead of using 2-step loss), as we did with GNS. We found that the noise improved CConv’s rollout performance on most datasets. In our comparisons, we always report performance for the best-performing CConv variant.
Our qualitative results show CConv can learn to simulate sand reasonably well. But it struggled to accurately simulate solids with more complex fine grained dynamics. In the BoxBath domain, CConv simulated the fluid well, but struggled to keep the box’s shape intact. In the Goop domain, CConv struggled to keep pieces of goop together and handle the rest state, while in MultiMaterial it exhibited local “explosions”, where regions of particles suddenly burst outward (see video).
Generally CConv’s performance is strongest for simulating water-like fluids, which is primarily what it was applied to in the original paper. However it still did not match our GNS model, and for other materials, and interactions between different materials, it was clearly not as strong. This is not particularly surprising, given that our GNS is a more general model, and our neural network implementation has higher capacity on several axes, e.g., more message-passing steps, pairwise interaction functions, more flexible function approximators (MLPs with multiple internal layers versus single linear/non-linear layers in CConv).
We trained our model on the Li et al. (2018)’s BoxBath dataset, and directly compared the qualitative behavior to the authors’ demonstration video in this comparison. To provide a fair comparison to DPI, we show a model conditioned on just the previous velocity (=1) in the above comparison video666We also ran experiments with C=5 and did not find any meaningful difference in performance. The results in Table 1 and the corresponding example video are run with C=5 for consistency with our other experiments. While DPI requires a specialized hierarchical mechanism and forced all box particles to preserve their relative displacements with each other, our GNS model faithfully represents the the ground truth trajectories of both water and solid particles without any special treatment. The particles making up the box and water are simply marked as a two different materials in the input features, similar to our other experiments with sand, water and goop. We also found that our model seems to also be more accurate when predicting the fluid particles over the long rollout, and it is able to perfectly reproduce the layering effect for fluid particles at the bottom of the box that exists in the ground truth data.