Learning Symbolic Physics with Graph Networks

09/12/2019 ∙ by Miles D. Cranmer, et al. ∙ Google ∙ Princeton University ∙ Simons Foundation

We introduce an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. Our experiments show that our graph network models, which implement this inductive bias, can learn message representations equivalent to the true force vector when trained on n-body gravitational and spring-like simulations. We use symbolic regression to fit explicit algebraic equations to our trained model's message function and recover the symbolic form of Newton's law of gravitation without prior knowledge. We also show that our model generalizes better at inference time to systems with more bodies than had been experienced during training. Our approach is extensible, in principle, to any unknown interaction law learned by a graph network, and offers a valuable technique for interpreting and inferring explicit causal theories about the world from implicit knowledge captured by deep learning.




1 Introduction

Discovering laws through observation of natural phenomena is the central challenge of the sciences. Modern deep learning also involves discovering knowledge about the world but focuses mostly on implicit knowledge representations, rather than explicit and interpretable ones. One reason is that the goal of deep learning is often optimizing test accuracy and learning efficiency in narrowly specified domains, while science seeks causal explanations and general-purpose knowledge across a wide range of phenomena. Here we explore an approach for imposing physically motivated inductive biases on neural networks, training them to predict the dynamics of physical systems and interpreting their learned representations and computations to discover the symbolic physical laws which govern the systems. Our results also show that this approach improves the generalization performance of the learned models.

The first ingredient in our approach is the “graph network” (GN) (Battaglia et al., 2018), a type of graph neural network (Scarselli et al., 2009; Bronstein et al., 2017; Gilmer et al., 2017), which is effective at learning the dynamics of complex physical systems (Battaglia et al., 2016; Chang et al., 2016; Sanchez-Gonzalez et al., 2018; Mrowca et al., 2018; Li et al., 2018; Kipf et al., 2018). We impose inductive biases on the architecture and train models with supervised learning to predict the dynamics of 2D and 3D n-body gravitational systems and a hanging string. If the trained models can accurately predict the physical dynamics of held-out test data, we can assume they have discovered some level of general-purpose physical knowledge, which is implicitly encoded in their weights. Crucially, we recognize that the forms of the graph network’s message and pooling functions have correspondences to the forms of force and superposition in classical mechanics, respectively. The message pooling is what we call a “linearized latent space”: a vector space in which the latent representations of the interactions between bodies (forces or messages) are linear (summable). By imposing our inductive bias, we encourage the GN’s linearized latent space to match the true one.

The second ingredient is symbolic regression (we use eureqa from Schmidt and Lipson (2009)), which fits compact algebraic expressions to a set of inputs and messages produced by our trained model. eureqa randomly combines mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables, and iteratively searches the space of mathematical expressions to find the model that best fits a given dataset. The resulting symbolic expressions are interpretable and readily comparable with physical laws.

The contributions of this paper are:

  1. A modified GN with inductive biases that promote learning general-purpose physical laws.

  2. Using symbolic regression to extract analytical physical laws from trained neural networks.

  3. Improved zero-shot generalization to larger systems than those in training.

2 Model

Graph networks are a type of deep neural network which operates on graph-structured data. The format of the graphs on which GNs operate is defined as 3-tuples, $G = (\mathbf{u}, V, E)$ (we adhere closely to the notation used in Battaglia et al. (2018) to formalize our model), where:

  1. $\mathbf{u}$ is a global attribute vector of length $L^u$,

  2. $V = \{\mathbf{v}_i\}_{i=1:N^v}$ is a set of node attribute vectors, $\mathbf{v}_i$, of length $L^v$, and

  3. $E = \{(\mathbf{e}_k, r_k, s_k)\}_{k=1:N^e}$ is a set of edge attribute vectors, $\mathbf{e}_k$, of length $L^e$, and indices $r_k$ and $s_k$ of the “receiver” and “sender” nodes connected by the $k$-th edge.

Our GN implementation is depicted in fig. 1. Note that it does not include global or edge attributes. This GN processes a graph by first computing pairwise interactions, or “messages”, $\mathbf{e}'_k = \phi^e(\mathbf{v}_{r_k}, \mathbf{v}_{s_k})$, each a vector of length $L^{e'}$, between nodes connected by edges, with a “message function”, $\phi^e$. Next, the set of messages incident on each $i$-th receiver node is pooled into $\bar{\mathbf{e}}'_i = \rho^{e \to v}(E'_i)$, where $E'_i = \{\mathbf{e}'_k\}_{r_k = i}$, and $\rho^{e \to v}$ is a permutation-invariant operation which can take variable numbers of input vectors, such as elementwise summation. Finally, the pooled messages are used to compute node updates, $\mathbf{v}'_i = \phi^v(\mathbf{v}_i, \bar{\mathbf{e}}'_i)$, with a “node update function”, $\phi^v$. Our specific architectural implementation is very similar to the “interaction network” (IN) variant (Battaglia et al., 2016).
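The three steps described above (message function, sum pooling, node update) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' TensorFlow implementation: the helper names (`mlp`, `init_mlp`, `gn_step`), the layer sizes, and the random untrained weights are all hypothetical.

```python
import numpy as np

def mlp(params, x):
    """Apply a small ReLU MLP; params is a list of (W, b) pairs."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def init_mlp(rng, sizes):
    """Random, untrained weights (illustrative only)."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def gn_step(nodes, receivers, senders, phi_e, phi_v):
    """One GN pass without global or edge attributes.

    nodes: (N, L_v) array; receivers, senders: (E,) integer arrays.
    """
    # Message function: one message per edge from (receiver, sender) attributes.
    edge_inputs = np.concatenate([nodes[receivers], nodes[senders]], axis=1)
    messages = mlp(phi_e, edge_inputs)
    # Pooling: elementwise sum of the messages incident on each receiver.
    pooled = np.zeros((nodes.shape[0], messages.shape[1]))
    np.add.at(pooled, receivers, messages)
    # Node update function: uses each node's attributes and its pooled message.
    updates = mlp(phi_v, np.concatenate([nodes, pooled], axis=1))
    return messages, pooled, updates

rng = np.random.default_rng(0)
n_bodies, l_v, l_e, dim = 4, 5, 2, 2   # assumed sizes; l_e is the message length
nodes = rng.normal(size=(n_bodies, l_v))
pairs = [(i, j) for i in range(n_bodies) for j in range(n_bodies) if i != j]
receivers, senders = map(np.array, zip(*pairs))
phi_e = init_mlp(rng, [2 * l_v, 16, l_e])
phi_v = init_mlp(rng, [l_v + l_e, 16, dim])
messages, pooled, updates = gn_step(nodes, receivers, senders, phi_e, phi_v)
```

Because the pooling is a sum, reordering the edges leaves the pooled messages unchanged, and the same two functions apply to graphs with any number of nodes.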

Figure 1: A schematic depicting how we extract physical knowledge from a GN.

The forms of $\phi^e$, $\rho^{e \to v}$, $\phi^v$, and the associated input and output attribute vectors have correspondences to Newton’s formulation of classical mechanics, which motivated the original development of INs. The key observation is that the message $\mathbf{e}'_k$ could learn to correspond to the force vector imposed on the $r_k$-th body due to its interaction with the $s_k$-th body. In our examples, the force vector is equal to the derivative of the Lagrangian: $\frac{\partial \mathcal{L}}{\partial \mathbf{q}}$, and this could be generally imposed if one knows $\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{q}}}$ and manually integrates the ODE with the output of the graph net. In a general n-body gravitational system in $D$ dimensions, note that the forces are minimally represented in an $\mathbb{R}^D$ vector space. Thus, if the message length $L^{e'}$ equals $D$, we exploit the GN’s “linearized latent space” for physical interpretability: we encourage $\mathbf{e}'_k$ to be the force.

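We sketch a non-rigorous proof-like demonstration of our hypothesis. Newtonian mechanics prescribes that force vectors, $\mathbf{f}_k$, can be summed to produce a net force, $\bar{\mathbf{f}}_i = \sum_{r_k = i} \mathbf{f}_k$, which can then be used to update the dynamics of a body. Our model uses the $i$-th body’s pooled messages, $\bar{\mathbf{e}}'_i$, to update the body’s velocity via Euler integration, $\dot{\mathbf{x}}_i \leftarrow \dot{\mathbf{x}}_i + \phi^v(\mathbf{v}_i, \bar{\mathbf{e}}'_i)$. If we assume our trained GN is trained to predict velocity updates perfectly for any number of bodies, this means, up to a rescaling by the body's mass and the time step, $\bar{\mathbf{f}}_i = \sum_{r_k = i} \mathbf{f}_k = \phi^v_i\left(\sum_{r_k = i} \mathbf{e}'_k\right) = \phi^v_i(\bar{\mathbf{e}}'_i)$, where $\phi^v_i(\cdot) \equiv \phi^v(\mathbf{v}_i, \cdot)$. We have the result for a single interaction: $\mathbf{f}_k = \phi^v_i(\mathbf{e}'_k)$. Thus, we can substitute into the multi-interaction case: $\sum_{r_k = i} \phi^v_i(\mathbf{e}'_k) = \phi^v_i\left(\sum_{r_k = i} \mathbf{e}'_k\right)$, and so $\phi^v_i$ has to be a linear transformation. Therefore, for cases where $\phi^v_i$ is invertible (mapping between the same dimensional space), $\mathbf{e}'_k = (\phi^v_i)^{-1}(\mathbf{f}_k)$, and so the message vectors are linear transformations of the true forces when $L^{e'} = D$. We demonstrate this hypothesis on trained GNs in section 3.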
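The argument can be checked numerically on a toy case. The sketch below assumes the node update acts on messages as a fixed invertible matrix `A` (a hypothetical stand-in for the linear map in the argument above) and verifies that messages which are linear transformations of the forces sum to a pooled message from which the net force is recovered exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 2                                    # D: dimensionality of the force vectors

# Hypothetical linear node-update map; diagonally dominant, hence invertible.
A = np.eye(dim) + 0.3 * rng.uniform(-1.0, 1.0, size=(dim, dim))

forces = rng.normal(size=(5, dim))         # forces f_k acting on one receiver body
messages = forces @ np.linalg.inv(A).T     # e'_k = A^{-1} f_k, one row per edge
pooled = messages.sum(axis=0)              # pooled message: sum over incoming edges
net_force = A @ pooled                     # node update applied to the pooled message
```

Here `net_force` agrees with `forces.sum(axis=0)` for any number of incoming edges, which is the superposition property the argument relies on.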

3 Experiments

We set up 100,000 random simulations, of 1000 time steps each, for every system we study: two different force laws in 2D, one force law in 3D, and a string in 2D with a spring-like force law between nodes and a global gravity. The three n-body problems have six bodies in their training set, and the string has ten nodes, of which the two end nodes are fixed. We train a GN on each of these problems where we choose $L^{e'} = D$, i.e., the length of the message vectors in the GN matches the dimensionality of the force vectors: 2 for the 2D simulations and 3 for the 3D simulations. Our GN, a pure TensorFlow (Abadi et al., 2015) model, has both $\phi^e$ and $\phi^v$ as three-hidden-layer multilayer perceptrons (MLPs) with 128 hidden nodes per layer with ReLU activations. We optimize the L1 loss between the predicted velocity update and the true velocity update of each node.
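To make the training target concrete, the following sketch computes the true velocity update for one time step of an n-body system and the L1 loss against a prediction. Everything specific here is an assumption for illustration: an inverse-square attractive force with unit gravitational constant, a small softening length `eps`, the time step `dt`, and the function names. It is not the simulator used for the experiments.

```python
import numpy as np

def pairwise_forces(pos, mass, eps=1e-2):
    """f[i, j]: force on body i from body j under an assumed inverse-square law."""
    delta = pos[None, :, :] - pos[:, None, :]             # delta[i, j] = x_j - x_i
    r2 = (delta ** 2).sum(axis=-1) + eps ** 2             # softened squared distance
    mag = mass[:, None] * mass[None, :] / r2              # m_i m_j / (r^2 + eps^2)
    f = mag[..., None] * delta / np.sqrt(r2)[..., None]   # attractive, along x_j - x_i
    idx = np.arange(len(mass))
    f[idx, idx] = 0.0                                     # no self-interaction
    return f

def velocity_update(pos, mass, dt):
    """True Euler velocity update: dv_i = dt * (sum_j f_ij) / m_i."""
    return dt * pairwise_forces(pos, mass).sum(axis=1) / mass[:, None]

def l1_loss(pred, target):
    """Mean absolute error between predicted and true velocity updates."""
    return np.abs(pred - target).mean()

rng = np.random.default_rng(3)
pos = rng.normal(size=(6, 2))              # six bodies in 2D
mass = rng.uniform(0.5, 1.5, size=6)
target = velocity_update(pos, mass, dt=1e-2)
loss_of_zero_prediction = l1_loss(np.zeros_like(target), target)
```

The per-pair forces sum over senders to give each body's net force, mirroring the sum pooling of messages in the GN.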

Once we have a trained model, we record the messages, $\mathbf{e}'_k$, for all bodies over 1000 new simulations for each environment. We fit a linear combination of the vector components of the true force, $\mathbf{f}_k$, to each component of $\mathbf{e}'_k$, as can be seen in fig. 2 for one of the 2D systems. The results for each system show that the $\mathbf{e}'_k$ vectors have learned to be a linear combination of the $\mathbf{f}_k$ components when $L^{e'} = D$. We see similar linear relations for all other simulations.
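A fit of this kind can be done with ordinary least squares. The sketch below is one plausible way to do it, assuming an intercept term is included; it runs on synthetic stand-in data (the `true_map` matrix and the noise level are made up for illustration) rather than on messages from a trained model.

```python
import numpy as np

def fit_linear_map(forces, messages):
    """Least-squares fit messages ~ forces @ coef + intercept.

    Returns coef (D, L_e), intercept (L_e,), and R^2 per message component.
    """
    X = np.concatenate([forces, np.ones((forces.shape[0], 1))], axis=1)
    sol, *_ = np.linalg.lstsq(X, messages, rcond=None)
    pred = X @ sol
    ss_res = ((messages - pred) ** 2).sum(axis=0)
    ss_tot = ((messages - messages.mean(axis=0)) ** 2).sum(axis=0)
    return sol[:-1], sol[-1], 1.0 - ss_res / ss_tot

# Synthetic stand-in: messages that are a noisy linear transformation of forces.
rng = np.random.default_rng(2)
forces = rng.normal(size=(1000, 2))
true_map = np.array([[1.5, -0.4], [0.3, 0.9]])   # hypothetical, for illustration
messages = forces @ true_map + 0.01 * rng.normal(size=(1000, 2))
coef, intercept, r2 = fit_linear_map(forces, messages)
```

An R^2 close to 1 for every message component indicates that the messages are well described as a linear combination of the force components.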

Figure 2: These plots demonstrate that the graph network’s messages have learned to be linear transformations of the two vector components of the true force, $f_x$ and $f_y$, for one of the force laws in 2D.

We are also able to find the force law when it is unknown by using symbolic regression to fit an algebraic function that approximates $\phi^e$. We demonstrate this on the trained GN for the 2D problem shown in fig. 2, using eureqa from Schmidt and Lipson (2009) to fit algebraic equations to the message. We allow it to use algebraic operators as well as input variables ($\Delta x$ and $\Delta y$ for component separation, $r$ for distance, and $m$ for sending body mass) and real constants, with a fixed complexity assigned to each of these operators and values. This returns a list of the models with the lowest mean square error at each complexity. We parametrize Occam’s razor to find the “best” algebraic model by taking the model that maximizes the fractional drop in mean square error over the best model at a lower complexity. The “best” model found by the symbolic regression for the first output element of $\phi^e$ is a linear combination of the components of the true force. We can see this is approximately the same linear transformation as the components in the left plot of fig. 2, but this algebraic expression was learned from scratch.
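The model-selection rule can be written down in a few lines. This sketch is one reading of the criterion, not eureqa's own API: it assumes the fractional drop is measured relative to the lowest error among all strictly simpler models, and that all errors are positive. The function name, the candidate list, and its placeholder expression labels are hypothetical.

```python
def select_best_model(candidates):
    """Pick the model with the largest fractional drop in MSE.

    candidates: list of (complexity, mse, expression), holding the
    lowest-MSE model found at each complexity, with mse > 0.
    """
    ordered = sorted(candidates, key=lambda c: c[0])
    best, best_drop = None, float("-inf")
    best_lower_mse = ordered[0][1]       # lowest MSE among strictly simpler models
    for complexity, mse, expr in ordered[1:]:
        drop = (best_lower_mse - mse) / best_lower_mse
        if drop > best_drop:
            best, best_drop = (complexity, mse, expr), drop
        best_lower_mse = min(best_lower_mse, mse)
    return best, best_drop

# Made-up error values, chosen only to show the selection behaviour.
candidates = [
    (1, 1.0, "expr_a"),
    (3, 0.9, "expr_b"),
    (7, 0.05, "expr_c"),
    (9, 0.045, "expr_d"),
]
best, best_drop = select_best_model(candidates)
```

In this made-up example the model at complexity 7 is selected: it cuts the error far more, relative to the simpler models, than the extra terms at complexity 9 do.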

We now test whether the GN with $L^{e'} = D$ will generalize to more nodes better than a GN with a larger $L^{e'}$. The motivation is that a GN with a high-dimensional message-passing space, trained on a fixed number of bodies, can “cheat”. One example of cheating would be for $\phi^e$ to concatenate each sending node’s properties along the message, and for $\phi^v$ to calculate forces from these and add them. When a new body is added, this calculation might break. While it is still possible for $\phi^e$ to develop an elaborate encoding scheme with $L^{e'} = D$ to cheat at this problem, it seems more natural for $\phi^e$ to learn the true force when $L^{e'} = D$ and therefore show improved generalization to a greater number of nodes.

Figure 3: These plots demonstrate the improvement in generalization from minimizing the message-passing space. GNs with different message-passing space dimensions ($L^{e'}$), trained on a 6-body system (left plot) and a 4-body system (right plot), with the training size indicated by the vertical line, are tested on a variable number of bodies in the 3D simulation, and their loss is shown.

We test the hypothesis of better generalization with $L^{e'} = D$ in fig. 3, by training GNs with different $L^{e'}$ on the 3D simulations. The observed trend is that systems with $L^{e'} > 3$ see their loss blow up with a larger number of bodies, presumably because they have “cheated” slightly and not learned the force law in $\phi^e$ but in a combination of $\phi^e$ and $\phi^v$, whereas the $\phi^e$ of the $L^{e'} = 3$ systems has learned a projection of the true forces and is able to generalize better to a greater number of bodies. One conclusion may be that one can optimize GNs by minimizing $L^{e'}$ to the known minimum dimension required to transmit information (e.g., $L^{e'} = 3$ for 3D forces), or, if this dimension is unknown, until the loss drops off.
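A zero-shot evaluation over system sizes might be organized as below. This is a hedged sketch of the bookkeeping only: `fully_connected_edges` and `loss_vs_body_count` are hypothetical helpers, and `toy_simulate` and `toy_predict` are stand-ins for a real simulator and a trained GN. The toy predictor pools a pairwise quantity by summation, so it transfers to any number of bodies by construction.

```python
import numpy as np

def fully_connected_edges(n):
    """Receiver and sender indices for all ordered pairs i != j among n bodies."""
    receivers, senders = np.nonzero(~np.eye(n, dtype=bool))
    return receivers, senders

def loss_vs_body_count(predict, simulate, body_counts, rng, n_samples=32):
    """Mean L1 loss of `predict` on freshly sampled systems of each size."""
    losses = {}
    for n in body_counts:
        receivers, senders = fully_connected_edges(n)
        errs = []
        for _ in range(n_samples):
            nodes, target = simulate(n, rng)
            errs.append(np.abs(predict(nodes, receivers, senders) - target).mean())
        losses[n] = float(np.mean(errs))
    return losses

def toy_simulate(n, rng):
    """Toy target: sum over other bodies of the separation x_j - x_i."""
    nodes = rng.normal(size=(n, 2))
    target = nodes.sum(axis=0, keepdims=True) - n * nodes
    return nodes, target

def toy_predict(nodes, receivers, senders):
    """Toy model: per-edge message x_s - x_r, summed at each receiver."""
    pooled = np.zeros_like(nodes)
    np.add.at(pooled, receivers, nodes[senders] - nodes[receivers])
    return pooled

rng = np.random.default_rng(4)
losses = loss_vs_body_count(toy_predict, toy_simulate, [4, 6, 8], rng)
```

With a trained model in place of `toy_predict`, plotting `losses` against the number of bodies gives the kind of curve used to compare message dimensions.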

4 Conclusion

We have demonstrated an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. We have shown through experiment that our graph network models which implement this inductive bias can learn message representations equivalent to the true force vector for n-body gravitational and spring-like simulations in 2D and 3D. We have also demonstrated a generic technique for finding an unknown force law: using symbolic regression to fit explicit algebraic equations to our trained model’s message function. Because GNs have more explicit sub-structure than their more homogeneous deep learning relatives (e.g., plain MLPs, convolutional networks), we can draw more fine-grained interpretations of their learned representations and computations. Finally, we have demonstrated that our model generalizes better at inference time to systems with more bodies than had been experienced during training.

Acknowledgments: Miles Cranmer and Rui Xu thank Professor S.Y. Kung for insightful suggestions on early work, as well as Zejiang Hou for his comments on an early presentation. Miles Cranmer would like to thank David Spergel for advice on this project, and Thomas Kipf, Alvaro Sanchez, and members of the DeepMind team for helpful comments on a draft of this paper.