In the 1894 paper “On the symmetry of physical phenomena”, Pierre Curie articulated the following Curie (1894); Chalmers (1970): “When certain effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them.”
This observation, known as Curie’s principle, is as useful now as it was at the turn of the 20th century. Many physical phenomena are now understood to be consequences of symmetry breaking Anderson (1972): the mechanism that generates mass Englert and Brout (1964); Higgs (1964); Guralnik et al. (1964), superconductivity Nambu (1960), phase transitions leading to ferroelectricity Landau (1937), and many others. Identifying sources of symmetry and symmetry breaking will undoubtedly continue to play a vital role in physicists’ endeavors to model complex physical systems.
Machine learning techniques such as neural networks are data-driven methods for building models that have been successfully applied to many areas of physics, particularly fields that employ computationally expensive calculations and simulations, such as quantum matter, particle physics, and cosmology Carleo et al. (2019); Schütt et al. (2020). An important consideration in building machine learned models of physical processes is how to incorporate the strong axioms we have about the symmetry properties of physical systems. Building these axioms into the data featurization, training method, or the model itself prevents the model from learning undesirable and unphysical bias that violates these axioms Grisafi et al. (2018); Behler and Parrinello (2007); Bartók et al. (2013); Thomas et al. (2018); Kondor et al. (2018); Weiler et al. (2018); Anderson et al. (2019). An important feature of these models, which we prove in this Letter, is that they are unable to fit data that is not compatible by symmetry (e.g. the input is higher symmetry than the output). If we hope to use learned models to gain physical insight, this is a crucial feature, not a bug; many discoveries in physics have been made when symmetry implied something was missing (e.g. the first postulation of the neutrino by Pauli Brown (1978)).
In this Letter, we show how we can harness both the flexibility of neural networks and the rigor of symmetry to identify when our data (input and output) are not compatible by symmetry (violating Curie’s principle). We prove that neural networks that automatically have the same symmetry properties as physical systems, symmetry equivariant neural networks, exhibit Curie’s principle. Thus, symmetry equivariant neural networks can be used as “symmetry compilers”: these models will be unable to preferentially fit output that is not symmetrically compatible with the input; they will instead automatically weigh all symmetrically degenerate possibilities equally. Furthermore, we can use gradients of the error between predicted and known output to determine the form (representations) of symmetry breaking information missing from the input data.
We organize this paper as follows: First, we provide background to how these symmetry equivariant networks are constructed. Then, we prove the symmetry properties of the output and gradients of Euclidean symmetry equivariant neural networks. Finally, we demonstrate these properties numerically by training a Euclidean neural network to deform a square into a rectangle. To conclude, we discuss the applicability of these techniques to more complex examples.
A neural network is a function $f_W$ that maps a vector space $X$ to another vector space $Y$, parameterized by weights $W$, i.e. $f_W: X \to Y$. The performance of the neural network is evaluated by use of a loss function $L$. Common loss functions are the mean absolute error (MAE) and mean squared error (MSE). The weights $W$ are updated by taking gradients of the loss with respect to the weights, $W \mapsto W - \alpha\, \partial L / \partial W$, where $\alpha$ is the learning rate. It is also possible to compute gradients of $L$ with respect to the input $x$, which we will use in our final experiment.
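As a minimal illustration of this update rule (a toy sketch, not code from this work), consider a one-parameter linear model fit by gradient descent on the MSE loss; the data, model, and learning rate below are arbitrary choices:

```python
import numpy as np

# Toy linear model f_W(x) = w * x with MSE loss; all values here are
# illustrative, not from the paper.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # ground truth generated by w = 2
w = 0.0
lr = 0.1                         # learning rate alpha

for _ in range(100):
    pred = w * x
    # MSE loss L = mean((pred - y)^2); dL/dw = mean(2 * (pred - y) * x)
    grad_w = np.mean(2.0 * (pred - y) * x)
    w -= lr * grad_w             # W <- W - alpha * dL/dW

print(round(w, 4))  # converges toward 2.0
```

The same autograd machinery that produces $\partial L/\partial W$ also produces $\partial L/\partial x$, which is the quantity used in the final experiment.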
In this work, we use Euclidean neural networks, which are parameterizable functions that are equivariant to elements of 3D Euclidean symmetry (3D rotations, 3D translations, inversion, and any composition of those operations such as mirrors, screws, and glides). A function $f: X \to Y$ is equivariant under a group $G$ if, for group representations $D_X(g)$ and $D_Y(g)$ acting on vector spaces $X$ and $Y$, respectively, $f(D_X(g)x) = D_Y(g)f(x)$ for all $g \in G$ and $x \in X$.
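A quick numeric illustration of this definition (our own sketch, not code from this work): the map $f(x) = |x|\,x$ is rotation-equivariant because rotations preserve vector norms, so applying a rotation before or after $f$ gives the same result.

```python
import numpy as np

# Illustrative check: f(x) = |x| * x is equivariant to rotations
# because |Rx| = |x| for any rotation matrix R.
def f(x):
    return np.linalg.norm(x) * x

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])  # rotation about z

x = np.array([1.0, 2.0, 3.0])
# Equivariance: f(D_X(g) x) == D_Y(g) f(x), with D_X = D_Y = R here
assert np.allclose(f(R @ x), R @ f(x))
```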
If an object or pattern can be identified by a Euclidean neural network in one orientation, it is guaranteed to be identified with the same accuracy in another orientation. This general class of networks has been explored by multiple groups Thomas et al. (2018); Kondor et al. (2018); Weiler et al. (2018); Worrall et al. (2017); Kondor and Trivedi (2018); Cohen et al. (2019).
The success of convolutional neural networks at a variety of geometric tasks (whether the inputs are images, point clouds, or graphs) is due to them having translation equivariance (e.g. if a pattern can be identified in one location by the filters of the network, it will also be identified if it appears in another location). Euclidean neural networks are a subset of convolutional neural networks where the filters are constrained to be equivariant to 3D rotations. To accomplish this, the filter functions are defined to be separable into a learned radial function and spherical harmonics, $F(\vec{r}) = R(r)\,Y_{lm}(\hat{r})$, analogous to the separable nature of the hydrogenic wavefunctions. For simplicity and without loss of generality, in this work, we use real spherical harmonics.
An additional consequence of Euclidean equivariance is that all “tensors” in a Euclidean neural network are geometric tensors, and we must combine input and filter geometric tensors according to the rules of tensor algebra, using Clebsch-Gordan coefficients or Wigner 3j symbols (they are equivalent) to contract representation indices. We choose to express these geometric tensors in an irreducible representation basis and use spherical harmonics compatible with our chosen conventions.
For the experiments in this letter, we use continuous convolutions over point set geometries. Unlike most convolutional neural networks, our filter functions are defined over 3D space, not on a pre-specified grid. These methods and results generalize to images. To conduct our experiments, we use the e3nn framework Geiger et al. (2020)
for 3D Euclidean equivariant neural networks, written with PyTorch Paszke et al. (2019). The Jupyter Kluyver et al. (2016) notebooks used for running the experiments and creating the figures for this letter are made available at Ref. Smidt under the heading Simple Tasks and Symmetry.
III Mathematical formulation
In this section, we prove symmetry properties of the gradients of a $G$-invariant scalar loss $L$ (such as the MSE loss) evaluated on the output of a $G$-equivariant neural network $f(x)$ and ground truth data $y$, e.g. $L(f(x), y)$. In the section Experiments, we demonstrate that these properties allow us to learn symmetry breaking changes to the input that allow us to find “missing data” implied by symmetry.
For a group $G$ and a representation $D$ acting on a vector space $X$, the symmetry group of $x \in X$ is defined as

$\mathrm{Sym}(x) = \{ g \in G \mid D(g)x = x \}$.  (1)

Note that $\mathrm{Sym}(x)$ is a subgroup of $G$, because $D(e)x = x$, and it is stable under composition because $D(gh)x = D(g)D(h)x = x$ for $g, h \in \mathrm{Sym}(x)$. Note also that $\mathrm{Sym}(x)$ is not necessarily equal to $G$. For instance, consider a square at the origin acted on by the standard representation of SO(3). The symmetry group of the square is the dihedral group, a subgroup of SO(3). For contrast, now consider another square rotated by $R$ with respect to the first one; its symmetry group is also a dihedral group. However, the two symmetry groups are not identical, since they are related by a rotation: they are two different groups. Instead we have $\mathrm{Sym}(Rx) = R\,\mathrm{Sym}(x)\,R^{-1}$; these two groups are isomorphic (conjugate subgroups of SO(3)).
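The distinction between identical and conjugate symmetry groups can be checked numerically. The sketch below (our own illustration, with an assumed definition of "fixes the point set": the rotated points coincide with the original set) verifies that a two-fold in-plane rotation fixes a square at the origin, fails to fix a rotated copy of that square, and that its conjugate by the rotation does fix the rotated copy:

```python
import numpy as np

# "g is a symmetry of the point set" := g maps the set of points onto itself.
def is_symmetry(R, points, tol=1e-8):
    rotated = points @ R.T
    return all(min(np.linalg.norm(rotated - p, axis=1)) < tol for p in points)

def Rz(t):  # rotation by angle t about the z axis
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

square = np.array([[1, 1, 0], [-1, 1, 0], [-1, -1, 0], [1, -1, 0]], float)
# All four C4 rotations about z fix the square...
assert all(is_symmetry(Rz(k * np.pi / 2), square) for k in range(4))

# ...and so does the two-fold rotation about the x axis.
g = np.diag([1.0, -1.0, -1.0])
assert is_symmetry(g, square)

# A square rotated by 30 degrees has the *conjugate* symmetry group
# R Sym(square) R^-1, not Sym(square) itself:
R = Rz(np.pi / 6)
rotated_square = square @ R.T
assert not is_symmetry(g, rotated_square)          # g itself is not a symmetry
assert is_symmetry(R @ g @ R.T, rotated_square)    # but R g R^-1 is
```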
A function $f: X \to Y$ is equivariant to a group $G$ (or $G$-equivariant) if and only if there exist two representations of $G$, $D_X$ and $D_Y$, such that

$f(D_X(g)x) = D_Y(g)f(x)$ for all $g \in G$ and $x \in X$.  (2)

If $D_Y$ is the trivial representation ($D_Y(g) = 1$ for all $g$), then the function is said to be invariant to $G$, which is a special case of equivariance.
Let $f$ be equivariant to group $G$. We first want to prove that

$\mathrm{Sym}(x) \subseteq \mathrm{Sym}(f(x))$.  (3)

Proof: For $g \in \mathrm{Sym}(x)$ (i.e. $D_X(g)x = x$),

$D_Y(g)f(x) = f(D_X(g)x) = f(x)$, so $g \in \mathrm{Sym}(f(x))$.
Next, we want to prove

$\mathrm{Sym}(x) \cap \mathrm{Sym}(y) \subseteq \mathrm{Sym}(\alpha x + \beta y)$.  (4)

Proof: For $g \in \mathrm{Sym}(x) \cap \mathrm{Sym}(y)$,

$D(g)(\alpha x + \beta y) = \alpha D(g)x + \beta D(g)y = \alpha x + \beta y$.
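The first inclusion can be seen concretely with the equivariant toy map $f(x) = |x|\,x$ from earlier (our own illustration, not paper code): an input fixed by every rotation about the $z$ axis produces an output fixed by the same rotations.

```python
import numpy as np

# Numeric check of Sym(x) ⊆ Sym(f(x)) for the rotation-equivariant
# map f(x) = |x| x (D_X = D_Y = the standard rotation representation).
def f(x):
    return np.linalg.norm(x) * x

def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

x = np.array([0.0, 0.0, 2.0])      # fixed by every rotation about z
for t in np.linspace(0.0, 2.0 * np.pi, 7):
    g = Rz(t)
    assert np.allclose(g @ x, x)          # g is in Sym(x)...
    assert np.allclose(g @ f(x), f(x))    # ...hence g is in Sym(f(x))
```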
Finally, we prove that if $L: X \to \mathbb{R}$ is a differentiable and $G$-invariant function, then its gradient $x \mapsto \nabla_x L(x)$ is equivariant to $G$.

Proof: For $g \in G$, let $D(g)$ be expressed in an orthonormal basis $\{\hat{e}_i\}$, so that $D(g)^{-1} = D(g)^{T}$. We first recall the definition of the derivative,

$\nabla_x L(x) = \sum_i \hat{e}_i\, \partial L(x)/\partial x_i$,

and that under the representation the vector and its basis transform as $x \to D(g)x$ and $\hat{e}_i \to D(g)\hat{e}_i$. To show the equivariance of the gradient, we differentiate the invariance condition $L(D(g)x) = L(x)$ (which holds by the invariance of $L$) with respect to $x$ using the chain rule (permitted by the differentiability of $L$):

$D(g)^{T}\, \nabla_x L(D(g)x) = \nabla_x L(x)$.

Multiplying both sides by $D(g)$ and using orthogonality gives

$\nabla_x L(D(g)x) = D(g)\, \nabla_x L(x)$,  (5)

i.e. the gradient of a $G$-invariant differentiable function is $G$-equivariant.
To recapitulate, for a $G$-equivariant function, the output has equal or higher symmetry than the input. When training a neural network, one uses a loss function to compute gradients of the loss with respect to the network parameters or, alternatively, with respect to the input (data) to the network. If this loss is a $G$-invariant function and the network is a $G$-equivariant function, then the gradients are $G$-equivariant. Thus, the symmetry of the gradients has equal or higher symmetry than the input to the loss function. If the input to the loss function is a linear combination (such as the difference) of the network output and the ground truth output, the symmetry group of the gradients will be a superset of the intersection of the symmetry groups of the network output and the ground truth output.
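The gradient result can be verified on the simplest invariant loss (our own numeric sketch): $L(x) = |x|^2$ is unchanged by rotations, and its gradient $\nabla L(x) = 2x$ rotates along with the input.

```python
import numpy as np

# For the rotation-invariant loss L(x) = |x|^2, the analytic gradient
# grad L(x) = 2x is rotation-equivariant: grad L(R x) = R grad L(x).
def grad_L(x):
    return 2.0 * x

theta = 1.1
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
x = np.array([0.3, -1.2, 2.5])

# invariance of L...
assert np.isclose(np.dot(R @ x, R @ x), np.dot(x, x))
# ...implies equivariance of its gradient
assert np.allclose(grad_L(R @ x), R @ grad_L(x))
```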
Finally, if the symmetry of the ground truth output is lower than the input to the network, the gradients can have symmetry lower than the input, allowing for the use of gradients to update the input to the network to make the network input and output symmetrically compatible. This procedure can be used to find symmetry breaking order parameters missing in the original data but implied by symmetry.
IV Symmetry and interpretation of network input and output
3D space has Euclidean symmetry, denoted $E(3)$. Objects (e.g. geometry and geometric tensor fields) in 3D space lower that symmetry to a subgroup of $E(3)$, as defined by Eqn. 1. The fact that Euclidean neural networks are equivariant to all Euclidean symmetry operations implies that the network will preserve any subgroup of Euclidean symmetry: point groups, space groups, subperiodic groups, and other subgroups that are not traditionally tabulated.
Properties in Euclidean space take the form of geometric tensor fields: geometric tensors defined over space (or on a specific geometry). These are also the input to Euclidean neural networks. In practice, we articulate this as two separate inputs: the point geometry of our system, expressed as 3D coordinates (which is used by the convolutional filters), and features on that geometry (which are combined with the convolutional filters via tensor products). We express these tensors and our convolutional filters in an irreducible basis of $O(3)$; the irreducible representations of $O(3)$ are indexed by angular frequency $l = 0, 1, 2, \ldots$ and by how they transform under parity (odd or even). Some examples of input features are scalars such as mass or charge, which transform in the same manner as $l = 0$ with even parity (no change under inversion), or vectors such as velocity or acceleration, which transform in the same manner as $l = 1$ with odd parity (all components change by a factor of -1 under inversion).
Geometric tensors can represent many things; numerical properties (e.g mass, velocity, and elasticity) are the most familiar. We can also use geometric tensors to express spatial functions.
A function $f(\hat{r})$ on the sphere can be projected onto the spherical harmonics (typically up to some maximum $l_{\max}$),

$f(\hat{r}) = \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} c_{lm}\, Y_{lm}(\hat{r})$,

where $Y_{lm}$ are the spherical harmonics. This is the angular equivalent of a Fourier transform.
The coefficients of this projection form a geometric tensor in the irreducible basis. We can additionally encode radial information in this projection by either adding radial functions to the projection procedure or interpreting the magnitude of the function on the sphere as a radial distance from the origin.
For example, we can project a local point cloud onto spherical harmonics at a specified origin and store the projection as a feature on a point; this is a common step in calculating rotation invariant descriptors of local atomic environments Bartók et al. (2013). In this work, to project a local point cloud onto a specified origin, we treat the point cloud as a set of $\delta$ functions at the corresponding angles around the origin and weigh the projection of each point by its radial distance from the origin.
We additionally re-scale this signal to account for finite basis effects by ensuring the maximum of the function corresponds to the original radial distance. See Figure 1, where this method is used to project the vertices of a tetrahedron onto an origin; the function magnitude is plotted proportional to radial distance. Because these projection coefficients form irrep tensors at specified locations in space (the projection origins), they can be the input or output of a Euclidean neural network.
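A minimal version of this projection, restricted to $l = 1$, can be written in a few lines (our own sketch; the normalization constant and the $(y, z, x)$ ordering of the real $l = 1$ harmonics are conventions we assume here, not necessarily those of e3nn). It shows how a radially weighted spherical harmonic projection of a single point encodes a displacement vector:

```python
import numpy as np

# Real l = 1 spherical harmonics at unit vector r_hat, ordered
# (m = -1, 0, +1) ~ (y, z, x), sharing the constant sqrt(3 / 4 pi).
def sph_harm_l1(r_hat):
    c = np.sqrt(3.0 / (4.0 * np.pi))
    x, y, z = r_hat
    return c * np.array([y, z, x])

v = np.array([1.0, 2.0, 2.0])       # displacement to encode
r = np.linalg.norm(v)
coeffs = r * sph_harm_l1(v / r)     # weigh projection by radial distance

# The l = 1 coefficients recover the vector (up to the fixed constant):
recovered = coeffs / np.sqrt(3.0 / (4.0 * np.pi))
assert np.allclose(recovered, [2.0, 2.0, 1.0])   # (y, z, x) ordering
```

Projecting a full point cloud sums such contributions over all points, and higher $l$ terms sharpen the angular resolution of the signal.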
V.1 Two simple tasks. One learnable; one not.
According to Eqn. 3, since Euclidean neural networks are equivariant to Euclidean symmetry, the symmetry of the output can only be of equal or higher symmetry than the input. To demonstrate this, we train two neural networks to deform two arrangements of points in the plane into one another, one with four points at the vertices of a square, and another with four points at the vertices of a rectangle.
We interpret our network output as the projection of where we want each point to move. The procedure for generating these projections is described in the previous section. The predicted displacements for each point are articulated as a spherical harmonic signal and trained to match the spherical harmonic projection (up to $l_{\max}$) of the desired displacement vector or final point location.
We could have alternatively used an output to indicate a 3D vector displacement, but as we will show, the spherical harmonic signal is more informative in cases where the fit is poor.
First, we train a neural network to deform the rectangle into the square. The network is able to accomplish this quickly and accurately. Second, we train another neural network to deform the square into the rectangle. No matter the amount of training, the network cannot accurately perform the desired task. This is because a square is higher symmetry than a rectangle, with symmetries of point group $D_{4h}$ and $D_{2h}$, respectively. By Eqn. 3, the output of the network has to have equal or higher symmetry than the input.
In Figure 2, we show the output of the trained networks for both cases. On the right, we see that the model trained to deform the square into the rectangle is producing symmetric spherical harmonic signals, each with two maxima. Due to the network being rotation equivariant and the input having the symmetry of a square, the network cannot distinguish distorting the square to form a rectangle aligned along the $x$ axis from one aligned along the $y$ axis. The best prediction it can make is to provide an averaged output for both outcomes. Had we articulated our displacements as vectors, the output in the high symmetry case would be the average of degenerate displacement vectors, in this case, a vector with zero magnitude.
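The zero-magnitude vector output can be seen in one line of arithmetic (our own sketch, with an arbitrary stretch amplitude `delta`): the two degenerate deformations displace a given vertex in opposite directions, so their average vanishes.

```python
import numpy as np

# The two symmetry-degenerate deformations of the square (rectangle along
# x vs along y) displace the vertex at (1, 1) in opposite directions, so
# the best symmetry-respecting vector output is their average: zero.
delta = 0.3
d_rect_x = np.array([ delta, -delta, 0.0])  # stretch x, squeeze y
d_rect_y = np.array([-delta,  delta, 0.0])  # stretch y, squeeze x
assert np.allclose((d_rect_x + d_rect_y) / 2.0, 0.0)
```

The spherical harmonic signal, by contrast, averages to a nonzero two-maxima pattern, which is why it is the more informative output when the fit is poor.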
V.2 Fixing task two with symmetry breaking
When going from the point group $D_{4h}$ to $D_{2h}$, 8 symmetry operations are lost – two four-fold rotations, two two-fold rotations, two improper four-fold rotations, and two mirror planes. In the character table for $D_{4h}$, there is a one-dimensional irreducible representation that breaks all these symmetries, $B_{1g}$, which has an $x^2 - y^2$ basis function in the coordinate system with the $z$ axis along the highest symmetry axis and $x$ and $y$ aligned with two of the mirror planes. To lower the symmetry of the square to $D_{2h}$, we add a non-zero contribution to the $Y_{2,2}$ component (proportional to $x^2 - y^2$) of all the point features. When this term is added, the model is immediately able to learn the task with equal accuracy as a model trained to distort the rectangle into the square.
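That $x^2 - y^2$ breaks exactly the lost operations can be checked directly (our own sketch; the three operations below are representatives, not the full groups): it flips sign under the four-fold rotation that distinguishes $D_{4h}$ from $D_{2h}$, but is even under operations that survive in $D_{2h}$.

```python
import numpy as np

# The B_1g basis function x^2 - y^2 is odd under the lost four-fold
# rotation about z, but even under surviving D_2h operations.
def b1g(p):
    x, y, z = p
    return x * x - y * y

p = np.array([0.7, 0.2, 0.0])
c4  = lambda q: np.array([-q[1],  q[0], q[2]])   # 90-degree rotation about z (lost)
c2z = lambda q: np.array([-q[0], -q[1], q[2]])   # two-fold about z (survives)
mx  = lambda q: np.array([-q[0],  q[1], q[2]])   # mirror x -> -x (survives)

assert np.isclose(b1g(c4(p)), -b1g(p))   # broken: sign flip under C4
assert np.isclose(b1g(c2z(p)), b1g(p))   # preserved
assert np.isclose(b1g(mx(p)),  b1g(p))   # preserved
```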
V.3 Learning Symmetry Breaking Input
The situation of having a dataset where the “inputs” are higher symmetry than the “outputs” is one that has occurred many times in scientific history when there is missing data – an asymmetry in the system waiting to be discovered. For example, neutrinos were first postulated by Pauli as undetected particles that would account for missing angular momentum and energy in measurements of the process of beta decay Brown (1978).
In the context of phase transitions as described by Landau theory Landau (1937), symmetry-breaking factors are called order parameters. The process of determining the representation of symmetry breaking quantities can be nontrivial, particularly for complex systems.
To perform backpropagation, a neural network is required to be differentiable, such that gradients of the loss can be taken with respect to every parameter in the model. This technique can be extended to the input.
In this example, we require the input to be the same on each point. Additionally, because we want to recover the minimal input needed to break symmetry, we add a component-wise mean absolute error (MAE) loss on each component of the input feature to encourage sparsity. It is important to note that we only apply the component-wise MAE loss to the input and not the network output. While the component-wise MSE loss is rotation invariant, the component-wise MAE loss is not, and encouraging sparsity of the symmetry breaking parameter in one coordinate frame does not guarantee sparsity in general. We have chosen to train the network in the coordinate frame that matches the conventions of point group tables, so the irreducible representation basis functions can be directly compared.
In this task, the setup is identical to task two, but we modify the training procedure. We first train the model normally until the loss no longer improves. Then we alternate between updating the parameters of the model and updating the input using backpropagation of the loss.
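The alternating scheme can be sketched with a one-dimensional toy problem (entirely illustrative: the model $ws$, target, learning rate, and L1 weight are our own choices, not those used in the experiments). A model weight $w$ and a learnable scalar "input" $s$ are updated in alternation by gradient descent, with an L1 penalty on $s$ standing in for the sparsity-encouraging MAE term:

```python
import numpy as np

# Toy alternating optimization: model output w * s should match target y,
# but s starts at the "symmetric" value 0 and must be learned.
w, s = 0.5, 0.0
y, lr, lam = 1.0, 0.05, 0.01
s += 1e-3            # small nudge so gradients can move s off zero

for step in range(2000):
    resid = w * s - y
    if step % 2 == 0:
        w -= lr * 2.0 * resid * s                        # update model weights
    else:
        s -= lr * (2.0 * resid * w + lam * np.sign(s))   # update the input

assert abs(w * s - y) < 0.05   # input + model together now fit the target
```

In the actual experiment the learnable input is a geometric tensor shared across the points, and the gradients with respect to it carry the symmetry properties proven above.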
As the loss converges, we find that the learned input consists of non-zero order parameters comprising only components that transform as the irrep $B_{1g}$: the spherical harmonic $Y_{2,2}$, proportional to $x^2 - y^2$, and the spherical harmonic $Y_{4,2}$, proportional to $(x^2 - y^2)(7z^2 - r^2)$. See Fig. 3 for images of the evolution of the input and output signals during the model and order parameter optimization process. Modulo an arbitrary sign that depends on initialization, the resulting order parameter matches the $x^2 - y^2$ that we found in the previous section (using the character table of point group $D_{4h}$ and the knowledge of which symmetries we wanted to break). That $Y_{4,2}$ also transforms as $B_{1g}$ can be confirmed with simple calculations.
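One such "simple calculation" (our own sketch; the proportionality constant of $Y_{4,2}$ is omitted since only the transformation behavior matters) checks that $(x^2 - y^2)(7z^2 - r^2)$ flips sign under the four-fold rotation, exactly like the $B_{1g}$ basis function $x^2 - y^2$:

```python
import numpy as np

# Real Y_{4,2} ~ (x^2 - y^2)(7 z^2 - r^2), up to a constant factor.
def y42(p):
    x, y, z = p
    r2 = x * x + y * y + z * z
    return (x * x - y * y) * (7.0 * z * z - r2)

p = np.array([0.4, -0.9, 0.6])
c4p = np.array([0.9, 0.4, 0.6])          # p rotated 90 degrees about z
assert np.isclose(y42(c4p), -y42(p))     # sign flip under C4, like x^2 - y^2
```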
We have used an archetypal example to demonstrate that symmetry equivariant neural networks exhibit Curie’s principle: the output of the neural network must have equal or higher symmetry than the symmetry of the input. Using this property, we can uncover missing symmetry breaking information that must exist due to Curie’s principle but is not provided in the data. We emphasize that while our experiments were specific to Euclidean neural networks, our results generalize to other symmetry equivariant neural networks such as permutation equivariant neural networks which are a subset of graph convolutional neural networks.
Euclidean symmetry equivariant neural networks provide a systematic way of finding symmetry breaking order parameters of arbitrary isotropy subgroups of $E(3)$ without any explicit knowledge of the symmetry of the given data. We can even find order parameters that satisfy certain conditions by articulating those conditions in how we construct the input and loss function.
While our example involves only the most elementary of point groups, these methods can be applied to arbitrary geometric tensor fields. For these networks, there is no computational difference between treating these cases, whereas traditionally to arrive at these symmetry insights, one must derive character tables, compatibility relationships, and functional forms of irreducible representations from scratch.
The same procedures demonstrated in this paper can be used to find order parameters of real physical systems: phonon modes of structural phase transitions in crystalline systems (e.g. order parameters describing the octahedral tilting of perovskites), missing environmental parameters of an experimental setup (e.g. anisotropies in the magnetic field of an accelerator magnet), or identifying other undetected or otherwise missing quantities necessary to preserve symmetry.
As useful as symmetry is, symmetry is a challenging tool to master. For example, using symmetry in the context of crystallography requires being aware of many conventions and understanding advanced language and concepts for describing symmetry relations; such information is tabulated in the growing 8 volumes of The International Tables of Crystallography Brock et al. (2006). Euclidean neural networks have symmetry built in; this allows for symmetry aware operations to be encoded without expert knowledge in representation theory, expanding the accessibility of these mathematical principles.
Acknowledgements. T.E.S. thanks Sean Lubner, Josh Rackers, and Sinéad Griffin for many helpful discussions. T.E.S. and M.G. were supported by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory and B.K.M. was supported by CAMERA, both under U.S. Department of Energy Contract No. DE-AC02-05CH11231.
- Cormorant: covariant molecular neural networks. In Advances in Neural Information Processing Systems, pp. 14537–14546. Cited by: §I.
- More is different. Science 177 (4047), pp. 393–396. External Links: Cited by: §I.
- On representing chemical environments. Phys. Rev. B 87, pp. 184115. External Links: Cited by: §I, §IV.
- Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, pp. 146401. External Links: Cited by: §I.
- International tables for crystallography. International Union of Crystallography. External Links: Cited by: §VI.
- The idea of the neutrino. Physics Today 31 (9), pp. 23–28. External Links: Cited by: §I, §V.3.
- Machine learning and the physical sciences. Rev. Mod. Phys. 91, pp. 045002. External Links: Cited by: §I.
- Curie’s principle. The British Journal for the Philosophy of Science 21 (2), pp. 133–148. Cited by: §I.
- A general theory of equivariant cnns on homogeneous spaces. In Advances in Neural Information Processing Systems 32, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 9142–9153. Cited by: §II.
- Sur la symétrie dans les phénomènes physiques, symétrie d’un champ électrique et d’un champ magnétique. Journal de Physique Théorique et Appliquée 3 (1), pp. 393–415. External Links: Cited by: §I.
- Broken symmetry and the mass of gauge vector mesons. Phys. Rev. Lett. 13, pp. 321–323. External Links: Cited by: §I.
- Github.com/e3nn/e3nn External Links: Cited by: §II.
- Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 120, pp. 036002. External Links: Cited by: §I.
- Global conservation laws and massless particles. Phys. Rev. Lett. 13, pp. 585–587. External Links: Cited by: §I.
- Broken symmetries and the masses of gauge bosons. Phys. Rev. Lett. 13, pp. 508–509. External Links: Cited by: §I.
- Jupyter notebooks – a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas, F. Loizides and B. Schmidt (Eds.), pp. 87 – 90. Cited by: §II.
- Clebsch–gordan nets: a fully fourier space spherical convolutional neural network. In Advances in Neural Information Processing Systems 31, pp. 10117–10126. Cited by: §I, §II.
- On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, J. G. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 2752–2760. External Links: Cited by: §II.
- On the theory of phase transitions. Zh. Eksp. Teor. Fiz. 7, pp. 19–32. Cited by: §I, §V.3.
- Quasi-particles and gauge invariance in the theory of superconductivity. Phys. Rev. 117, pp. 648–663. External Links: Cited by: §I.
- PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’ Alché-Buc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. Cited by: §II.
- Machine learning meets quantum physics. Springer International Publishing. External Links: Cited by: §I.
-  e3nn_tutorial: a tutorial for e3nn, a modular framework for euclidean equivariant neural networks. Note: https://github.com/blondegeek/e3nn_tutorial Cited by: §II.
- SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics 148 (24), pp. 241722. External Links: Cited by: §I.
- Tensor field networks: rotation- and translation-equivariant neural networks for 3d point clouds. External Links: Cited by: §I, §II.
- 3D steerable cnns: learning rotationally equivariant features in volumetric data. In Proceedings of the 32Nd International Conference on Neural Information Processing Systems, NIPS’18, USA, pp. 10402–10413. Cited by: §I, §II.
- Harmonic networks: deep translation and rotation equivariance. In Proceedings of CVPR 2017. Cited by: §II.