# Equivariant Manifold Flows

Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries – a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning symmetry-invariant distributions on arbitrary manifolds via equivariant manifold flows. We demonstrate the utility of our approach by using it to learn gauge invariant densities over SU(n) in the context of quantum field theory.


## 1 Introduction

\* indicates equal contribution.

Learning probabilistic models for data has long been the focus of many problems in machine learning and statistics. Though much effort has gone into learning models over Euclidean space (Goodfellow et al., 2014; Chen et al., 2018; Grathwohl et al., 2019), less attention has been paid to learning models over non-Euclidean spaces, despite the fact that many problems require a manifold structure. Density learning over non-Euclidean spaces has applications ranging from quantum field theory in physics (Wirnsberger et al., 2020) to motion estimation in robotics (Feiten et al., 2013) to protein-structure prediction in computational biology (Hamelryck et al., 2006).

Continuous normalizing flows (CNFs) (Chen et al., 2018; Grathwohl et al., 2019) are powerful generative models for learning structure in complex data due to their tractability and theoretical guarantees. Recent work (Lou et al., 2020; Mathieu and Nickel, 2020) has extended the framework of continuous normalizing flows to the setting of density learning on Riemannian manifolds. However, for many applications in the natural sciences, this construction is insufficient as it cannot properly model necessary symmetries. For example, such symmetry requirements arise when sampling coupled particle systems in physical chemistry (Köhler et al., 2020) or sampling for $SU(n)$ lattice gauge theories in theoretical physics (Boyda et al., 2020), where $SU(n)$ denotes the special unitary group.

More precisely, these symmetries are invariances with respect to action by an isometry subgroup of the underlying manifold. For example, consider the task of learning a density on the sphere that is invariant to rotation around an axis; this is an example of learning an isometry subgroup invariant density. (This specific isometry subgroup is known as the isotropy group at a point of the sphere intersecting the axis.) For a less trivial example, note that when learning a flow-based sampler for $SU(n)$ in the context of lattice QFT (Boyda et al., 2020), the learned density must be invariant to conjugation by $SU(n)$ (see Figure 1 for a density that exhibits the requisite symmetry).

Note that the quotient of the manifold by the isometry subgroup induces a natural manifold structure. Can an invariant density be modelled by learning over this quotient with a general manifold density learning method such as NMODE (Lou et al., 2020)? Though this seems plausible, it is a problematic approach for several reasons:

1. First, it is often difficult to realize necessary constructs (charts, exponential maps, tangent spaces) on the quotient manifold (e.g. this is the case for , a quotient of Lee (2013)).

2. Second, even if the above constructs can be realized, the quotient manifold often has a boundary, which precludes the use of a manifold CNF. To illustrate this point, consider the simple case of the sphere invariant to rotation about an axis; the quotient manifold is a closed interval, and a CNF would "flow out" on the boundary.

3. Third, even if the quotient is a manifold without boundary for which we have a clear characterization, it may have a discrete structure that induces artifacts in the learned distribution. This is the case for Boyda et al. (2020): the flow construction over the quotient induces abnormalities in the density.

Motivated by the above drawbacks, we design a manifold continuous normalizing flow on the original manifold that maintains the requisite symmetry invariance. Since vanilla manifold CNFs do not maintain said symmetries, we instead construct equivariant manifold flows and show they induce the desired invariance. To construct these flows, we present the first general way of designing equivariant vector fields on manifolds. A summary of our paper's contributions is as follows:

• We present a general framework and the requisite theory for learning equivariant manifold flows: in our setup, the flows can be learned over arbitrary Riemannian manifolds while explicitly incorporating symmetries inherent to the problem. Moreover, we prove that the equivariant flows we construct can universally approximate distributions on closed manifolds.

• We demonstrate the efficacy of our approach by learning gauge invariant densities over $SU(n)$ in the context of quantum field theory. In particular, when applied to the densities in Boyda et al. (2020), we adhere more naturally to the target geometry and avoid the unnatural artifacts of the quotient construction.

• We highlight the benefit of incorporating symmetries into manifold flow models by comparing directly against previous general manifold density learning approaches. We show that when a general manifold learning model is not aware of symmetries inherent to the problem, the learned density is of considerably worse quality and violates said symmetries.

## 2 Related Work

Our work builds directly on pre-existing manifold normalizing flow models and enables them to leverage inherent symmetries through equivariance. In this section we cover important developments from the relevant fields: manifold normalizing flows and equivariant machine learning.

##### Normalizing Flows on Manifolds

Normalizing flows on Euclidean space have long been touted as powerful generative models (Dinh et al., 2016; Chen et al., 2018; Grathwohl et al., 2019). Similar to GANs (Goodfellow et al., 2014) and VAEs (Kingma and Welling, 2013), normalizing flows learn to map samples from a tractable prior density to a target density. However, unlike the aforementioned models, normalizing flows account for changes in volume, enabling exact evaluation of the output probability density. In a rather concrete sense, this makes them theoretically principled. As such, they are ideal candidates for generalization beyond the Euclidean setting, where a careful, theoretically principled modelling approach is necessary.

Hence, many have attempted such a generalization, either through manifold-specific or general constructions. Rezende et al. (2020) introduced constructions specific to tori and spheres, while Bose et al. (2020) introduced constructions for hyperbolic space. Following this work, Lou et al. (2020); Mathieu and Nickel (2020); Falorsi and Forré (2020) concurrently introduced a general construction by extending Neural ODEs (Chen et al., 2018) to the setting of Riemannian manifolds. Our work takes inspiration from the methods of Lou et al. (2020); Mathieu and Nickel (2020) and generalizes them further to enable learning that takes into account symmetries of the target density.

##### Equivariant Machine Learning

Recent work has incorporated equivariance into various machine learning models (Cohen and Welling, 2016; Cohen et al., 2018, 2019; Kondor and Trivedi, 2018; Finzi et al., 2020; Rezende et al., 2019) to leverage symmetries inherent in data. Köhler et al. (2020), in particular, used equivariant normalizing flows to enable learning symmetric densities over Euclidean space. The authors note their approach is better suited to density learning in some physical chemistry settings (when compared to general purpose normalizing flows), since they take into account the symmetries of the problem.

Symmetries also appear naturally in the context of learning densities over manifolds. While in many cases symmetry can be a good inductive bias for learning (for example, asteroid impacts on the sphere can be modelled as being approximately invariant to rotation about the Earth's axis), for certain test tasks it is a strict requirement. For example, Boyda et al. (2020) introduced equivariant flows for $SU(n)$ to model lattice gauge theories, where the modelled distribution must be conjugation invariant. However, beyond conjugation invariant learning on $SU(n)$ (Boyda et al., 2020), not much other work has been done for learning invariant distributions over manifolds. Our work bridges this gap by introducing the first general equivariant manifold normalizing flow model for arbitrary manifolds and symmetries.

## 3 Background

In this section, we provide a terse overview of necessary concepts for understanding our paper. In particular, we address fundamental notions from Riemannian geometry as well as the basic set-up of normalizing flows on manifolds. For a more detailed introduction to Riemannian geometry, we refer the reader to textbooks such as Lee (2013); for a survey of normalizing flows, see Kobyzev et al. (2020).

### 3.1 Riemannian Geometry

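A Riemannian manifold $(\mathcal{M}, h)$ is an $n$-dimensional manifold $\mathcal{M}$ with a smooth collection of inner products $(h_x)_{x \in \mathcal{M}}$, one for every tangent space $T_x\mathcal{M}$. The Riemannian metric $h$ induces a distance $d_h$ on the manifold.

A diffeomorphism $f: \mathcal{M} \to \mathcal{M}$ is a differentiable bijection with differentiable inverse. A diffeomorphism $f$ is called an isometry if $h_{f(x)}(D_x f(u), D_x f(v)) = h_x(u, v)$ for all tangent vectors $u, v \in T_x\mathcal{M}$, where $D_x f$ is the differential of $f$. Note that isometries preserve the manifold distance function. The collection of all isometries forms a group $G$, which we call the isometry group of the manifold $\mathcal{M}$.

Riemannian metrics also allow for a natural analogue of gradients on $\mathcal{M}$. For a function $f: \mathcal{M} \to \mathbb{R}$, we define the Riemannian gradient $\nabla_x f$ to be the vector in $T_x\mathcal{M}$ such that $h_x(\nabla_x f, v) = D_x f(v)$ for $v \in T_x\mathcal{M}$.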

### 3.2 Normalizing Flows on Manifolds

##### Manifold Normalizing Flow

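Let $(\mathcal{M}, h)$ be a Riemannian manifold. A normalizing flow on $\mathcal{M}$ is a diffeomorphism $f_\theta: \mathcal{M} \to \mathcal{M}$ (parametrized by $\theta$) that transforms a prior density $\rho$ to model density $\rho_{f_\theta}$. The model distribution can be computed via the change of variables equation:

$$\rho_{f_\theta}(x) = \rho\left(f_\theta^{-1}(x)\right)\left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right| = \rho\left(f_\theta^{-1}(x)\right)\left|\det J_{f_\theta^{-1}}(x)\right|.$$

##### Manifold Continuous Normalizing Flow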

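A manifold continuous normalizing flow with base point $z$ is a function $\gamma: [0, \infty) \to \mathcal{M}$ that satisfies the manifold ODE

$$\frac{d\gamma(t)}{dt} = X(\gamma(t), t), \qquad \gamma(0) = z.$$

We define $F_{X,T}: \mathcal{M} \to \mathcal{M}$, $z \mapsto F_{X,T}(z)$, to map any base point $z \in \mathcal{M}$ to the value of the CNF starting at $z$, evaluated at time $T$. This function is known as the (vector field) flow of $X$.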

### 3.3 Equivariance and Invariance

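Let $G$ be an isometry subgroup of $\mathcal{M}$. We notate the action of an element $g \in G$ on $\mathcal{M}$ by the map $R_g: \mathcal{M} \to \mathcal{M}$.

**Equivariant and Invariant Functions.** We say that a function $f: \mathcal{M} \to \mathcal{N}$ is equivariant if, for symmetries $g_x: \mathcal{M} \to \mathcal{M}$ and $g_y: \mathcal{N} \to \mathcal{N}$, $f \circ g_x = g_y \circ f$. We say a function $f: \mathcal{M} \to \mathcal{N}$ is invariant if $f \circ g_x = f$. When $\mathcal{M}$ and $\mathcal{N}$ are manifolds, the symmetries $g_x$ and $g_y$ are isometries.

**Equivariant Vector Fields.** Let $X: \mathcal{M} \times [0, \infty) \to T\mathcal{M}$, $X(m, t) \in T_m\mathcal{M}$, be a time-dependent vector field on manifold $\mathcal{M}$, with base point $m \in \mathcal{M}$. $X$ is a $G$-equivariant vector field if, for all $(m, t) \in \mathcal{M} \times [0, \infty)$ and $g \in G$, $X(R_g m, t) = (D_m R_g)\, X(m, t)$.

**Equivariant Flows.** A flow $f$ on a manifold $\mathcal{M}$ is $G$-equivariant if it commutes with actions from $G$, i.e. for all $g \in G$ we have $R_g \circ f = f \circ R_g$.

**Invariance of Density.** For a group $G$, a density $\rho$ on a manifold $\mathcal{M}$ is $G$-invariant if, for all $g \in G$ and $x \in \mathcal{M}$, $\rho(R_g x) = \rho(x)$, where $R_g$ is the action of $g$ on $x$.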

## 4 Invariant Densities from Equivariant Flows

Our goal in this section is to describe a tractable way to learn a density over a manifold that obeys a symmetry given by an isometry subgroup $G$. Since this cannot be done directly and it is not clear how a manifold continuous normalizing flow can be altered to preserve symmetry, we will derive the following implications to yield a tractable solution (note that we generalize previous work (Köhler et al., 2020; Papamakarios et al., 2019) that has only addressed the case of Euclidean space):

1. $G$-invariant potential $\Rightarrow$ $G$-equivariant vector field (Theorem 1). We show that given a $G$-invariant potential function $\Phi: \mathcal{M} \to \mathbb{R}$, the vector field $\nabla\Phi$ is $G$-equivariant.

2. $G$-equivariant vector field $\Rightarrow$ $G$-equivariant flow (Theorem 2). We show that a $G$-equivariant vector field on $\mathcal{M}$ uniquely induces a $G$-equivariant flow.

3. $G$-equivariant flow $\Rightarrow$ $G$-invariant density (Theorem 3). We show that given a $G$-invariant prior $\rho$ and a $G$-equivariant flow $f$, the flow density $\rho_f$ is $G$-invariant.

If we have a prior distribution on the manifold that obeys the requisite invariance, then the above implications show that we can use a $G$-invariant potential to produce a flow that, in tandem with the CNF framework, learns an output density with the desired invariance. We claim that constructing a $G$-invariant potential function on a manifold is far simpler than directly parameterizing a $G$-invariant density or a $G$-equivariant flow. We shall give explicit examples of $G$-invariant potential constructions in Section 5.2 that induce a desired density invariance.

Moreover, we show in Theorem 4 that considering equivariant flows generated from invariant potential functions suffices to learn any smooth distribution over a closed manifold, as measured by Kullback-Leibler divergence.

We defer the proofs of all theorems to the appendix.

### 4.1 Equivariant Gradient of Potential Function

We start by showing how to construct $G$-equivariant vector fields from $G$-invariant potential functions.

To design an equivariant vector field $X$, it is sufficient to set the vector field dynamics of $X$ as the gradient of some $G$-invariant potential function $\Phi: \mathcal{M} \to \mathbb{R}$. This is formalized in the following theorem.

###### Theorem 1.

Let $(\mathcal{M}, h)$ be a Riemannian manifold and $G$ be its group of isometries (or an isometry subgroup). If $\Phi: \mathcal{M} \to \mathbb{R}$ is a smooth $G$-invariant function, then the following diagram commutes for any $g \in G$:

$$\begin{array}{ccc}
\mathcal{M} & \xrightarrow{\;R_g\;} & \mathcal{M} \\
\big\downarrow{\scriptstyle \nabla\Phi} & & \big\downarrow{\scriptstyle \nabla\Phi} \\
T\mathcal{M} & \xrightarrow{\;DR_g\;} & T\mathcal{M}
\end{array}$$

or $\nabla\Phi \circ R_g = DR_g \circ \nabla\Phi$. Hence $\nabla\Phi$ is a $G$-equivariant vector field. This condition is also tight in the sense that it only occurs if $G$ is an isometry subgroup.

Hence, as long as one can construct a $G$-invariant potential function, one can obtain the desired equivariant vector field. By this construction, a parameterization of $G$-invariant potential functions yields a parameterization of (some) $G$-equivariant vector fields.
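Theorem 1 can be checked numerically in a simple case. The sketch below is illustrative only and is not the paper's implementation: it assumes the unit sphere in $\mathbb{R}^3$ with the induced metric, rotations about the $z$-axis as the symmetry group, and a hand-picked hypothetical potential that depends only on $z$. On the sphere, the Riemannian gradient is the Euclidean gradient projected onto the tangent space.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z-axis: an isometry of the unit sphere."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def potential(x):
    # Hypothetical invariant potential: depends on x only through z.
    z = x[2]
    return np.sin(2.0 * z) + 0.5 * z ** 2

def euclidean_grad(x):
    # Hand-derived gradient of `potential` in ambient R^3.
    z = x[2]
    return np.array([0.0, 0.0, 2.0 * np.cos(2.0 * z) + z])

def riemannian_grad(x):
    # Project the ambient gradient onto the tangent space at x.
    g = euclidean_grad(x)
    return g - np.dot(g, x) * x

rng = np.random.default_rng(0)
x = rng.normal(size=3)
x /= np.linalg.norm(x)          # a point on the sphere
R = rot_z(0.7)                  # a group element g, acting as R_g

lhs = riemannian_grad(R @ x)    # grad Phi evaluated at R_g x
rhs = R @ riemannian_grad(x)    # D R_g applied to grad Phi at x
equivariance_gap = np.max(np.abs(lhs - rhs))
print("max |grad(Rx) - R grad(x)| =", equivariance_gap)
```

For a linear isometry such as a rotation, the differential $DR_g$ is the rotation matrix itself, which is why the right-hand side is a plain matrix product.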

### 4.2 Constructing Equivariant Manifold Flows from Equivariant Vector Fields

To construct equivariant manifold flows, we will use tools from the theory of manifold ODEs. In particular, there exists a natural correspondence between equivariant flows and equivariant vector fields. We formalize this in the following theorem:

###### Theorem 2.

Let $(\mathcal{M}, h)$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). Let $X$ be any time-dependent vector field on $\mathcal{M}$, and $F_{X,T}$ be the flow of $X$. Then $X$ is a $G$-equivariant vector field if and only if $F_{X,T}$ is a $G$-equivariant flow for every time $T$.

Hence we can obtain an equivariant flow from an equivariant vector field, and vice versa.
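One direction of this correspondence can be illustrated with a small numerical experiment. The following is a minimal sketch under stated assumptions, not the integrator used in the paper: the manifold is the unit sphere, the group is rotations about the $z$-axis, the time-dependent vector field is the projected gradient of a hypothetical potential $(1+t)\sin z$, and integration uses explicit Euler steps followed by renormalization onto the sphere.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def vector_field(x, t):
    # Riemannian gradient of the hypothetical potential (1 + t) * sin(z).
    g = np.array([0.0, 0.0, (1.0 + t) * np.cos(x[2])])
    return g - np.dot(g, x) * x   # tangent projection

def flow(x0, T=1.0, steps=200):
    """Approximate the flow of `vector_field` from x0 up to time T."""
    x = np.array(x0, dtype=float)
    dt = T / steps
    for k in range(steps):
        x = x + dt * vector_field(x, k * dt)  # Euler step in ambient space
        x = x / np.linalg.norm(x)             # project back onto the sphere
    return x

rng = np.random.default_rng(1)
x0 = rng.normal(size=3)
x0 /= np.linalg.norm(x0)
R = rot_z(1.3)

# Equivariance of the flow: rotating then flowing equals flowing then rotating.
gap = np.max(np.abs(flow(R @ x0) - R @ flow(x0)))
print("max |F(R x) - R F(x)| =", gap)
```

Because both the Euler update and the renormalization commute with rotations, the discrete integrator in this sketch inherits the equivariance of the vector field up to floating-point error.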

### 4.3 Invariant Manifold Densities from Equivariant Flows

We now show that $G$-equivariant flows induce $G$-invariant densities. Note that we require the group $G$ to be an isometry subgroup in order to control the density of $\rho_f$, and the following theorem does not hold for general diffeomorphism subgroups.

###### Theorem 3.

Let $(\mathcal{M}, h)$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). If $\rho$ is a $G$-invariant density on $\mathcal{M}$, and $f$ is a $G$-equivariant diffeomorphism, then $\rho_f$ is also $G$-invariant.

In the context of manifold normalizing flows, Theorem 3 implies that if the prior density on $\mathcal{M}$ is $G$-invariant and the flow is $G$-equivariant, the resulting output density will be $G$-invariant. In the context of the overall set-up, this reduces the problem of constructing a $G$-invariant density to the problem of constructing a $G$-invariant potential function.
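The full proof is deferred to the appendix, but the main idea can be sketched in a few lines. The derivation below is an informal outline, not the appendix proof: it assumes the change of variables formula of Section 3.2, writes $J$ for the Jacobian and $R_g$ for the action of $g \in G$, and uses the fact that an isometry has Jacobian determinant of absolute value one.

```latex
\begin{aligned}
f \circ R_g = R_g \circ f
  \;&\Longrightarrow\; f^{-1} \circ R_g = R_g \circ f^{-1}, \\
\left|\det J_{f^{-1}}(R_g x)\right| \left|\det J_{R_g}(x)\right|
  &= \left|\det J_{R_g}\!\left(f^{-1}(x)\right)\right| \left|\det J_{f^{-1}}(x)\right|
  \quad \text{(chain rule)}, \\
\left|\det J_{R_g}\right| = 1
  \;&\Longrightarrow\; \left|\det J_{f^{-1}}(R_g x)\right| = \left|\det J_{f^{-1}}(x)\right|, \\
\rho_f(R_g x)
  &= \rho\!\left(f^{-1}(R_g x)\right) \left|\det J_{f^{-1}}(R_g x)\right| \\
  &= \rho\!\left(R_g f^{-1}(x)\right) \left|\det J_{f^{-1}}(x)\right| \\
  &= \rho\!\left(f^{-1}(x)\right) \left|\det J_{f^{-1}}(x)\right|
   = \rho_f(x).
\end{aligned}
```

The third line is where the isometry assumption enters: for a general diffeomorphism subgroup the two Jacobian factors of $R_g$ need not cancel, which is consistent with the remark that the theorem fails in that setting.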

### 4.4 Sufficiency of Flows Generated via Invariant Potentials

It is unclear whether equivariant flows induced by invariant potentials can learn arbitrary invariant distributions over manifolds. In particular, it is reasonable to have some concerns about limited expressivity, since it is unclear whether any equivariant flow can be generated in this way. We alleviate these concerns for our use cases by proving that equivariant flows obtained from invariant potential functions suffice to learn any smooth invariant distribution over a closed manifold, as measured by Kullback-Leibler (KL) divergence.

###### Theorem 4.

Let $(\mathcal{M}, h)$ be a closed Riemannian manifold. Let $\rho, \pi$ be smooth $G$-invariant distributions over said manifold, and let $D_{KL}(\rho \,\|\, \pi)$ be the KL divergence between distributions $\rho$ and $\pi$. If we choose a function $g: \mathcal{M} \to \mathbb{R}$ such that, for $x \in \mathcal{M}$,

$$g(x) = \log\left(\frac{\pi(x)}{\rho(x)}\right),$$

then we have:

$$\frac{\partial}{\partial t} D_{KL}(\rho \,\|\, \pi) = -\int_{\mathcal{M}} \rho \exp(g) \,\|\nabla g\|^2 \, dx \le 0.$$

In particular, note that if the target distribution is $\pi$ and the current distribution is $\rho$, and we set $g$ to be $\log(\pi/\rho)$, where $g$ is the potential from which the flow is obtained, then the KL divergence between $\rho$ and $\pi$ is monotonically decreasing by Theorem 4. Because $\rho$ and $\pi$ are both $G$-invariant, so is $g$. This means precisely that considering flows generated by invariant potential functions is sufficient to learn any smooth invariant target distribution on a closed manifold (as measured by KL divergence).

## 5 Learning Invariant Densities with Equivariant Flows

In this section, we discuss implementation details of the methodology given in Section 4. In particular, we describe the equivariant manifold flow model assuming an invariant potential is given, provide two examples of invariant potential constructions on different manifolds, and discuss how training is performed depending on the target task.

### 5.1 Equivariant Manifold Flow Model

We assume that a $G$-invariant potential function $\Phi: \mathcal{M} \to \mathbb{R}$ is given. The equivariant flow model works by using automatic differentiation (Paszke et al., 2017) on $\Phi$ to obtain $\nabla\Phi$, using this for the vector field, and integrating in a step-wise fashion over the manifold. Specifically, forward integration and change-in-density (divergence) computations utilize the Riemannian Continuous Normalizing Flows (Mathieu and Nickel, 2020) framework. This flow model is used in tandem with a specific training procedure (described in Section 5.3) to obtain a $G$-invariant model density that approximates some target.

### 5.2 Constructing G-invariant Potential Functions

In this subsection, we present two constructions of invariant potentials on manifolds. Note that a symmetry of a manifold (i.e. action by an isometry subgroup) will leave part of the manifold free. The core idea of our invariant potential construction is to parameterize a neural network on the free portion of the manifold. While the two constructions we give below are certainly not exhaustive, they illustrate the versatility of our method, which is applicable to general manifolds and symmetries.

#### 5.2.1 Isotropy Invariance on S2

Consider the sphere $S^2$, which is the Riemannian manifold $\{x \in \mathbb{R}^3 : \|x\| = 1\}$ with the induced pullback metric. The isotropy group for a point $v \in S^2$ is defined as the subgroup of the isometry group which fixes $v$, i.e. the set of rotations around an axis that passes through $v$. In practice, we let $v = (0, 0, 1)$, so the isotropy group is the group of rotations on the $xy$-plane. An isotropy invariant density would be invariant to such rotations, and hence would look like a horizontally-striped density on the sphere (see Figure 3(a)).

##### Invariant Potential Parameterization

We design an invariant potential by applying a neural network to the free parameter. In the case of our specific isotropy group listed above, the free parameter is the $z$-coordinate. The invariant potential is simply a $2$-input neural network with the spatial input being the $z$-coordinate and the time input being the time during integration. As a result of this design, we see that the only variance in the learned distribution that uses this potential will be along the $z$-axis, as desired.

##### Prior Distributions

For proper learning with a normalizing flow, we need a prior distribution on the sphere that respects the isotropy invariance. There are many isotropy invariant distributions on the sphere. Natural choices include the uniform density (which is invariant to all rotations) and the wrapped distribution with the center at $v$ (Skopek et al., 2019; Nagano et al., 2019). For our experiments, we use the uniform density.

#### 5.2.2 Conjugation Invariance on SU(n)

For many applications in physics (specifically gauge theory and lattice quantum field theory), one works with the Lie group $SU(n)$, the group of $n \times n$ unitary matrices with determinant $1$. In particular, when modelling probability distributions on $SU(n)$ for lattice QFT, the desired distribution must be invariant under conjugation by $SU(n)$ (Boyda et al., 2020). Conjugation is an isometry on $SU(n)$ (see Appendix A.5), so we can model probability distributions invariant under this action with our developed theory.

##### Invariant Potential Parameterization

We want to construct a conjugation invariant potential function $\Phi: SU(n) \to \mathbb{R}$. Note that matrix conjugation preserves eigenvalues. Thus, for a function $\Phi$ to be invariant to matrix conjugation, it has to act on the eigenvalues of $x \in SU(n)$ as a multi-set.

We can parameterize such potential functions by the DeepSet network from Zaheer et al. (2017). DeepSet is a permutation invariant neural network that acts on the eigenvalues, so the mapping of $x \in SU(n)$ is $\Phi(x) = \hat{\Phi}(\{\lambda_1(x), \ldots, \lambda_n(x)\})$ for some set function $\hat{\Phi}$. We append the integration time to the input of the standard neural network layers in the DeepSet network.

As a result of this design, we see that the only variance in the learned distribution will be amongst non-similar matrices, while all similar matrices will be assigned the same density value.
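The construction above can be sketched with a small DeepSet-style function on eigenvalues. This is an illustrative example under stated assumptions rather than the paper's architecture: the feature encoding (real part, imaginary part, time), sum pooling, layer sizes, and random weights are all hypothetical, and `random_su` is a helper introduced only to generate test matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8
# Hypothetical weights: per-eigenvalue encoder (phi) and post-pooling head (rho).
W_phi = rng.normal(size=(3, HIDDEN))
b_phi = rng.normal(size=HIDDEN)
W_rho = rng.normal(size=(HIDDEN, HIDDEN))
w_out = rng.normal(size=HIDDEN)

def random_su(n):
    """Helper: a random n x n unitary matrix rescaled to determinant 1."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q / np.linalg.det(q) ** (1.0 / n)

def potential(x, t):
    """DeepSet-style potential on the eigenvalue multiset of x, with time t appended."""
    lam = np.linalg.eigvals(x)
    feats = np.stack([lam.real, lam.imag, np.full(lam.shape[0], t)], axis=1)
    per_eigenvalue = np.tanh(feats @ W_phi + b_phi)  # encode each eigenvalue
    pooled = per_eigenvalue.sum(axis=0)              # permutation-invariant pooling
    return float(w_out @ np.tanh(pooled @ W_rho))

X = random_su(3)
U = random_su(3)
conjugated = U @ X @ U.conj().T                      # conjugation of X by U

invariance_gap = abs(potential(X, 0.3) - potential(conjugated, 0.3))
print("|Phi(U X U^H, t) - Phi(X, t)| =", invariance_gap)
```

Sum pooling is what makes the result independent of the order in which `np.linalg.eigvals` returns the eigenvalues, so the potential depends on the eigenvalues only as a multi-set.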

##### Prior Distributions

For the prior distribution of the flow, we need a distribution that respects the matrix conjugation invariance. We use the Haar measure on $SU(n)$, whose volume element is given for an $x \in SU(n)$ as $\mathrm{Haar}(x) = \prod_{i<j} |\lambda_i(x) - \lambda_j(x)|^2$ (Boyda et al., 2020). We can sample from and compute the log probabilities with respect to this distribution efficiently with standard matrix computations (Mezzadri, 2007).
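For readers who want to see what such a sampler might look like, the sketch below follows the QR-based recipe of Mezzadri (2007) for Haar-distributed unitary matrices. The final step, rescaling by an $n$-th root of the determinant to land in $SU(n)$, is an assumption of this example and not a procedure taken from the paper; the function names are hypothetical.

```python
import numpy as np

def sample_haar_unitary(n, rng):
    """Haar-distributed unitary matrix via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Fix the phase ambiguity of QR: scale column j of q by the phase of r[j, j].
    return q * (d / np.abs(d))

def sample_haar_su(n, rng):
    """Assumed projection to SU(n): divide by an n-th root of the determinant."""
    u = sample_haar_unitary(n, rng)
    return u / np.linalg.det(u) ** (1.0 / n)

rng = np.random.default_rng(0)
u = sample_haar_su(3, rng)
print("unitarity error:", np.max(np.abs(u.conj().T @ u - np.eye(3))))
print("determinant:", np.linalg.det(u))
```

The phase correction matters: without it, the output of a library QR routine is unitary but in general not Haar-distributed, which is the central observation of Mezzadri (2007).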

### 5.3 Training Paradigms for Equivariant Manifold Flows

There are two notable ways in which we can use the model described in Section 5.1. Namely, we can use it to learn to sample from a distribution for which we have a density function, or we can use it to learn the density given a way to sample from the distribution. These training paradigms are useful in different contexts, as we will see in Section 6.

##### Learning to sample given an exact density.

In certain settings, we are given an exact density and the task is to learn a tractable sampler for the distribution. For example, in Boyda et al. (2020), we are given conjugation-invariant densities on $SU(n)$ for which we know the exact density function (without knowledge of any normalizing constants). In contrast to procedures for normalizing flow training that use negative log-likelihood based losses, we do not have access to samples from the target distribution. Instead, we train our models by sampling from the Haar distribution on $SU(n)$, computing the KL divergence between the probabilities that our model assigns to these samples and the probabilities of the target distribution evaluated at these samples, and backpropagating from this KL divergence loss. When this loss is minimized, we can sample from the target distribution by sampling the prior, then forwarding the prior samples through our model. In the context of Boyda et al. (2020), such a flow-based sampler is important for modelling gauge theories.

##### Learning the density given a sampler.

In other settings, we are given a way to sample from a target distribution and want to learn the precise density for downstream tasks. For this setting, we sample the target distribution, use our flow to map it to a tractable prior, and use a negative log-likelihood-based loss. The flow will eventually learn to assign higher probabilities in sampled regions, and in doing so, will learn to approximate the target density.

## 6 Experiments

In this section, we utilize instantiations of equivariant manifold flows to learn densities over various manifolds of interest that are invariant to certain symmetries. First, we construct flows on $SU(n)$ that are invariant to conjugation by $SU(n)$, which are important for constructing flow-based samplers for lattice gauge theories in theoretical physics (Boyda et al., 2020). In this setting, our model outperforms the construction of Boyda et al. (2020).

As a second application, we model asteroid impacts on Earth by constructing flow models on $S^2$ that are invariant to the isotropy group that fixes the north pole. Our approach is able to overcome dataset bias, as only land impacts are reported in the dataset.

Finally, to demonstrate the need for enforcing equivariance of flow models, we directly compare our flow construction with a general purpose flow while learning a density with an inherent symmetry. The densities we use for this purpose are sphere densities that are invariant to action by the isotropy group. Our model is able to learn these densities much better than previous manifold ODE models that do not enforce equivariance of flows (Lou et al., 2020), thus showing the ability of our model to leverage the desired symmetries. In fact, even on simple isotropy-invariant densities, our model succeeds while the free model without equivariance fails.

### 6.1 SU(n) Gauge Equivariant Neural Network Flows

Learning gauge equivariant neural network flows is important for obtaining good flow-based samplers of densities on $SU(n)$ useful for lattice quantum field theory (Boyda et al., 2020). We compare our model for gauge equivariant flows (Section 5.2.2) with that of Boyda et al. (2020). For the sake of staying true to the application area, we follow the framework of Boyda et al. (2020) in learning densities on $SU(n)$ that are invariant to conjugation by $SU(n)$. In particular, our goal is to learn a flow to model a target distribution so that we may efficiently sample from it.

As mentioned above in Section 5.3, this setting follows the first paradigm in which we are given exact density functions and learn how to sample.

For the actual architecture of our equivariant manifold flows, we parameterize our potentials as DeepSet networks on eigenvalues as detailed in Section 5.2.2. The prior distribution for our model is also the Haar (uniform) distribution on $SU(n)$. Further training details are given in Appendix C.1.

#### 6.1.1 SU(2)

Figure 2 displays learned densities for our model and the model of Boyda et al. (2020) in the case of three particular densities on $SU(2)$ described in Appendix C.2.1. While both models match the target distributions well in high-density regions, we find that our model exhibits a considerable improvement in lower-density regions, where the tails of our learned distribution decay faster. By contrast, the model of Boyda et al. (2020) seems to be unable to reduce mass near , a possible consequence of their construction. Even in high-density regions, our model appears to vary smoothly, with fewer unnecessary bumps and curves when compared to the densities of the model in Boyda et al. (2020).

#### 6.1.2 SU(3)

Figure 2 displays learned densities for our model and the model of Boyda et al. (2020) in the case of three particular densities on $SU(3)$ described in Appendix C.2.2. In this case, we see that our models fit the target densities more accurately and better respect the geometry of the target distribution. Indeed, while the learned densities of Boyda et al. (2020) are often sharp and have pointed corners, our models learn densities that vary smoothly and curve in ways that are representative of the target distributions.

### 6.2 Asteroid Impact Dataset Bias Correction

We also showcase our model's ability to correct for dataset bias. In particular, we consider the test case of modelling asteroid impacts on Earth. Towards this end, many preexisting works have compiled locations of previous asteroid impacts (Meteorite Landings, 2017; Earth Impact Database, 2011), but modelling these datasets is challenging since they are inherently biased. In particular, all recorded impacts are found on land. However, ocean impacts are also dangerous (Ward and Asphaug, 2003) and should be properly modelled. To correct for this bias, we note that the distribution of asteroid impacts should be invariant with respect to the rotation of the Earth. We apply our isotropy invariant flow (described in Section 5.2.1) to model the asteroid impact locations given by the Meteorite Landings (2017) dataset, which was released by NASA without a specified license. Training happens in the setting of the second paradigm described in Section 5.3, since we can easily sample the target distribution and aim to learn the density. We visualize our results in Figure 3.

### 6.3 Modelling Invariance Matters

We also show that our equivariant condition on the manifold flow matters for efficient and accurate training when the target distribution is invariant. In particular, we again consider the sphere under the action of the isotropy group. We try to learn the isotropy invariant density given in Figure 3(a) and compare the results of our equivariant flow against those of a predefined manifold flow that does not explicitly model the symmetry (Lou et al., 2020)

. We train for 100 epochs with a learning rate of

and a batch size of ; our results are shown in Figure 4.

Despite our equivariant flow having fewer parameters (as both flows have the same width and the equivariant flow has an input dimension of ), our model is able to capture the distribution much better than the base manifold flow. This is due to the inductive bias of our equivariant model which explicitly leverages the underlying symmetry.

## 7 Conclusion

In this work, we introduce equivariant manifold flows in a fully general context and provide the necessary theory to ensure our construction is principled. We also demonstrate the efficacy of our approach in the context of learning conjugation invariant densities over $SU(2)$ and $SU(3)$, which is an important task for sampling lattice gauge theories in quantum field theory. In particular, we show that our method can more naturally adhere to the geometry of the target densities when compared to prior work while being more generally applicable. We also present an application to modelling asteroid impacts and demonstrate the necessity of modelling existing invariances by comparing against a regular manifold flow.

Further considerations. While our theory and implementations have utility in very general settings, there are still some limitations that could be addressed in future work. Further research may focus on finding other ways to generate equivariant manifold flows that do not rely on the construction of an invariant potential, and perhaps additionally on showing that such methods are sufficiently expressive to learn over open manifolds. Our models also require a fair bit of tuning to achieve results as strong as we demonstrate. Finally, we note that our theory and learning algorithm are too abstract for us to be sure of the future societal impacts. Still, we advance the field of deep generative models, which is known to have potential for negative impacts through malicious generation of fake images and text. Nevertheless, we do not expect this work to have negative effects in this area, as our applications are not in this domain.

## Acknowledgements

We would like to thank Facebook AI for funding equipment that made this work possible; in addition, we thank the National Science Foundation for helping fund this research effort (NSF IIS-2008102). We would also like to acknowledge Jonas Köhler and Denis Boyda for their useful insights.

## References

• J. Bose, A. Smofsky, R. Liao, P. Panangaden, and W. Hamilton (2020) Latent variable modelling with hyperbolic normalizing flows. In Proceedings of the 37th International Conference on Machine Learning, pp. 1045–1055. Cited by: §2.
• D. Boyda, G. Kanwar, S. Racanière, D. J. Rezende, M. S. Albergo, K. Cranmer, D. C. Hackett, and P. E. Shanahan (2020) Sampling using gauge equivariant flows. arXiv preprint arXiv:2008.05456. Cited by: §B.2.1, §B.2.1, §C.1, §C.2, §C.2, item 3, 2nd item, §1, §1, §2, §5.2.2, §5.2.2, §5.3, Figure 2, §6.1.1, §6.1.2, §6.1, §6.
• D. Bump (2004) Lie groups. Springer. Cited by: §B.1.
• R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018) Neural ordinary differential equations. In Advances in Neural Information Processing Systems, Vol. 31, pp. 6571–6583. Cited by: §1, §1, §2, §2.
• T. S. Cohen, M. Geiger, J. Köhler, and M. Welling (2018) Spherical CNNs. In International Conference on Learning Representations, Cited by: §2.
• T. Cohen, M. Weiler, B. Kicanaoglu, and M. Welling (2019) Gauge equivariant convolutional networks and the icosahedral CNN. In Proceedings of the 36th International Conference on Machine Learning, pp. 1321–1330. Cited by: §2.
• T. Cohen and M. Welling (2016) Group equivariant convolutional networks. In Proceedings of The 33rd International Conference on Machine Learning, pp. 2990–2999. Cited by: §2.
• L. Dinh, J. Sohl-Dickstein, and S. Bengio (2016) Density estimation using real nvp. arXiv preprint arXiv:1605.08803. Cited by: §2.
• C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios (2019) Neural spline flows. In NeurIPS, Cited by: Appendix D.
• Earth Impact Database (2011) Earth impact database. Note: Retrieved from http://passc.net/EarthImpactDatabase Cited by: §6.2.
• L. Falorsi and P. Forré (2020) Neural ordinary differential equations on manifolds. arXiv preprint arXiv:2006.06663. Cited by: §2.
• W. Feiten, M. Lang, and S. Hirche (2013) Rigid motion estimation using mixtures of projected gaussians. Proceedings of the 16th International Conference on Information Fusion, pp. 1465–1472. Cited by: §1.
• M. Field (1980) Equivariant dynamical systems. Transactions of the American Mathematical Society 259 (1), pp. 185–205. Cited by: Appendix A.
• M. Finzi, S. Stanton, P. Izmailov, and A. G. Wilson (2020) Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In International Conference on Machine Learning, pp. 3165–3176. Cited by: §2.
• J. Gallier and J. Quaintance (2020) Differential geometry and lie groups: a computational perspective. Vol. 12, Springer. Cited by: §A.5, Appendix B.
• I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, pp. 2672–2680. Cited by: §1, §2.
• W. Grathwohl, R. T. Q. Chen, J. Bettencourt, and D. Duvenaud (2019) Scalable reversible generative models with free-form continuous dynamics. In International Conference on Learning Representations, Cited by: §1, §1, §2.
• T. Hamelryck, J. T. Kent, and A. Krogh (2006) Sampling realistic protein conformations using local structural bias. PLoS Computational Biology 2 (9). Cited by: §1.
• D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §2.
• D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In International Conference on Learning Representations, Cited by: §C.1.
• I. Kobyzev, S. Prince, and M. Brubaker (2020) Normalizing flows: an introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §3.
• J. Köhler, L. Klein, and F. Noe (2020) Equivariant flows: exact likelihood generative learning for symmetric densities. In Proceedings of the 37th International Conference on Machine Learning, pp. 5361–5370. Cited by: §1, §2, §4.
• R. Kondor and S. Trivedi (2018) On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Proceedings of the 35th International Conference on Machine Learning, pp. 2747–2755. Cited by: §2.
• J. M. Lee (2013) Introduction to smooth manifolds. Graduate Texts in Mathematics, Springer New York. Cited by: §A.2, item 1, §3.
• A. Lou, D. Lim, I. Katsman, L. Huang, Q. Jiang, S. N. Lim, and C. M. De Sa (2020) Neural manifold ordinary differential equations. In Advances in Neural Information Processing Systems, Vol. 33, pp. 17548–17558. Cited by: §B.2.2, §1, §1, §2, 3(c), §6.3, §6.
• E. Mathieu and M. Nickel (2020) Riemannian continuous normalizing flows. In Advances in Neural Information Processing Systems, Vol. 33, pp. 2503–2515. Cited by: §1, §2, §5.1.
• Meteorite Landings (2017) Meteorite landings dataset. Note: Retrieved from https://data.world/nasa/meteorite-landings Cited by: Figure 3, §6.2.
• F. Mezzadri (2007) How to generate random matrices from the classical compact groups. Notices of the American Mathematical Society 54, pp. 592–604. Cited by: §B.1, §5.2.2.
• Y. Nagano, S. Yamaguchi, Y. Fujita, and M. Koyama (2019) A wrapped normal distribution on hyperbolic space for gradient-based learning. In Proceedings of the 36th International Conference on Machine Learning, pp. 4693–4702. Cited by: §5.2.1.
• G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan (2019) Normalizing flows for probabilistic modeling and inference. arXiv preprint arXiv:1912.02762. Cited by: §C.1, §4.
• A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in pytorch. In Neural Information Processing System Autodiff Workshop, Cited by: §5.1.
• A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) Pytorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703. Cited by: §C.1.
• D. J. Rezende, G. Papamakarios, S. Racaniere, M. Albergo, G. Kanwar, P. Shanahan, and K. Cranmer (2020) Normalizing flows on tori and spheres. In Proceedings of the 37th International Conference on Machine Learning, pp. 8083–8092. Cited by: §2.
• D. J. Rezende, S. Racanière, I. Higgins, and P. Toth (2019) Equivariant hamiltonian flows. arXiv preprint arXiv:1909.13739. Cited by: §2.
• O. Skopek, O. Ganea, and G. Bécigneul (2019) Mixed-curvature variational autoencoders. arXiv preprint arXiv:1911.08411. Cited by: §5.2.1.
• S. N. Ward and E. Asphaug (2003) Asteroid impact tsunami of 2880 March 16. Geophysical Journal International 153 (3), pp. F6–F10. Cited by: §6.2.
• A. G. Wasserman (1969) Equivariant differential topology. Topology 8 (2), pp. 127–150. Cited by: Appendix A.
• P. Wirnsberger, A. J. Ballard, G. Papamakarios, S. Abercrombie, S. Racanière, A. Pritzel, D. Jimenez Rezende, and C. Blundell (2020) Targeted free energy estimation via learned mappings. The Journal of Chemical Physics 153 (14), pp. 144112. Cited by: §1.
• M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola (2017) Deep sets. In Advances in Neural Information Processing Systems, Vol. 30, pp. 3391–3401. Cited by: §C.1, §5.2.2.

## Appendix A Proof of Theorems

In this section, we restate and prove the theorems in Section 4. These give the theoretical foundations that we use to build our models. Prior work [Wasserman, 1969, Field, 1980] addresses some of the results we formalize below.

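### A.1 Proof of Theorem 1

###### Theorem 1.

Let $(M, h)$ be a Riemannian manifold and $G$ be its group of isometries (or an isometry subgroup). If $\Phi : M \to \mathbb{R}$ is a smooth $G$-invariant function, then the following diagram commutes for any $g \in G$:

$$\begin{array}{ccc}
M & \xrightarrow{\;R_g\;} & M \\
{\scriptstyle \nabla\Phi}\,\downarrow & & \downarrow\,{\scriptstyle \nabla\Phi} \\
TM & \xrightarrow{\;DR_g\;} & TM
\end{array}$$

or, equivalently, $\nabla_{R_g u}\Phi = D_u R_g \nabla_u \Phi$. This condition is also tight in the sense that it only occurs if $G$ is a group of isometries.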

###### Proof.

We first recall the Riemannian gradient chain rule:

$$\nabla_u(\Phi \circ R_g) = (D_u R_g)^\top (\nabla_{R_g u}\Phi),$$

where $(D_u R_g)^\top$ is the "adjoint" given by

$$h(D_u R_g(v), w) = h(v, (D_u R_g)^\top(w)).$$

Since $R_g$ is an isometry, we also have

$$h(x, y) = h(D_u R_g(x), D_u R_g(y)).$$

Combining the above two equations gives

$$h(x, y) = h(D_u R_g(x), D_u R_g(y)) = h(x, (D_u R_g)^\top(D_u R_g(y))),$$

which implies that, for all $x, y \in T_u M$,

$$h(x, y - (D_u R_g)^\top(D_u R_g(y))) = 0.$$

Since $h$ is a Riemannian metric (even a pseudo-metric works, due to non-degeneracy), we must have that $(D_u R_g)^\top D_u R_g = I$, i.e. $(D_u R_g)^\top = (D_u R_g)^{-1}$.

To complete the proof, we recall that $\Phi \circ R_g = \Phi$, and this combined with the chain rule gives

$$\nabla_u \Phi = \nabla_u(\Phi \circ R_g) = (D_u R_g)^\top (\nabla_{R_g u}\Phi).$$

Now applying $D_u R_g$ on both sides gives

$$\nabla_{R_g u}\Phi = D_u R_g \nabla_u \Phi,$$

which is exactly what we want to show.

We see that this is an "only if" condition because we must necessarily get that the adjoint is the inverse, which implies that $R_g$ is an isometry. ∎

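### A.2 Proof of Theorem 2

###### Theorem 2.

Let $(M, h)$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). Let $X$ be any time-dependent vector field on $M$, and $F_{X,T}$ be the flow of $X$. Then $X$ is a $G$-equivariant vector field if and only if $F_{X,T}$ is a $G$-equivariant flow for any time $T$.

###### Proof.

$X$ $G$-equivariant $\implies$ $F_{X,T}$ $G$-equivariant. We invoke the following lemma from Lee [2013, Corollary 9.14]:

###### Lemma 1.

Let $F : M \to N$ be a diffeomorphism. If $X$ is a smooth vector field over $M$ and $\theta$ is the flow of $X$, then the flow of $F_* X$ ($F_*$ is another notation for the differential of $F$) is $\eta_t = F \circ \theta_t \circ F^{-1}$, with domain $N_t = F(M_t)$ for each $t \in \mathbb{R}$.

Examine $(R_g)_* X$ and its action on $x \in M$. Since $X$ is $G$-equivariant, we have for any $x \in M$ and any time $t$,

$$((R_g)_* X)(x, t) = (D_{R_g^{-1}(x)} R_g)\, X(R_g^{-1}(x), t) = X(R_g \circ R_g^{-1}(x), t) = X(x, t),$$

so it follows that $(R_g)_* X = X$. Applying the lemma above, we get:

$$F_{(R_g)_* X, T} = R_g \circ F_{X,T} \circ R_g^{-1}$$

and, since the left-hand side equals $F_{X,T}$, simplifying gives $F_{X,T} \circ R_g = R_g \circ F_{X,T}$, as desired.

$F_{X,T}$ $G$-equivariant $\implies$ $X$ $G$-equivariant. This direction follows from the chain rule. If $F_{X,t}$ is $G$-equivariant, then at all times $t$ we have:

$$\begin{aligned}
(D_{F_{X,t}(m)} R_g)\, X(F_{X,t}(m), t) &= (D_{F_{X,t}(m)} R_g)\left(\frac{d}{dt} F_{X,t}(m)\right) && \text{(definition)} \\
&= \frac{d}{dt}(R_g \circ F_{X,t})(m) && \text{(chain rule)} \\
&= \frac{d}{dt} F_{X,t}(R_g m) && \text{(equivariance)} \\
&= X(R_g(F_{X,t}(m)), t) && \text{(definition)}
\end{aligned}$$

This concludes the proof of the backward direction. ∎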

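### A.3 Proof of Theorem 3

###### Theorem 3.

Let $(M, h)$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). If $\rho$ is a $G$-invariant density on $M$, and $f$ is a $G$-equivariant diffeomorphism, then $\rho_f$ is also $G$-invariant.

###### Proof.

We wish to show $\rho_f$ is also $G$-invariant, i.e. $\rho_f(R_g x) = \rho_f(x)$ for all $g \in G$ and $x \in M$.

We first recall the definition of $\rho_f$:

$$\rho_f(x) = \rho(f^{-1}(x)) \left|\det \frac{\partial f^{-1}(x)}{\partial x}\right| = \rho(f^{-1}(x)) \left|\det J_{f^{-1}}(x)\right|.$$

Since $f$ is $G$-equivariant, we have $f \circ R_g = R_g \circ f$ for any $g \in G$. Also, since $\rho$ is $G$-invariant, we have $\rho \circ R_g = \rho$. Combining these properties, we see that:

$$\begin{aligned}
\rho_f(R_g x) &= \rho_f(R_g x)\, \frac{|\det J_{R_g}(x)|}{|\det J_{R_g}(x)|} = \frac{\rho_{R_g^{-1} \circ f}(x)}{|\det J_{R_g}(x)|} && \text{(expanding definition of } \rho_f\text{)} \\
&= \frac{\rho_{f \circ R_g^{-1}}(x)}{|\det J_{R_g}(x)|} = \frac{\rho((R_g \circ f^{-1})(x))\, |\det J_{R_g \circ f^{-1}}(x)|}{|\det J_{R_g}(x)|} && \text{(} G\text{-equivariance of } f\text{)} \\
&= \frac{(\rho \circ R_g \circ f^{-1})(x)\, |\det J_{R_g}(f^{-1}(x))\, J_{f^{-1}}(x)|}{|\det J_{R_g}(x)|} && \text{(expanding Jacobian)} \\
&= \frac{(\rho \circ f^{-1})(x)\, |\det J_{R_g}(f^{-1}(x))|\, |\det J_{f^{-1}}(x)|}{|\det J_{R_g}(x)|} && \text{(} G\text{-invariance of } \rho\text{)} \\
&= \rho(f^{-1}(x))\, |\det J_{f^{-1}}(x)| \cdot \frac{|\det J_{R_g}(f^{-1}(x))|}{|\det J_{R_g}(x)|} && \text{(rearrangement)} \\
&= \rho_f(x) \cdot \frac{|\det J_{R_g}(f^{-1}(x))|}{|\det J_{R_g}(x)|} && \text{(expanding definition of } \rho_f\text{)}
\end{aligned}$$

Now note that $R_g$ is contained in the isometry group, and thus is an isometry. This means $|\det J_{R_g}(x)| = 1$ for any $x \in M$, so the right-hand side above is simply $\rho_f(x)$, which proves the theorem. ∎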

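### A.4 Proof of Theorem 4

###### Theorem 4.

Let $(M, h)$ be a closed Riemannian manifold. Let $\pi$ be a distribution over said manifold, and let $D_{KL}(\rho \,\|\, \pi)$ be the Kullback–Leibler divergence between distributions $\rho$ and $\pi$. If we choose a $g$ such that

$$g(x) = \log\left(\frac{\pi(x)}{\rho(x)}\right)$$

for $x \in M$, we have:

$$\frac{\partial}{\partial t} D_{KL}(\rho \,\|\, \pi) = -\int \rho \exp(g)\, \|\nabla g\|^2 \, dx.$$

###### Proof.

We start by noting the following by the Fokker-Planck equation:

$$\frac{\partial \rho}{\partial t} = \nabla \cdot (\rho \nabla g).$$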

This gives:

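where the final equality follows from the divergence theorem, since the integral of the divergence over a closed manifold is $0$. Now if we choose $g$ such that

$$g(x) = \log\left(\frac{\pi(x)}{\rho(x)}\right),$$

then we have

$$\frac{\partial}{\partial t} D_{KL}(\rho \,\|\, \pi) = -\int (\rho \nabla g) \cdot \nabla \exp(g) \, dx = -\int \rho \exp(g)\, \|\nabla g\|^2 \, dx,$$

as desired. ∎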

### A.5 Conjugation by SU(n) is an Isometry

We now prove a lemma that shows that the group action of conjugation by $SU(n)$ is an isometry subgroup. This implies that Theorems 1 through 3 above can be specialized to the setting of $SU(n)$.

###### Lemma 2.

Let $G$ be the group action of conjugation by $SU(n)$, and let each $R_g$ represent the corresponding action of conjugation by $g \in SU(n)$. Then $G$ is an isometry subgroup.

###### Proof.

We first show that the matrix conjugation action of $SU(n)$ is unitary. For $R, X \in SU(n)$, note that the action of conjugation is given by $\operatorname{vec}(R X R^{-1}) = (R^{-T} \otimes R) \operatorname{vec}(X)$. We have that $R^{-T} \otimes R$ is unitary because:

$$\begin{aligned}
(R^{-T} \otimes R)^* (R^{-T} \otimes R) &= (\overline{R^{-1}} \otimes R^*)(R^{-T} \otimes R) && \text{(conjugate transposes distribute over } \otimes\text{)} \\
&= (\overline{R^{-1}} R^{-T}) \otimes (R^* R) && \text{(mixed-product property of } \otimes\text{)} \\
&= (R^T R^{-T}) \otimes I = I \otimes I = I_{n^2 \times n^2} && \text{(simplification)}
\end{aligned}$$

Now choose an orthonormal frame $\{X_i\}$ of $TSU(n)$. Note that $TSU(n)$ locally consists of shifts of the Lie algebra, which itself consists of traceless skew-Hermitian matrices [Gallier and Quaintance, 2020]. We show $G$ is an isometry subgroup by noting that when it acts on the frame, the resulting frame is orthonormal. Let $g \in G$, and consider the result of the action of $R_g$ on the frame, namely $\{R_g X_i\}$. Then we have:

$$(R_g X_i)^* (R_g X_j) = X_i^* R_g^* R_g X_j = X_i^* X_j.$$

Note that for $i \neq j$ we have $X_i^* X_j = 0$, and for $i = j$ we see $X_i^* X_i = 1$. Hence the resulting frame is orthonormal and $G$ is an isometry subgroup. ∎

## Appendix B Manifold Details for the Special Unitary Group SU(n)

In this section, we give a basic introduction to the special unitary group $SU(n)$ and relevant properties.

Definition. The special unitary group $SU(n)$ consists of all $n$-by-$n$ unitary matrices $U$ (i.e. $U^* U = U U^* = I$ for $U^*$ the conjugate transpose of $U$) that have determinant $\det(U) = 1$.

Note that $SU(n)$ is a smooth manifold; in particular, it has Lie structure [Gallier and Quaintance, 2020]. Moreover, the tangent space at the identity (i.e. the Lie algebra) consists of traceless skew-Hermitian matrices [Gallier and Quaintance, 2020]. The Riemannian metric is .

### B.1 Haar Measure on SU(n)

Haar Measure. Haar measures are generic constructs of measures on topological groups that are invariant under the group operation. For example, the Lie group $SU(n)$ has Haar measure $\mu_H$, which is defined as the unique measure such that for any $U \in SU(n)$, we have

$$\mu_H(VU) = \mu_H(UW) = \mu_H(U)$$

for all $V, W \in SU(n)$ and $\mu_H(SU(n)) = 1$.

A topological group together with its unique Haar measure defines a probability space on the group. This gives one natural way of defining probability distributions on the group, explaining its importance in our construction of probability distributions on Lie groups, specifically $SU(n)$.

To make the above Haar measure definition more concrete, we note from Bump [2004, Proposition 18.4] that we can transform an integral over $SU(n)$ with respect to the Haar measure into an integral over the corresponding diagonal matrices under eigendecomposition:

$$\int_{SU(n)} f \, d\mu_H = \frac{1}{n!} \int_T f(\operatorname{diag}(\lambda_1, \dots, \lambda_n)) \prod_{i<j} |\lambda_i - \lambda_j|^2 \, d\lambda.$$

Thus, we can think of the Haar measure as inducing the change of variables with volume element

$$\operatorname{Haar}(x) = \prod_{i<j} |\lambda_i(x) - \lambda_j(x)|^2.$$

To sample uniformly from the Haar measure, we just need to ensure that we are sampling each $x \in SU(n)$ with probability proportional to $\operatorname{Haar}(x)$.

Sampling from the Haar Prior. We use Algorithm 1 [Mezzadri, 2007] for generating a sample uniformly from the Haar prior on $SU(n)$:
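Algorithm 1 is not reproduced in this text. As an illustration only, the following is a minimal sketch of the QR-based recipe described by Mezzadri [2007] for the unitary group, written with NumPy. The final step that rescales by a global phase to obtain determinant one is an assumption made here for the sketch, and the function names are hypothetical; the authors' actual algorithm may differ.

```python
import numpy as np

def sample_haar_unitary(n, rng):
    """Draw one matrix from the Haar measure on U(n) via QR of a Ginibre matrix."""
    # Complex Ginibre matrix: i.i.d. standard complex Gaussian entries.
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    # QR is unique only up to a phase per column; multiplying column j of Q
    # by the phase of R[j, j] fixes that choice so that Q is Haar distributed.
    d = np.diagonal(r)
    return q * (d / np.abs(d))

def sample_haar_su(n, rng):
    """Assumed projection U(n) -> SU(n): rescale by a global phase so that det = 1."""
    u = sample_haar_unitary(n, rng)
    return u * np.linalg.det(u) ** (-1.0 / n)

rng = np.random.default_rng(0)
u = sample_haar_su(3, rng)
```

The phase correction on the columns is what distinguishes this recipe from calling a QR routine directly, whose output depends on the sign conventions of the underlying linear algebra library.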

### B.2 Eigendecomposition on SU(n)

One main step in the invariant potential computation for $SU(n)$ is to derive formulas for the eigendecomposition of $U \in SU(n)$ as well as formulas for differentiation through the eigendecomposition (recall that we must differentiate the $SU(n)$-invariant potential $\Phi$ to get the $SU(n)$-equivariant vector field $\nabla\Phi$, as described in Section 5.2.2). This section first derives general formulas for how to do this for $SU(n)$. In practice, such general methods often introduce instability, and thus, for the oft-used special cases of $n = 2, 3$, we derive explicit formulas for the eigenvalues based on finding roots of the characteristic polynomials (given by root formulas for quadratic/cubic equations).

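#### B.2.1 Derivations for the General Case SU(n)

Here we reconstruct the steps of differentiation through eigendecomposition from Boyda et al. [2020, Appendix C] that allow efficient computation in our use-case. For our matrix-conjugation-invariant flow, we need only differentiate the eigenvalues with respect to the input $U \in SU(n)$.

For an input $U \in SU(n)$, let its eigendecomposition be $U = P \operatorname{diag}(w) P^*$, where $w \in \mathbb{C}^n$ contains its eigenvalues, and $P \in \mathbb{C}^{n \times n}$ with columns $p_1, \dots, p_n$ as its eigenvectors. Let $L$ denote our loss function, and write the downstream gradients in row vector format:

$$g = \begin{bmatrix} \dfrac{\partial L}{\partial \operatorname{Re} w} & \dfrac{\partial L}{\partial \operatorname{Im} w} \end{bmatrix} = \begin{bmatrix} g^{(1)} & g^{(2)} \end{bmatrix}.$$

Then following similar steps as in Boyda et al. [2020], we can compute the gradient of $L$ with respect to the real and imaginary parts of $U$ as follows:

$$\frac{\partial L}{\partial \operatorname{Re} U} = \sum_{i=1}^n g^{(1)}_i \operatorname{Re}(\bar{p}_i p_i^\top) + \sum_{i=1}^n g^{(2)}_i \operatorname{Im}(\bar{p}_i p_i^\top)$$

$$\frac{\partial L}{\partial \operatorname{Im} U} = -\sum_{i=1}^n g^{(1)}_i \operatorname{Im}(\bar{p}_i p_i^\top) + \sum_{i=1}^n g^{(2)}_i \operatorname{Re}(\bar{p}_i p_i^\top)$$

If we define

$$Q^{(1)} = \begin{bmatrix} g^{(1)}_1 \bar{p}_1 & \dots & g^{(1)}_n \bar{p}_n \end{bmatrix}, \qquad Q^{(2)} = \begin{bmatrix} g^{(2)}_1 \bar{p}_1 & \dots & g^{(2)}_n \bar{p}_n \end{bmatrix},$$

then we can write the gradients in terms of efficient matrix computations:

$$\frac{\partial L}{\partial \operatorname{Re} U} = \operatorname{Re}(Q^{(1)} P^\top) + \operatorname{Im}(Q^{(2)} P^\top), \qquad \frac{\partial L}{\partial \operatorname{Im} U} = -\operatorname{Im}(Q^{(1)} P^\top) + \operatorname{Re}(Q^{(2)} P^\top).$$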
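The matrix form of these gradients can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names, the toy loss, and the test matrix are all hypothetical, and the toy loss is chosen to be symmetric in the eigenvalues so that the ordering returned by the eigensolver does not matter.

```python
import numpy as np

def eigenvalue_loss_grads(P, g1, g2):
    """Gradients of L w.r.t. Re(U) and Im(U).

    P holds the unit-norm eigenvectors as columns; g1 = dL/dRe(w) and
    g2 = dL/dIm(w) are the downstream gradients w.r.t. the eigenvalues w.
    """
    Q1 = np.conj(P) * g1  # column i of conj(P) scaled by g1[i]
    Q2 = np.conj(P) * g2  # column i of conj(P) scaled by g2[i]
    A = Q1 @ P.T
    B = Q2 @ P.T
    return A.real + B.imag, -A.imag + B.real

def toy_loss(U):
    # Hypothetical loss, symmetric under permutations of the eigenvalues.
    w = np.linalg.eigvals(U)
    return float(np.sum(w.real ** 3 + w.imag ** 2))

# Hypothetical unitary test matrix with well-separated eigenvalues.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
U = V @ np.diag(np.exp(1j * np.array([0.3, 1.7, -2.0]))) @ V.conj().T

w, P = np.linalg.eig(U)
g1 = 3.0 * w.real ** 2  # dL/dRe(w) for the toy loss
g2 = 2.0 * w.imag       # dL/dIm(w) for the toy loss
dL_dRe, dL_dIm = eigenvalue_loss_grads(P, g1, g2)
```

A central finite difference on a single entry of $\operatorname{Re} U$ or $\operatorname{Im} U$ is a simple way to sanity-check the two returned matrices against `toy_loss`.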

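#### B.2.2 Explicit Formula for SU(2)

We now derive an explicit eigenvalue formula for the $SU(2)$ case. Let us denote

$$U = \begin{bmatrix} a + bi & -c + di \\ c + di & a - bi \end{bmatrix}$$

for $a, b, c, d \in \mathbb{R}$ such that $a^2 + b^2 + c^2 + d^2 = 1$ as an element of $SU(2)$; then the characteristic polynomial of this matrix is given by

$$\det(\lambda I - U) = (\lambda - (a + bi))(\lambda - (a - bi)) + (c + di)(c - di) = (a - \lambda)^2 + b^2 + c^2 + d^2 = \lambda^2 - 2a\lambda + 1$$

and thus its eigenvalues are given by

$$\lambda_1 = a + i\sqrt{1 - a^2} = a + i\sqrt{b^2 + c^2 + d^2}$$

$$\lambda_2 = a - i\sqrt{1 - a^2} = a - i\sqrt{b^2 + c^2 + d^2}$$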
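The closed form above can be checked directly. The following is a small sketch in plain Python; the function name and the sample point are hypothetical and chosen only for illustration.

```python
import math

def su2_eigenvalues(a):
    """Eigenvalues of an SU(2) element with trace 2a, from the closed form."""
    # Clamp guards against a tiny negative argument when |a| is close to 1.
    s = math.sqrt(max(0.0, 1.0 - a * a))
    return complex(a, s), complex(a, -s)

# Hypothetical point on the unit 3-sphere: a^2 + b^2 + c^2 + d^2 = 1.
a, b, c, d = 0.5, 0.5, -0.5, 0.5
lam1, lam2 = su2_eigenvalues(a)

# Residual of the characteristic polynomial lambda^2 - 2 a lambda + 1 at lam1.
residual = lam1 ** 2 - 2 * a * lam1 + 1
```

Since the two eigenvalues are complex conjugates on the unit circle, their product is the determinant (one) and their sum is the trace ($2a$).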

Remark. We note that there is a natural isomorphism $\phi : S^3 \to SU(2)$, given by

$$\phi(a, b, c, d) = \begin{bmatrix} a + bi & -c + di \\ c + di & a - bi \end{bmatrix}.$$

We can exploit this isomorphism by learning a flow over $S^3$ with a regular manifold flow like NMODE [Lou et al., 2020] and mapping it to a flow over $SU(2)$. This is also an acceptable way to obtain stable density learning over $SU(2)$.
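As a quick illustration of the map $\phi$, the sketch below builds the matrix for a point on the unit 3-sphere and computes the quantities needed to confirm that it lands in $SU(2)$. The helper names and the sample point are hypothetical; only the formula for $\phi$ comes from the text.

```python
import math

def phi(a, b, c, d):
    """Map a point (a, b, c, d) on the unit 3-sphere to a 2x2 complex matrix."""
    return [[complex(a, b), complex(-c, d)],
            [complex(c, d), complex(a, -b)]]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def gram2(m):
    """Compute m^* m for a 2x2 matrix; the identity when m is unitary."""
    return [[sum(m[k][i].conjugate() * m[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]

# Hypothetical sample point, normalised onto the unit 3-sphere.
raw = (1.0, 2.0, 3.0, 4.0)
norm = math.sqrt(sum(x * x for x in raw))
point = tuple(x / norm for x in raw)

U = phi(*point)
determinant = det2(U)
gram = gram2(U)
```

The determinant evaluates to $a^2 + b^2 + c^2 + d^2$, which is why the unit-norm constraint on the sphere corresponds exactly to the determinant-one constraint of $SU(2)$.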

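#### B.2.3 Explicit Formula for SU(3)

We now derive an explicit eigenvalue formula for the $SU(3)$ case. For $U \in SU(3)$, we can compute the characteristic polynomial as

$$\det(\lambda I - U) = \det\begin{bmatrix} \lambda - U_{11} & -U_{12} & -U_{13} \\ -U_{21} & \lambda - U_{22} & -U_{23} \\ -U_{31} & -U_{32} & \lambda - U_{33} \end{bmatrix} = \lambda^3 + c_2 \lambda^2 + c_1 \lambda + c_0$$

where

$$c_2 = -(U_{11} + U_{22} + U_{33})$$

$$c_1 = U_{11}U_{22} + U_{22}U_{33} + U_{33}U_{11} - U_{12}U_{21} - U_{23}U_{32} - U_{13}U_{31}$$

$$c_0 = -(U_{12}U_{23}U_{31} + U_{13}U_{21}U_{32} + U_{11}U_{22}U_{33} - U_{12}U_{21}U_{33} - U_{13}U_{31}U_{22} - U_{23}U_{32}U_{11})$$

Now to solve the equation

$$\lambda^3 + c_2 \lambda^2 + c_1 \lambda + c_0 = 0$$

we first transform it into a depressed cubic

$$t^3 + pt + q = 0$$

where we make the transformation

$$t = \lambda + \frac{c_2}{3}$$

$$p = \frac{3c_1 - c_2^2}{3}$$

$$q = \frac{2c_2^3 - 9c_2 c_1 + 27c_0}{27}$$

Now from Cardano's formula, we have the roots of the depressed cubic given by

$$t_{1,2,3} = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}$$

where the two cubic roots in the above equation are picked such that they multiply to $-p/3$. The eigenvalues are then recovered as $\lambda_{1,2,3} = t_{1,2,3} - c_2/3$.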
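The steps above can be sketched in plain Python with complex arithmetic. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the second cube root is obtained from the product constraint $uv = -p/3$ rather than computed separately, and the larger of the two candidate radicands is used, which is a common stabilisation choice and not something stated in the text.

```python
import cmath

def char_poly_coeffs(U):
    """Coefficients (c2, c1, c0) of det(lambda I - U) for a 3x3 matrix U."""
    (u11, u12, u13), (u21, u22, u23), (u31, u32, u33) = U
    c2 = -(u11 + u22 + u33)
    c1 = (u11 * u22 + u22 * u33 + u33 * u11
          - u12 * u21 - u23 * u32 - u13 * u31)
    c0 = -(u12 * u23 * u31 + u13 * u21 * u32 + u11 * u22 * u33
           - u12 * u21 * u33 - u13 * u31 * u22 - u23 * u32 * u11)
    return c2, c1, c0

def cubic_roots(c2, c1, c0):
    """Roots of lambda^3 + c2 lambda^2 + c1 lambda + c0 via Cardano's formula."""
    p = (3 * c1 - c2 ** 2) / 3
    q = (2 * c2 ** 3 - 9 * c2 * c1 + 27 * c0) / 27
    disc = cmath.sqrt(q * q / 4 + p ** 3 / 27)
    # Pick the radicand of larger magnitude before taking the cube root.
    u = max((-q / 2 + disc, -q / 2 - disc), key=abs) ** (1.0 / 3.0)
    if abs(u) < 1e-14:
        ts = [0j, 0j, 0j]  # p = q = 0: triple root of the depressed cubic
    else:
        v = -p / (3 * u)   # enforces u * v = -p / 3
        omega = cmath.exp(2j * cmath.pi / 3)
        ts = [u * omega ** k + v * omega ** (-k) for k in range(3)]
    # Undo the shift t = lambda + c2 / 3.
    return [t - c2 / 3 for t in ts]

def su3_eigenvalues(U):
    return cubic_roots(*char_poly_coeffs(U))

# Hypothetical diagonal SU(3) element: the three phases sum to zero.
thetas = [0.3, 1.1, -1.4]
U = [[cmath.exp(1j * thetas[i]) if i == j else 0j for j in range(3)]
     for i in range(3)]
roots = su3_eigenvalues(U)
```

Closed-form root formulas lose accuracy when eigenvalues nearly coincide, so a sketch like this is best exercised on matrices with well-separated eigenvalues.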

## Appendix C Experimental Details for Learning Equivariant Flows on SU(n)

This section presents some additional details regarding the experiments that learn invariant densities on $SU(n)$ in Section 6.

### C.1 Training Details

Our DeepSet network [Zaheer et al., 2017] consists of a feature extractor and regressor. The feature extractor is a -layer tanh network with hidden channels. We concatenate the time component to the sum component of the feature extractor before feeding the resulting size tensor into a -layer tanh regressor network.

To train our flows, we minimize the KL divergence between our model distribution and the target distribution [Papamakarios et al., 2019], as is done in Boyda et al. [2020]. In a training iteration, we draw a batch of samples uniformly from $SU(n)$, map them through our flow, and compute the gradients with respect to the batch KL divergence between our model probabilities and the target density probabilities. We use the Adam stochastic optimizer for gradient-based optimization [Kingma and Ba, 2015]. The graph shown in Figure 2 was trained for iterations with a batch size of and weight decay setting of ; the starting learning rate for Adam was , and a multi-step learning rate schedule that decreased the learning rate by a factor of every epochs was used. We use PyTorch to implement our models and run experiments [Paszke et al., 2019]. Experiments are run on one CPU and/or GPU at a time, where we use one NVIDIA RTX 2080Ti GPU with 11 GB of GPU RAM.

### C.2 Conjugation-Invariant Target Distributions

Boyda et al. [2020] defined a family of matrix-conjugation-invariant densities on $SU(n)$ as:

$$p_{\text{toy}}(U) = \frac{1}{Z} \exp\left(\frac{\beta}{n} \operatorname{Re} \operatorname{tr}\left(\sum_k c_k U^k\right)\right),$$

which is parameterized by scalars $c_k$ and $\beta$. The normalizing constant $Z$ is chosen to ensure that $p_{\text{toy}}$ is a valid probability density with respect to the Haar measure.
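Because $\operatorname{tr}(U^k)$ is the sum of the $k$-th powers of the eigenvalues of $U$, this density depends on $U$ only through its spectrum. The sketch below evaluates the unnormalized log-density from a list of eigenvalues in plain Python. The function name is hypothetical, and the coefficients and $\beta$ in the usage lines are placeholder values chosen for illustration, not the settings used in the experiments.

```python
import cmath

def log_p_toy_unnormalized(eigenvalues, coeffs, beta):
    """Return log(Z * p_toy(U)) given the eigenvalues of U.

    coeffs[k - 1] plays the role of c_k, so the sum runs over k = 1, 2, ...
    """
    n = len(eigenvalues)
    total = sum(c_k * sum(lam ** k for lam in eigenvalues)
                for k, c_k in enumerate(coeffs, start=1))
    return (beta / n) * complex(total).real

# Placeholder example: an SU(2) element with eigenvalues exp(+/- i * theta).
theta = 0.4
eigs = [cmath.exp(1j * theta), cmath.exp(-1j * theta)]
value = log_p_toy_unnormalized(eigs, coeffs=[1.0], beta=9.0)
```

Conjugating $U$ leaves its eigenvalues unchanged, and reordering them leaves the power sums unchanged, so a function written this way is conjugation-invariant by construction.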

More specifically, the experiments of Boyda et al. [2020] focus on learning to sample from the distribution with the above density with three components, in the following form:

 ptoy(U)=1ZeβnRetr(c1U+c2U2+c3U