Formulating observed data points as the outcomes of probabilistic processes has proven to be a useful framework for designing successful machine learning models. Thus far, the research community has drawn almost exclusively from results in probability theory limited to Euclidean and discrete spaces. Yet, expanding the set of possible spaces under consideration to those with a non-trivial topology has had a longstanding tradition and giant impact on such fields as physics, mathematics, and various engineering disciplines. One significant class describing numerous spaces of fundamental interest is that of Lie groups, which are groups of symmetry transformations that are simultaneously differentiable manifolds. Lie groups include rotations, translations, scaling, and other geometric transformations, which play an important role in several application domains. Lie group elements are, for example, utilized to describe the rigid body rotations and movements central in robotics, and form a key ingredient in the formulation of the Standard Model of particle physics. They also provide the building blocks underlying ideas in a plethora of mathematical branches such as holonomy in Riemannian geometry, root systems in combinatorics, and the Langlands Program connecting geometry and number theory.
Many of the most notable recent results in machine learning can be attributed to researchers’ ability to combine probability theoretical concepts with the power of deep learning architectures, e.g. by devising optimization strategies able to directly optimize the parameters of probability distributions from samples through backpropagation. Perhaps the most successful instantiation of this combination has come in the framework of Variational Inference (VI) (Jordan et al., 1999), a Bayesian method used to approximate intractable probability densities through optimization. Crucial for VI is the ability to posit a flexible family of densities, and a way to find the member closest to the true posterior by optimizing the parameters.
These variational parameters are typically optimized using the evidence lower bound (ELBO), a lower bound on the data likelihood. The two main approaches to obtaining estimates of the gradients of the ELBO are the score function (Paisley et al., 2012; Mnih and Gregor, 2014) also known as REINFORCE (Williams, 1992), and the reparameterization trick (Price, 1958; Bonnet, 1964; Salimans et al., 2013; Kingma and Welling, 2013; Rezende et al., 2014). While various works have shown the latter to provide lower variance estimates, its use is limited by the absence of a general formulation for all variational families. Although in recent years much work has been done in extending this class of reparameterizable families (Ruiz et al., 2016; Naesseth et al., 2017; Figurnov et al., 2018), none of these methods explicitly investigate the case of distributions defined on non-trivial manifolds such as Lie groups.
The principal contribution of this paper is therefore to extend the reparameterization trick to Lie groups. We achieve this by providing a general framework to define reparameterizable densities on Lie groups, under which the well-known Gaussian case of Kingma and Welling (2013) is recovered as a special instantiation. This is done by pushing samples from the Lie algebra into the Lie group using the exponential map, and by observing that the corresponding density change can be analytically computed. We formally describe our approach using results from differential geometry and measure theory.
In the remainder of this work we first cover some preliminary concepts on Lie groups and the reparameterization trick. We then proceed to present the general idea underlying our reparameterization trick for Lie groups (ReLie, pronounced ‘really’), followed by a formal proof. Additionally, we provide an implementation section (code available at https://github.com/pimdh/relie) where we study three important examples of Lie groups, deriving the reparameterization details for the $N$-Torus, $T^N$, the oriented group of 3D rotations, SO(3), and the group of 3D rotations and translations, SE(3). We conclude by creating complex and multimodal reparameterizable densities on SO(3) using a novel non-invertible normalizing flow, demonstrating applications of our work in both a supervised and unsupervised setting.
In this section we first cover a number of preliminary concepts that will be used in the rest of this paper.
2.1 Lie Groups and Lie Algebras
Lie Group, $G$:
A Lie group is a group that is also a smooth manifold. This means that we can, at least in local regions, describe group elements continuously with parameters. The number of parameters equals the dimension of the group. We can see (connected) Lie groups as continuous symmetries where we can continuously traverse between group elements (we refer the interested reader to Hall, 2003). Many relevant Lie groups are matrix Lie groups, which can be expressed as a subgroup of the Lie group $GL(n)$ of invertible square matrices with matrix multiplication as product.
Lie Algebra, $\mathfrak{g}$:
The Lie algebra $\mathfrak{g}$ of an $N$-dimensional Lie group is its tangent space at the identity, which is a vector space of $N$ dimensions. We can see the algebra elements as infinitesimal generators, from which all other elements in the group can be created. For matrix Lie groups we can represent vectors in the tangent space as matrices.
Exponential Map, $\exp(\cdot)$:
The structure of the algebra creates a map from an element of the algebra to a vector field on the group manifold. This gives rise to the exponential map, $\exp: \mathfrak{g} \to G$, which maps an algebra element to the group element at unit length from the identity along the flow of the vector field. The zero vector is thus mapped to the identity. For compact connected Lie groups, such as SO(3), the exponential map is surjective. Often, the map is not injective, so the inverse, the $\log$ map, is multi-valued. The exponential map of matrix Lie groups is the matrix exponential.
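As a small illustration of the properties just listed, the following sketch uses the matrix exponential on skew-symmetric $3 \times 3$ matrices (the Lie algebra of 3D rotations). The helper names `expm_series` and `hat` are hypothetical, and the truncated power series is only an assumption-level stand-in for a proper matrix exponential routine.

```python
import numpy as np

def expm_series(A, terms=30):
    """Matrix exponential via a truncated power series (adequate for small norms)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # term now holds A^k / k!
        result = result + term
    return result

def hat(v):
    """Map a 3-vector to the corresponding skew-symmetric matrix."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

# The zero algebra element is mapped to the group identity.
identity = expm_series(hat(np.zeros(3)))

# A generic algebra element is mapped to a rotation matrix.
v = np.array([0.3, -0.2, 0.5])
R = expm_series(hat(v))

# Non-injectivity: adding a full turn (2*pi) about the same axis gives the same rotation.
theta = np.linalg.norm(v)
v_shifted = v / theta * (theta + 2.0 * np.pi)
R_shifted = expm_series(hat(v_shifted), terms=60)
```

Here `R` and `R_shifted` agree up to floating point error, which is the multi-valuedness of the $\log$ map in concrete form.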
Adjoint Representation, $\mathrm{ad}_x$:
The Lie algebra is equipped with a bracket $[\cdot, \cdot]: \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$, which is bilinear. The bracket relates the structure of the group to structure on the algebra. For example, $\log(\exp(x)\exp(y))$ can be expressed in terms of the bracket. The bracket of matrix Lie groups is the commutator of the algebra matrices. The adjoint representation of $x \in \mathfrak{g}$ is the matrix representation of the linear map $\mathrm{ad}_x: \mathfrak{g} \to \mathfrak{g}$, $y \mapsto [x, y]$.
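To make the adjoint representation concrete, the sketch below builds the matrix of $y \mapsto [x, y]$ for the rotation algebra, using the commutator as bracket. The basis ordering and the helper names (`bracket`, `vee`, `ad_matrix`) are illustrative assumptions rather than notation from the paper.

```python
import numpy as np

# A basis of skew-symmetric 3x3 matrices: generators of rotations about x, y, z.
L1 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
L2 = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
L3 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
BASIS = [L1, L2, L3]

def bracket(a, b):
    """Lie bracket of a matrix Lie algebra: the commutator."""
    return a @ b - b @ a

def vee(m):
    """Coordinates of a skew-symmetric matrix in BASIS."""
    return np.array([m[2, 1], m[0, 2], m[1, 0]])

def ad_matrix(x):
    """Matrix of y -> [x, y] in BASIS; column j holds the coordinates of [x, L_j]."""
    return np.stack([vee(bracket(x, Lj)) for Lj in BASIS], axis=1)

x = 0.3 * L1 - 0.2 * L2 + 0.5 * L3
ad_x = ad_matrix(x)
theta = np.linalg.norm(vee(x))
eigenvalues = np.linalg.eigvals(ad_x)   # expected: 0 and +/- i*theta
```

For this particular algebra the adjoint matrix coincides with `x` itself, because the bracket corresponds to the cross product of the coordinate vectors.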
2.2 Reparameterization Trick
The reparameterization trick (Price, 1958; Bonnet, 1964; Salimans et al., 2013; Kingma and Welling, 2013; Rezende et al., 2014) is a technique to simulate samples $z \sim q(z; \theta)$ as $z = \mathcal{T}(\epsilon; \theta)$, where $\epsilon \sim s(\epsilon)$ is independent from $\theta$ (or at most weakly dependent; Ruiz et al., 2016), and the transformation $\mathcal{T}(\epsilon; \theta)$ should be differentiable w.r.t. $\theta$. It has been shown that this generally results in lower variance estimates than score function variants, thus leading to more efficient and better convergence results (Titsias and Lázaro-Gredilla, 2014; Fan et al., 2015). This reparameterization of samples $z$ allows expectations w.r.t. $q(z; \theta)$ to be rewritten as $\mathbb{E}_{q(z; \theta)}[f(z)] = \mathbb{E}_{s(\epsilon)}[f(\mathcal{T}(\epsilon; \theta))]$, thus making it possible to directly optimize the parameters of a probability distribution through backpropagation.
Unfortunately, there exists no general approach to defining a reparameterization scheme for arbitrary distributions. Although there has been a significant amount of research into finding ways to extend or generalize the reparameterization trick (Ruiz et al., 2016; Naesseth et al., 2017; Figurnov et al., 2018), to the best of our knowledge no such trick exists for spaces with non-trivial topologies such as Lie groups.
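The variance claim for the classic Gaussian case can be checked with a minimal sketch. The integrand `f`, the parameter values and the sample size are arbitrary choices for illustration only; both estimators target the gradient of $\mathbb{E}[f(z)]$ w.r.t. the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8           # hypothetical variational parameters
num_samples = 100_000

def f(z):
    return z ** 2              # toy integrand, E[f(z)] = mu^2 + sigma^2

eps = rng.standard_normal(num_samples)
z = mu + sigma * eps           # reparameterized sample z = T(eps; mu, sigma)

# Reparameterization (pathwise) estimator: d f(T(eps)) / d mu = f'(z) * 1 = 2 z.
reparam_grads = 2.0 * z

# Score function estimator: f(z) * d log q(z) / d mu = f(z) * (z - mu) / sigma^2.
score_grads = f(z) * (z - mu) / sigma ** 2

true_grad = 2.0 * mu           # d (mu^2 + sigma^2) / d mu
```

Both sample means approximate `true_grad`, while the per-sample variance of `score_grads` is markedly larger than that of `reparam_grads` in this toy setting.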
3 Reparameterizing Distributions on Lie Groups
In this section we will first explain our reparameterization trick for distributions on Lie groups (ReLie), by analogy to the classic Gaussian example described in (Kingma and Welling, 2013), as we can consider $\mathbb{R}^N$ under addition as a Lie group whose Lie algebra is $\mathbb{R}^N$ itself. In the remainder we build an intuition for our general theory drawing from both geometrical and measure theoretical concepts, and conclude by stating our formal theorem.
3.1 Reparameterization Steps
The following reparameterization steps (a), (b), (c) are illustrated in Figure 1.
(a) We first sample $v$ from a reparameterizable distribution $r(v|\sigma)$ on $\mathfrak{g}$. Since the Lie algebra is a real vector space, if we fix a basis this is equivalent to sampling from a reparameterizable distribution on $\mathbb{R}^N$. In fact, the basis induces an isomorphism between the Lie algebra and $\mathbb{R}^N$ (see Appendix G).
(b) Next we apply the exponential map to $v$, to obtain an element $g \sim \hat{q}(g|\sigma)$ of the group. If the distribution $r(v|\sigma)$ is concentrated around the origin, then the distribution of $g$ will be concentrated around the group identity. In the Gaussian example on $\mathbb{R}^N$, this step corresponds to the identity operation, and $\hat{q} = r$. As this transformation is in general not the identity operation, we have to account for the possible change in volume using the change of variable formula (in a sense, this is similar to the idea underlying normalizing flows; Rezende and Mohamed, 2015). Additionally, the exponential map is not necessarily injective, such that multiple points in the algebra can map to the same element in the group. We will have a more in-depth discussion of both complications in the following subsection.
(c) Finally, to change the location of the distribution $\hat{q}$, we left multiply $g$ by another group element $g_\mu$, applying the group specific operation. In the classic case this corresponds to a translation by $\mu$. If the exponential map is surjective (as in all compact and connected Lie groups), then $g_\mu$ can also be parameterized by the exponential map. Care must be taken, however, when $g_\mu$ is predicted by a neural network, to avoid homeomorphism conflicts as explored in (Falorsi et al., 2018; de Haan and Falorsi, 2018).
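The three steps (a), (b), (c) can be sketched for 3D rotations as follows. This is a minimal illustration under assumptions: a diagonal Gaussian on the algebra, the closed-form exponential for rotations, and hypothetical values for the scale `sigma` and the location element `g_mu`; it is not the reference implementation from the accompanying repository.

```python
import numpy as np

def hat(v):
    """Map a 3-vector to the corresponding skew-symmetric matrix."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(v):
    """Closed-form exponential of a skew-symmetric matrix (Rodrigues formula)."""
    theta = np.linalg.norm(v)
    K = hat(v)
    if theta < 1e-8:
        return np.eye(3) + K                     # first-order approximation near zero
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta ** 2 * (K @ K))

rng = np.random.default_rng(0)
sigma = np.array([0.1, 0.2, 0.3])                # hypothetical scale parameters
g_mu = exp_so3(np.array([0.0, 0.0, np.pi / 2]))  # hypothetical location element

# (a) Sample from a reparameterizable distribution on the algebra (here R^3).
eps = rng.standard_normal((1000, 3))
v = sigma * eps

# (b) Push the samples to the group with the exponential map.
g = np.stack([exp_so3(vi) for vi in v])

# (c) Left multiply by the location element.
z = g_mu @ g                                     # broadcasts over the sample axis

# Rotation angle of each sample relative to g_mu.
relative = g_mu.T @ z
angles = np.arccos(np.clip((np.trace(relative, axis1=1, axis2=2) - 1.0) / 2.0, -1.0, 1.0))
```

Since the sampled algebra vectors are short here, each relative rotation angle equals the norm of the corresponding algebra sample, so the samples `z` concentrate around `g_mu`.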
When trying to visualize the change in volume, moving from the Lie algebra space to that of the group manifold, we quickly reach the limits of our geometrical intuition. As concepts like volume and distance are no longer intuitively defined, naturally our treatment of integrals and probability densities should be reinspected as well. In mathematics these concepts are formally treated in the fields of differential and Riemannian geometry. To gain insight into building quantitative models of the above-mentioned concepts, these fields start from the local space behavior instead. This is done through the notion of the Riemannian metric, which formally corresponds to "attaching" to the tangent space at every point $p$ a scalar product $\langle \cdot, \cdot \rangle_p$. This allows us to measure quantities like length and angles, and to define a local volume element, at infinitesimal scales. Extrapolating from this approach we are now equipped to measure sets and integrate functions, which corresponds to having a measure on the space (we refer the interested reader to Lee, 2012). Notice that this measure arose directly from the geometric properties defined by the Riemannian metric. By carefully choosing this metric, we can endow our space with some desirable properties. A standard choice for Lie groups is to use a left invariant metric, which automatically induces a left invariant measure $\nu$, called the Haar measure (unique up to a constant):
$$\nu(gE) = \nu(E), \quad \forall g \in G,\ E \in \mathcal{B}[G],$$
where $gE$ is the set obtained by applying the group element $g$ to each element in the set $E$. More intuitively, this implies that left multiplication doesn’t change volume.
Measure Theoretical Concepts
Perhaps a more natural way to view this problem comes from measure theory, as we are trying to push a measure on $\mathbb{R}^N$ to a space with a possibly different topology. Whenever discussing densities such as $r$ in $\mathbb{R}^N$, it is implicitly stated that we consider a density w.r.t. the Lebesgue measure $\lambda$. What this really means is that we are considering a measure $m$, absolutely continuous (a.c.) w.r.t. $\lambda$, written as $m \ll \lambda$ (see definition C, Appendix C). Critically, this is equivalent to stating there exists a density $r$, such that
$$m(E) = \int_E r \, d\lambda, \quad \forall E \in \mathcal{B}[\mathbb{R}^N],$$
where $\mathcal{B}[\mathbb{R}^N]$ is the Borel $\sigma$-algebra, i.e. the collection of all measurable sets. When applying the exponential map, we define a new measure on $G$ (we can do this since the exponential map is differentiable, thus continuous, thus measurable), technically called the pushforward measure, $\exp_*(m)$ (see definition C, Appendix C). However, $G$ already comes equipped with another measure $\nu$, not necessarily equal to $\exp_*(m)$. Hence, if we consider a prior distribution that has a density on $G$ w.r.t. $\nu$, in order to compute quantities such as the Kullback-Leibler divergence we also need $\exp_*(m) \ll \nu$, meaning it has a density w.r.t. $\nu$.
In the case the exponential map is injective, it can easily be shown that the pushforward measure has a density on $G$ (in fact, as discussed before, we can always reduce to this case by defining a measure with limited support). However, that these requirements are not necessarily fulfilled can best be explained through a simple example: consider a constant function $f: \mathbb{R} \to \mathbb{R}$, which is clearly differentiable (see Fig. 2(a)). If we take a measure $m$ with a Gaussian density, the pushforward of $m$ by $f$ is a Dirac delta, for which it no longer holds that $f_*(m) \ll \lambda$. Intuitively, this happens because $f$ is not injective: all points are mapped to the same value, such that all the mass of the pushforward measure is concentrated on a single point.
Yet, this does not mean that non-injective mappings cannot be used at all. Instead, consider the function $g: \mathbb{R} \to \mathbb{R}$ shown in Fig. 2(b), with $m$ as before. Although $g$ is clearly not injective, for the pushforward measure by $g$ we still have $g_*(m) \ll \lambda$. The key property here is that it is possible to partition the domain of $g$ into two open sets and a zero set. For the first two, we can now apply the change of variable method on each, as $g$ is injective when restricted to either. The zero set can be ignored, since it has Lebesgue measure 0. This partition idea can be extended to Lie groups in general, by proving that the Lie algebra domain can be partitioned into a set of measure zero and a countable union of open sets in which the exponential map is a diffeomorphism. This insight, proven in Lemma E.1, allows us to prove the general theorem:
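Let $(G, \mathfrak{g}, m, \lambda, \nu)$ be defined as above, then $\exp_*(m) \ll \nu$ with density:
$$\hat{q}(a) = \sum_{x \in \mathfrak{g} \,:\, \exp(x) = a} r(x)\, |J(x)|^{-1}, \quad a \in G,$$
where $J(x) := \det\left( \sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} (\mathrm{ad}_x)^k \right)$.
See Appendix E.2 ∎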
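The partition argument preceding the theorem can be checked numerically on a one-dimensional toy map. The sketch below assumes the hypothetical non-injective map $x \mapsto x^2$ (chosen here for illustration, not taken from the figure) and a standard Gaussian on the input: the density of the pushforward is obtained by summing the change-of-variable contributions of the two preimages, one in each open set $(-\infty, 0)$ and $(0, \infty)$.

```python
import numpy as np

def gaussian_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def pushforward_density_square(y):
    """Density of y = x^2 for x ~ N(0, 1), evaluated at y > 0."""
    root = np.sqrt(y)
    preimages = (-root, root)          # one preimage per injective branch
    # Each branch contributes r(x) / |d(x^2)/dx| = r(x) / |2 x|.
    return sum(gaussian_pdf(x) / np.abs(2.0 * x) for x in preimages)

# Monte Carlo estimate of the mass that the pushforward assigns to [lo, hi].
rng = np.random.default_rng(0)
samples = rng.standard_normal(200_000) ** 2
lo, hi = 0.5, 2.0
empirical_mass = np.mean((samples >= lo) & (samples <= hi))

# The same mass from the summed density, via the trapezoidal rule.
grid = np.linspace(lo, hi, 2001)
values = pushforward_density_square(grid)
analytic_mass = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))
```

The two estimates agree up to Monte Carlo error, and the single point $x = 0$, where the map fails to be a local diffeomorphism, plays no role because it has Lebesgue measure zero.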
Having verified the pushforward measure has a density, $\hat{q}$, the final step is to recenter the location of the resulting distribution. In practice, this is done by left multiplying the samples by another group element. Technically, this corresponds to applying the left multiplication map
$$L_{g_\mu}: G \to G, \quad g \mapsto g_\mu g.$$
Since this map is a diffeomorphism, we can again apply the change of variable method. Moreover, if the measure on the group is chosen to be the Haar measure, then, as noted before, applying the left multiplication map leaves the Haar measure unchanged. In this case the final sample thus has density
$$q(g \mid g_\mu, \sigma) = \hat{q}(g_\mu^{-1} g \mid \sigma).$$
Additionally, the entropy of the distribution is invariant w.r.t. left multiplication.
In this section we present general implementation details, as well as worked out examples for three interesting and often used groups: the $N$-Torus, $T^N$, the oriented group of 3D rotations, SO(3), and the 3D rotation-translation group, SE(3). The worst case reparameterization computational complexity for a matrix Lie group is derived in Appendix D. However, for many Lie groups closed form expressions can be derived, drastically reducing the complexity.
The term $J(x)$ appearing in the general reparameterization theorem 3.2 is crucial to compute the change of volume when pushing a density from the algebra to the group. Here, we give an intuitive explanation for matrix Lie groups of dimension $N$ with matrices of size $n \times n$. For a formal general explanation, we refer to Appendix D.
The image of the $\exp$ map is the $N$-dimensional manifold, Lie group $G$, embedded in $\mathbb{R}^{n \times n}$. An infinitesimal variation in the input around a point $x \in \mathfrak{g}$ creates an infinitesimal variation of the output, which is restricted to the $N$-dimensional manifold $G$. Infinitesimally this gives rise to a linear map between the tangent spaces at input and output. This is the Jacobian.
The change of volume is the determinant of the Jacobian. To compute it, we express the tangent space at the output in terms of the chosen basis of the Lie algebra. This is possible, since a basis for a Lie algebra provides a unique basis for the tangent space throughout $G$. This can be computed analytically for any $x$, since the $\exp$ map of matrix Lie groups is the matrix exponential, for which derivatives are computable. Nevertheless, a general expression of $J(x)$ exists for any Lie group and is given in terms of the complex eigenvalue spectrum of the adjoint representation of $x$, which is a linear map:
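Let $G$ be a Lie group and $\mathfrak{g}$ its Lie algebra, then it can be shown that the volume-change term $J(x)$ can be computed using the following expression
$$J(x) = \prod_{\lambda \in \mathrm{Sp}(\mathrm{ad}_x),\ \lambda \neq 0} \frac{1 - e^{-\lambda}}{\lambda},$$
where $\mathrm{Sp}(\mathrm{ad}_x)$ denotes the eigenvalues of $\mathrm{ad}_x$, counted with multiplicity.
See Appendix D ∎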
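A minimal numerical sketch of the eigenvalue-based computation follows. It assumes the standard series $\sum_k \frac{(-1)^k}{(k+1)!} (\mathrm{ad}_x)^k$ for the differential of the exponential map, whose determinant is the product of $(1 - e^{-\lambda})/\lambda$ over the non-zero eigenvalues $\lambda$ of $\mathrm{ad}_x$. The function names are hypothetical, and the rotation algebra is used as test case because its adjoint matrix equals the skew-symmetric matrix itself.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of a 3-vector; equals ad_x for the rotation algebra."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def jacobian_from_spectrum(ad_x, tol=1e-12):
    """Volume change from the non-zero complex eigenvalues of ad_x."""
    eig = np.linalg.eigvals(ad_x)
    nonzero = eig[np.abs(eig) > tol]
    return np.real(np.prod((1.0 - np.exp(-nonzero)) / nonzero))

def jacobian_from_series(ad_x, terms=40):
    """Volume change as the determinant of the truncated power series."""
    n = ad_x.shape[0]
    total = np.zeros((n, n))
    power = np.eye(n)            # holds ad_x^k
    factorial = 1.0              # holds (k + 1)!
    for k in range(terms):
        factorial *= (k + 1)
        total += (-1) ** k * power / factorial
        power = power @ ad_x
    return np.linalg.det(total)

v = np.array([0.3, -0.2, 0.5])
theta = np.linalg.norm(v)
j_spectrum = jacobian_from_spectrum(hat(v))
j_series = jacobian_from_series(hat(v))
j_closed = (2.0 - 2.0 * np.cos(theta)) / theta ** 2   # from eigenvalues 0, +/- i*theta
```

The eigenvalue route costs one eigendecomposition of an $N \times N$ matrix, whereas for specific groups a closed form such as `j_closed` avoids even that.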
4.2 Three Lie Group Examples
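The $N$-Torus, $T^N$:
The $N$-Torus is the cross-product of $N$ times $S^1$. It is an abelian (commutative) group, which is interesting to consider as it forms an important building block in the theory of Lie groups. The $N$-Torus has the following matrix representation:
$$T(\alpha) = \begin{pmatrix} R(\alpha_1) & & \\ & \ddots & \\ & & R(\alpha_N) \end{pmatrix}, \quad R(\alpha_i) = \begin{pmatrix} \cos\alpha_i & -\sin\alpha_i \\ \sin\alpha_i & \cos\alpha_i \end{pmatrix},$$
where $\alpha = (\alpha_1, \dots, \alpha_N) \in \mathbb{R}^N$. The basis of the Lie algebra is composed of $2N \times 2N$ block-diagonal matrices $L_i$ with $2 \times 2$ blocks s.t. all blocks are $0$ except the $i$-th one, which is equal to $L$:
$$L = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
The exponential map is s.t. the pre-image can be defined from the following relationship:
$$\exp\Big(\sum_{i} \alpha_i L_i\Big) = T(\alpha) = \exp\Big(\sum_{i} (\alpha_i + 2\pi k_i) L_i\Big), \quad k \in \mathbb{Z}^N.$$
The pushforward density is defined as
$$\hat{q}(T(\alpha) \mid \sigma) = \sum_{k \in \mathbb{Z}^N} r(\alpha + 2\pi k \mid \sigma).$$
It can be observed that there is no change in volume. The resulting distribution on the circle or 1-Torus, which is also the Lie group SO(2), is illustrated in Appendix B.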
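For the circle (1-Torus), the absence of a volume change means the pushforward density is a plain sum of the algebra density over all preimages $\alpha + 2\pi k$. The sketch below assumes a zero-mean Gaussian on the algebra and truncates the infinite sum to a hypothetical `num_wraps` terms on each side; both choices are illustrative.

```python
import numpy as np

def normal_pdf(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def circle_pushforward_density(alpha, sigma, num_wraps=10):
    """Sum the algebra density over the preimages alpha + 2*pi*k, |k| <= num_wraps."""
    alpha = np.asarray(alpha, dtype=float)
    k = np.arange(-num_wraps, num_wraps + 1)
    shifted = alpha[..., None] + 2.0 * np.pi * k
    return normal_pdf(shifted, sigma).sum(axis=-1)

# Evaluate on a uniform grid over one full turn.
alpha = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
density = circle_pushforward_density(alpha, sigma=2.0)

# Total mass w.r.t. the angle measure on the circle (rectangle rule).
total_mass = density.sum() * (2.0 * np.pi / alpha.size)
```

With a wide `sigma` several wraps contribute noticeably, yet the total mass remains one and the density is periodic in the angle.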
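The Special Orthogonal Group, SO(3):
The Lie group of orientation preserving three dimensional rotations has its matrix representation defined as
$$\mathrm{SO}(3) := \{ R \in GL(3, \mathbb{R}) : R^\top R = I,\ \det(R) = 1 \}.$$
The elements of its Lie algebra, $\mathfrak{so}(3)$, are represented by the 3D vector space of skew-symmetric $3 \times 3$ matrices. We choose a basis for the Lie algebra:
$$L_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad L_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad L_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
This provides a vector space isomorphism between $\mathbb{R}^3$ and $\mathfrak{so}(3)$, written as $[\,\cdot\,]_\times: \mathbb{R}^3 \to \mathfrak{so}(3)$. Assuming the decomposition $v_\times = \theta u_\times$, s.t. $\theta \in \mathbb{R}_{\geq 0}$ and $\|u\| = 1$, the exponential map is given by the Rodrigues rotation formula (Rodrigues, 1840)
$$\exp(v_\times) = I + \sin(\theta)\, u_\times + (1 - \cos(\theta))\, u_\times^2.$$
Since SO(3) is a compact and connected Lie group this map is surjective; however, it is not injective. The complete preimage of an arbitrary group element can be defined by first using the principal branch $\log(\cdot)$ operator to find the unique Lie algebra element next to the origin, and then observing the following relation
$$\exp(\theta u_\times) = \exp((\theta + 2k\pi)\, u_\times), \quad k \in \mathbb{Z}.$$
In practice, we will already have access to such an element of the Lie algebra due to the sampling approach. The pushforward density is defined almost everywhere as
$$\hat{q}(R \mid \sigma) = \sum_{k \in \mathbb{Z}} r\!\left( \frac{\log(R)}{\theta(R)}\, (\theta(R) + 2k\pi) \,\Big|\, \sigma \right) \frac{(\theta(R) + 2k\pi)^2}{3 - \mathrm{tr}(R)},$$
where $\log(R)$ is expressed in the chosen basis and $\theta(R) = \|\log(R)\|$ is the rotation angle of $R$.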
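The density on 3D rotations can be sketched as follows. The volume factor $\theta_k^2 / (3 - \mathrm{tr}(R))$ with $\theta_k = \theta + 2k\pi$ is an assumption derived here from the eigenvalues $0, \pm i\theta$ of the adjoint representation, and the zero-mean diagonal Gaussian, the truncation `num_wraps` and all function names are illustrative choices. As a sanity check, the density of an isotropic Gaussian is integrated against the Haar volume element $(2 - 2\cos\theta)\, d\theta\, d\Omega$ in axis-angle coordinates.

```python
import numpy as np

def hat(v):
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_so3(v):
    """Rodrigues rotation formula."""
    theta = np.linalg.norm(v)
    K = hat(v)
    if theta < 1e-8:
        return np.eye(3) + K
    return np.eye(3) + np.sin(theta) / theta * K + (1.0 - np.cos(theta)) / theta ** 2 * (K @ K)

def log_so3(R):
    """Principal-branch log as a 3-vector; valid for rotation angles in (0, pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * axis / (2.0 * np.sin(theta))

def gaussian_pdf(v, sigma):
    """Zero-mean diagonal Gaussian density on the algebra coordinates R^3."""
    norm_const = np.prod(sigma) * (2.0 * np.pi) ** 1.5
    return np.exp(-0.5 * np.sum((v / sigma) ** 2)) / norm_const

def so3_pushforward_density(R, sigma, num_wraps=5):
    """Sum over preimages (theta + 2*pi*k) * axis, each weighted by its volume factor."""
    v0 = log_so3(R)
    theta = np.linalg.norm(v0)
    axis = v0 / theta
    total = 0.0
    for k in range(-num_wraps, num_wraps + 1):
        theta_k = theta + 2.0 * np.pi * k
        total += gaussian_pdf(theta_k * axis, sigma) * theta_k ** 2
    return total / (3.0 - np.trace(R))

# Sanity check: for an isotropic Gaussian the density depends only on the angle,
# so the total mass is 4*pi * integral of density * (2 - 2 cos(theta)) over (0, pi).
sigma = np.full(3, 0.8)
n = 400
thetas = (np.arange(n) + 0.5) * np.pi / n          # midpoint rule
dens = np.array([so3_pushforward_density(exp_so3(np.array([0.0, 0.0, t])), sigma)
                 for t in thetas])
total_mass = np.sum(dens * (2.0 - 2.0 * np.cos(thetas)) * 4.0 * np.pi) * (np.pi / n)
```

The computed `total_mass` is close to one, which is consistent with the sum over wraps covering every algebra element exactly once.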
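The Special Euclidean Group, SE(3):
This Lie group extends SO(3) by also adding translations. Its matrix representation is given by
$$\mathrm{SE}(3) := \left\{ \begin{pmatrix} R & u \\ 0 & 1 \end{pmatrix} : R \in \mathrm{SO}(3),\ u \in \mathbb{R}^3 \right\}.$$
The Lie algebra, $\mathfrak{se}(3)$, is similarly built concatenating a skew-symmetric matrix and a vector
$$S(\omega, u) = \begin{pmatrix} \omega_\times & u \\ 0 & 0 \end{pmatrix}, \quad \omega, u \in \mathbb{R}^3,$$
with $\omega_\times$ the skew-symmetric matrix associated with $\omega$. A basis can easily be found combining the basis elements for $\mathfrak{so}(3)$ and the canonical basis of $\mathbb{R}^3$. The exponential map from algebra to group is defined as
$$\exp(S(\omega, u)) = \begin{pmatrix} R & V u \\ 0 & 1 \end{pmatrix},$$
where $R = \exp(\omega_\times)$ is defined as in equation (4), and
$$V = I + \frac{1 - \cos\theta}{\theta^2}\, \omega_\times + \frac{\theta - \sin\theta}{\theta^3}\, \omega_\times^2, \quad \theta = \|\omega\|.$$
From the expression of the exponential map it is clear that the preimage can be described similarly to that of SO(3). Finally, the pushforward density is defined almost everywhere as
$$\hat{q}(M \mid \sigma) = \sum_{k \in \mathbb{Z}} r(S_k \mid \sigma)\, \frac{(\theta(R) + 2k\pi)^4}{(3 - \mathrm{tr}(R))^2},$$
where $M = \begin{pmatrix} R & u \\ 0 & 1 \end{pmatrix}$ and $S_k \in \mathfrak{se}(3)$ is the element with rotation angle $\theta(R) + 2k\pi$ such that $\exp(S_k) = M$. The $\log$ in SE(3) can be easily defined from the $\log$ in SO(3).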
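The closed-form exponential for rigid motions can be cross-checked against a generic matrix exponential. The sketch below assumes the standard expression in which the translation part is multiplied by $V = I + \frac{1-\cos\theta}{\theta^2}\omega_\times + \frac{\theta-\sin\theta}{\theta^3}\omega_\times^2$; the function names and the truncated power series used for comparison are illustrative.

```python
import numpy as np

def hat(v):
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_se3(omega, u):
    """Closed-form exponential of the algebra element built from (omega, u)."""
    theta = np.linalg.norm(omega)
    K = hat(omega)
    if theta < 1e-8:
        R = np.eye(3) + K                # first-order approximations near zero
        V = np.eye(3) + 0.5 * K
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta ** 2
        c = (theta - np.sin(theta)) / theta ** 3
        R = np.eye(3) + a * K + b * (K @ K)
        V = np.eye(3) + b * K + c * (K @ K)
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = V @ u
    return M

def expm_series(A, terms=40):
    """Generic matrix exponential via a truncated power series."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

omega = np.array([0.4, -0.3, 0.8])
u = np.array([1.0, 2.0, -0.5])

# 4x4 algebra element: skew-symmetric block and translation column.
S = np.zeros((4, 4))
S[:3, :3] = hat(omega)
S[:3, 3] = u

M_closed = exp_se3(omega, u)
M_series = expm_series(S)
```

Both routes give the same $4 \times 4$ matrix, with a rotation in the upper-left block and a bottom row equal to $(0, 0, 0, 1)$.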
5 Related Work
Much work has been done on extending the reparameterization trick to an ever growing number of variational families. Figurnov et al. (2018) provide a detailed overview, classifying existing approaches into: (1) Finding surrogate distributions, which in the absence of a reparameterization trick for the desired distribution attempts to use an acceptable alternative distribution that can be reparameterized instead (Nalisnick and Smyth, 2017). (2) Implicit reparameterization gradients, or pathwise gradients, introduced in machine learning by Salimans et al. (2013), extended by Graves (2016), and later generalized by Figurnov et al. (2018) using implicit differentiation. (3) Generalized reparameterizations, which try to generalize the standard approach as described in the preliminaries section. Notable are (Ruiz et al., 2016), which relies on defining a suitable invertible standardization function to allow a weak dependence between the noise distribution and the parameters, and the closely related (Naesseth et al., 2017), focusing on rejection sampling.
All of the techniques above can be used orthogonally to our approach, by defining different distributions over the Lie algebra. While some even allow for reparameterizable densities on spaces with non-trivial topologies (for example, Davidson et al. (2018) reparameterize the von Mises-Fisher distribution, which is defined on the hypersphere, with the circle $S^1$ isomorphic to the Lie group SO(2)), none of them provide the tools to correctly take into account the volume change resulting from pushing densities defined on $\mathbb{R}^N$ to arbitrary Lie groups. In that regard the ideas underlying normalizing flows (NF) (Rezende and Mohamed, 2015) are the closest to our approach, in which probability densities become increasingly complex through the use of injective maps. Two crucial differences with our problem domain, however, are that the change of variable computation now needs to take into account a transformation of the underlying space, and that the exponential map is generally not injective. NF can be combined with our work to create complex distributions on Lie groups, as is demonstrated in the next section.
Figure 4: Samples of the Variational Inference model and Markov Chain Monte Carlo of Experiment 6.1. Outputs are shifted in the z-dimension for clarity.
Defining and working with distributions on homogeneous spaces, including Lie groups, was previously investigated in (Chirikjian and Kyatkin, 2000; Chirikjian, 2010; Wolfe and Mashner, 2011; Chirikjian, 2011; Chirikjian and Kyatkin, 2016; Ming, 2018). Barfoot and Furgale (2014) also discuss implicitly defining distributions on Lie groups, through a distribution on the algebra, focusing on the case of SE(3). However, these works only consider the neighbourhood of the identity, making the exponential map injective, but the distribution less expressive. In addition, generally only Gaussian distributions on the Lie algebra are used in past work. Cohen and Welling (2015) devised harmonic exponential families, which are a powerful family of distributions defined on homogeneous spaces. None of these works concentrated on making the distributions reparameterizable. Mallasto and Feragen (2018) defined a wrapped Gaussian process on Riemannian manifolds through the pushforward of the $\exp$ map without providing an expression for the density.
We conduct two experiments on SO(3) to highlight the potential of using complex and multimodal reparameterizable densities on Lie groups (see Appendix A for additional details).
Here, the invertible part of the flow is a neural network consisting of several coupling layers (Dinh et al., 2014), a squashing function is applied to the norm, and a unit Gaussian is used as initial distribution. The hyperparameter determines the non-injectivity of the $\exp$ map and thus of the flow. It must be chosen such that the image of the squashing step is contained in the regular region of the $\exp$ map. For sufficiently small values, the entire flow is invertible, but may not be surjective, while for bigger values the flow is non-injective, with a finite inverse set at each $g \in G$, as $\exp$ is a local diffeomorphism and the image of the squashing step has compact support. For details see Appendix E. For such a Locally Invertible Flow (LI-Flow), the likelihood evaluation requires us to branch at the non-injective function and traverse the flow backwards for each element in the preimage.
6.1 Variational Inference
In this experiment we estimate the 3 group actions that leave a symmetrical object invariant. This highlights how our method can be used in probabilistic generative models and unsupervised learning tasks. We have a generative model and a uniform prior over the latent variable. Using Variational Inference we optimize the Evidence Lower Bound to infer an approximate posterior modeled with LI-Flow.
Results are shown in Fig. 4 and compared to Markov Chain Monte Carlo samples. We observe the symmetries are correctly inferred.
6.2 Maximum Likelihood Estimation
To demonstrate the versatility of the reparameterizable Lie group distribution, we learn supervised pose estimation by learning a multimodal conditional distribution using MLE, as in (Dinh et al., 2017).
We created a data set of objects rotated to a given pose, together with algebra noise samples. The object is symmetric for the subgroup corresponding to rotations of a fixed angle along one axis. We train a LI-Flow model by maximizing the likelihood of the data. The results in Fig. 5 reveal that the LI-Flow successfully learns a multimodal conditional distribution.
In this paper we have presented a general framework to reparameterize distributions on Lie groups (ReLie), which enables the extension of previous results in reparameterizable densities to arbitrary Lie groups. Furthermore, our method allows for the creation of complex and multimodal distributions through normalizing flows, for which we defined a novel Locally Invertible Flow (LI-Flow) example on the group SO(3). We empirically showed the necessity of LI-Flows in estimating uncertainty in problems containing discrete or continuous symmetries.
This work provides a bridge to leverage the advantages of using deep learning to estimate uncertainty for numerous application domains in which Lie groups play an important role. In future work we plan on further exploring the directions outlined in our experimental section to more challenging instantiations. Specifically, learning rigid body motions from raw point clouds or modeling environment dynamics for applications in optimal control present exciting possible extensions.
The authors would like to thank Rianne van den Berg and Taco Cohen for suggestions and insightful discussions to improve this manuscript.
- Alexandrino and Bettiol (2015) Alexandrino, M. M. and Bettiol, R. G. (2015). Lie groups and geometric aspects of isometric actions. Cham: Springer.
- Barfoot and Furgale (2014) Barfoot, T. D. and Furgale, P. T. (2014). Associating uncertainty with three-dimensional poses for use in estimation problems. IEEE Transactions on Robotics, 30(3):679–693.
- Bonnet (1964) Bonnet, G. (1964). Transformations des signaux aléatoires a travers les systemes non linéaires sans mémoire. In Annales des Télécommunications, volume 19, pages 203–220. Springer.
- Caron and Traynor (2005) Caron, R. and Traynor, T. (2005). The zero set of polynomials. http://www1.uwindsor.ca/math/sites/uwindsor.ca.math/files/05-03.pdf.
- Chirikjian (2010) Chirikjian, G. S. (2010). Information-theoretic inequalities on unimodular lie groups. Journal of geometric mechanics, 2(2):119.
- Chirikjian (2011) Chirikjian, G. S. (2011). Stochastic Models, Information Theory, and Lie Groups, Volume 2: Analytic Methods and Modern Applications, volume 2. Springer Science & Business Media.
- Chirikjian and Kyatkin (2000) Chirikjian, G. S. and Kyatkin, A. B. (2000). Engineering applications of noncommutative harmonic analysis: with emphasis on rotation and motion groups. CRC press.
- Chirikjian and Kyatkin (2016) Chirikjian, G. S. and Kyatkin, A. B. (2016). Harmonic Analysis for Engineers and Applied Scientists: Updated and Expanded Edition. Courier Dover Publications.
- Cohen and Welling (2015) Cohen, T. S. and Welling, M. (2015). Harmonic exponential families on manifolds. ICML.
- Davidson et al. (2018) Davidson, T. R., Falorsi, L., Cao, N. D., Kipf, T., and Tomczak, J. M. (2018). Hyperspherical Variational Auto-Encoders. UAI.
- de Haan and Falorsi (2018) de Haan, P. and Falorsi, L. (2018). Topological constraints on homeomorphic auto-encoding. NIPS Workshop.
- Dinh et al. (2014) Dinh, L., Krueger, D., and Bengio, Y. (2014). Nice: Non-linear independent components estimation. ICLR.
- Dinh et al. (2017) Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2017). Density estimation using real nvp. ICLR.
- Duistermaat and Kolk (2000) Duistermaat, J. J. and Kolk, J. A. C. (2000). Lie groups. Berlin: Springer.
- Falorsi et al. (2018) Falorsi, L., de Haan, P., Davidson, T. R., De Cao, N., Weiler, M., Forré, P., and Cohen, T. S. (2018). Explorations in homeomorphic variational auto-encoding. ICML Workshop.
- Fan et al. (2015) Fan, K., Wang, Z., Beck, J., Kwok, J., and Heller, K. A. (2015). Fast second order stochastic backpropagation for variational inference. In NIPS, pages 1387–1395.
- Figurnov et al. (2018) Figurnov, M., Mohamed, S., and Mnih, A. (2018). Implicit reparameterization gradients. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31, pages 439–450. Curran Associates, Inc.
- Graves (2016) Graves, A. (2016). Stochastic backpropagation through mixture density distributions. arXiv preprint arXiv:1607.05690.
- Hall (2003) Hall, B. (2003). Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Graduate Texts in Mathematics. Springer.
- Hermann (1980) Hermann, R. (1980). Differential geometry, lie groups, and symmetric spaces (sigurdur helgason). SIAM Review, 22(4):524–526.
- Howe (1989) Howe, R. (1989). Review: Sigurdur helgason, groups and geometric analysis. integral geometry, invariant differential operators and spherical functions. Bull. Amer. Math. Soc. (N.S.), 20(2):252–256.
- Jordan et al. (1999) Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine learning, 37(2):183–233.
- Kingma and Welling (2013) Kingma, D. P. and Welling, M. (2013). Auto-encoding variational bayes. CoRR.
- Klenke (2014) Klenke, A. (2014). Probability Theory - A Comprehensive Course. Universitext. Springer, London, second edition.
- Lee (2010) Lee, J. (2010). Introduction to topological manifolds, volume 202. Springer Science & Business Media.
- Lee (2012) Lee, J. M. (2012). Introduction to smooth manifolds. Graduate texts in mathematics. Springer, New York, NY [u.a.], 2. ed. edition.
- Mallasto and Feragen (2018) Mallasto, A. and Feragen, A. (2018). Wrapped gaussian process regression on riemannian manifolds. CVPR.
- Milnor (1976) Milnor, J. (1976). Curvatures of left invariant metrics on lie groups. Advances in Mathematics, 21(3):293 – 329.
- Ming (2018) Ming, Y. (2018). Variational bayesian data analysis on manifold. Control Theory and Technology, 16(3):212–220.
- Mnih and Gregor (2014) Mnih, A. and Gregor, K. (2014). Neural variational inference and learning in belief networks. In ICML.
- Moler and Van Loan (2003) Moler, C. and Van Loan, C. (2003). Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM review, 45(1):3–49.
- Naesseth et al. (2017) Naesseth, C., Ruiz, F., Linderman, S., and Blei, D. (2017). Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms. AISTATS, pages 489–498.
- Nalisnick and Smyth (2017) Nalisnick, E. and Smyth, P. (2017). Stick-breaking variational autoencoders. ICLR.
- Paisley et al. (2012) Paisley, J., Blei, D., and Jordan, M. (2012). Variational bayesian inference with stochastic search. ICML.
- Price (1958) Price, R. (1958). A useful theorem for nonlinear devices having gaussian inputs. IRE Transactions on Information Theory, 4(2):69–72.
- Rezende and Mohamed (2015) Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. ICML, 37:1530–1538.
- Rezende et al. (2014) Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. ICML, pages 1278–1286.
- Rodrigues (1840) Rodrigues, O. (1840). Des lois géométriques qui régissent les déplacements d’un système solide dans l’espace: et de la variation des cordonnées provenant de ces déplacements considérés indépendamment des causes qui peuvent les produire. J Mathematiques Pures Appliquees, 5:380–440.
- Ruiz et al. (2016) Ruiz, F. R., Titsias, M., and Blei, D. (2016). The generalized reparameterization gradient. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R., editors, NIPS, pages 460–468. Curran Associates, Inc.
- Salimans et al. (2013) Salimans, T., Knowles, D. A., et al. (2013). Fixed-form variational posterior approximation through stochastic linear regression. Bayesian Analysis, 8(4):837–882.
- Schreiber and Bartels (2018) Schreiber, U. and Bartels, T. (2018). volume form. http://ncatlab.org/nlab/show/volume%20form. Revision 10.
- Titsias and Lázaro-Gredilla (2014) Titsias, M. and Lázaro-Gredilla, M. (2014). Doubly stochastic variational bayes for non-conjugate inference. In ICML, pages 1971–1979.
- Williams (1992) Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256.
- Wolfe and Mashner (2011) Wolfe, K. C. and Mashner, M. (2011). Bayesian fusion on lie groups. Journal of Algebraic Statistics, 2(1).
Appendix A Experiments: Additional Details
a.1 A Small Note on Alternative Proxies
The main focus of the experiments in this work is to show how our framework enables the use of reparameterizable distributions on arbitrary Lie groups in a probabilistic deep learning setting, which to the best of our knowledge is not possible with current alternatives. The experiments therefore represent typical prototypes of applications that can now be tackled using a general approach. To be clear, it may well be possible to design specialized one-off solutions for learning distributions on specific Lie groups; in this paper, however, we aim to provide a general framework for this task.
a.2 Supplementary Details on VI Experiment
In this proto-typical Variational Inference experiment we provide an intuitive example of the need for complex distributions in the difficult task of estimating which group actions of leave a symmetrical object invariant. For didactic purposes we take two ordered points, , and perform LI-Flow VI to learn the approximate posterior over rotations. We evaluate the learned distribution by comparing its samples to those of the true posterior obtained using the Metropolis-Hastings algorithm.
Results are shown in Fig. 4. As expected, the discovered distribution over group actions is a rotational subgroup, . Clearly, the learned approximate posterior almost perfectly matches the true posterior. In contrast, using a simple centered distribution, such as the pushforward of a Gaussian, as the variational family would make learning the observed topology problematic, as all probability mass would concentrate around a single rotation.
a.3 Supplementary Details on MLE Experiment
We generate a random vector that has a linear Lie group action. Then we create a random variable, uniformly distributed, representing the pose and a noisy version with . We observe and need to predict . This corresponds to having noisy observations of an object from different poses and needing to estimate the pose . When the object is symmetrical, that is, a subgroup exists such that for all , should have modes corresponding to the values in .
This is evaluated on . The object is taken to be an element of the representation space of , as in (Falorsi et al., 2018). It is made symmetric by taking the average of . is taken to be the cyclic group of order 3 corresponding to rotations of along one axis. The results shown in Figure 5 reveal that the LI-Flow successfully learns complicated conditional distributions.
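The symmetrization step described above (averaging an object over a finite subgroup) can be sketched as follows. This is only an illustration under assumptions that are not taken from the paper: the object is a random rank-3 tensor, the group acts by rotating each tensor index, and the subgroup is the cyclic group of order 3 generated by a rotation of 2π/3 about the z-axis. The helper names `rotation_z`, `act` and `symmetrize` are hypothetical.

```python
import numpy as np

def rotation_z(angle):
    """Rotation matrix for a rotation by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def act(rotation, tensor):
    """Linear group action on a rank-3 tensor: rotate every index (assumed representation)."""
    return np.einsum("ia,jb,kc,abc->ijk", rotation, rotation, rotation, tensor)

def symmetrize(tensor, subgroup):
    """Average the object over all elements of a finite subgroup."""
    return sum(act(d, tensor) for d in subgroup) / len(subgroup)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 3, 3))                               # random object
c3 = [rotation_z(2.0 * np.pi * k / 3.0) for k in range(3)]   # cyclic group of order 3
x_sym = symmetrize(x, c3)

# The averaged object is invariant under every element of the subgroup.
print(all(np.allclose(act(d, x_sym), x_sym) for d in c3))
```

Because the averaged object is fixed by the subgroup, any pose and its subgroup translates produce the same observation, which is why a multimodal conditional distribution is expected.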
Appendix B Distributions on the Circle
As an example of how the reparameterizable distribution on Lie groups behaves in practice, we illustrate in Figure 6 the distribution that arises when a univariate Normal distribution is pushed forward to the Lie group, homeomorphic to the circle, with the exponential map.
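As a rough numerical companion to this construction, the sketch below evaluates the density obtained by wrapping a univariate Normal around the circle. It assumes the circle is parameterized by an angle θ and that the density is taken with respect to dθ; since the group is abelian, no volume-change factor appears and the density is a sum over all preimages θ + 2πk. The function name and the truncation parameter `n_wraps` are illustrative choices, not part of the paper.

```python
import math

def wrapped_normal_pdf(theta, mean, std, n_wraps=10):
    """Density at angle theta of a Normal(mean, std) pushed forward to the circle.

    The infinite sum over preimages theta + 2*pi*k is truncated to |k| <= n_wraps.
    """
    total = 0.0
    for k in range(-n_wraps, n_wraps + 1):
        z = (theta - mean + 2.0 * math.pi * k) / std
        total += math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return total

# Midpoint-rule check that the density integrates to one over a full turn.
n = 2000
h = 2.0 * math.pi / n
mass = sum(wrapped_normal_pdf(-math.pi + (i + 0.5) * h, 0.5, 2.0) for i in range(n)) * h
print(mass)
```

For a small standard deviation only the k = 0 term matters and the density looks Gaussian; for a large one the wrapped tails overlap and the density approaches the uniform distribution on the circle.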
Appendix C Prerequisites
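[Absolutely continuous measures, see Klenke (2014)] Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and $\mu$, $\nu$ two measures on $(\mathcal{X}, \mathcal{A})$. Then $\nu$ is said to be absolutely continuous with respect to $\mu$, written as $\nu \ll \mu$, iff for all $A \in \mathcal{A}$ we have that
$$\mu(A) = 0 \implies \nu(A) = 0.$$
[Density between two measures] Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and $\mu$, $\nu$ two measures on $(\mathcal{X}, \mathcal{A})$. One says that $\nu$ has a density w.r.t. $\mu$ iff there is a measurable function $f: \mathcal{X} \to \mathbb{R}_{\geq 0}$ such that for all $A \in \mathcal{A}$ we have:
$$\nu(A) = \int_A f \, d\mu.$$
It is known (see Klenke (2014)) that a density (if it exists) is unique up to a set of $\mu$-measure zero, and it is often denoted as:
$$f = \frac{d\nu}{d\mu}.$$
[Radon-Nikodým, see Klenke (2014) Cor. 7.34] Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and $\mu$, $\nu$ two $\sigma$-finite measures on $(\mathcal{X}, \mathcal{A})$. Then one has the equivalence:
$$\nu \text{ has a density w.r.t. } \mu \iff \nu \ll \mu.$$
[Pushforward measure] Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space, $(\mathcal{Y}, \mathcal{B})$ a measurable space, and let $f: \mathcal{X} \to \mathcal{Y}$ be a measurable map. Then the pushforward measure of $\mu$ along $f$, in symbols $f_{*}\mu$, is defined as follows:
$$(f_{*}\mu)(B) := \mu(f^{-1}(B)) \quad \text{for all } B \in \mathcal{B}.$$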
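To make the pushforward definition concrete, here is a minimal sketch for a finite discrete measure, where the preimage sum can be computed exactly. The representation of a measure as a dictionary from points to masses and the helper name `pushforward` are assumptions made for this illustration only.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(measure, f):
    """Push a finite discrete measure {point: mass} forward along the map f."""
    result = defaultdict(Fraction)
    for point, mass in measure.items():
        result[f(point)] += mass   # every point contributes its mass to its image
    return dict(result)

# Uniform measure on {0, ..., 5}, pushed forward along k -> k mod 3.
mu = {k: Fraction(1, 6) for k in range(6)}
nu = pushforward(mu, lambda k: k % 3)

# Check the defining property nu(B) = mu(f^{-1}(B)) for B = {0, 2}.
B = {0, 2}
lhs = sum(nu[y] for y in B)
rhs = sum(mass for point, mass in mu.items() if point % 3 in B)
print(lhs == rhs)
```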
[The standard measure on (pseudo-)Riemannian manifolds, see (Schreiber and Bartels, 2018)] Let $(M, g)$ be a (pseudo-)Riemannian manifold with metric tensor $g$. The standard measure $\mu_g$ on $M$ w.r.t. $g$ is, in local (oriented) coordinates, by definition given by the density $\sqrt{|g|}$ w.r.t. the Lebesgue measure, where $|g|$ is the absolute value of the determinant of the matrix of $g$ in the local coordinates at point $x$. Note that the standard measure w.r.t. $g$ always exists.
We are mainly interested in probability distributions on (pseudo-)Riemannian manifolds that have a density w.r.t. the standard measure $\mu_g$ (i.e. that are absolutely continuous w.r.t. $\mu_g$).
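As a small worked illustration of the standard measure, the sketch below integrates the density $\sqrt{|g|}$ over the unit 2-sphere. The example manifold, the spherical coordinates (θ, φ) with round metric diag(1, sin²θ), and all function names are assumptions chosen for illustration; they do not come from the paper.

```python
import math

def round_sphere_metric(theta, phi):
    """Matrix of the round metric on the unit sphere in coordinates (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def volume_density(metric):
    """sqrt(|det g|): density of the standard measure w.r.t. Lebesgue measure."""
    det = metric[0][0] * metric[1][1] - metric[0][1] * metric[1][0]
    return math.sqrt(abs(det))

def integrate_standard_measure(n_theta=400, n_phi=200):
    """Midpoint-rule integral of the density over the coordinate rectangle."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += volume_density(round_sphere_metric(theta, phi)) * d_theta * d_phi
    return total

area = integrate_standard_measure()
print(area, 4.0 * math.pi)  # the two numbers should agree closely
```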
Appendix D Change of Variables
Consider an $n$-dimensional Lie group $G$ and its Lie algebra $\mathfrak{g}$. Then a scalar product $\langle \cdot, \cdot \rangle$ on $\mathfrak{g}$ induces a left-invariant Riemannian metric on $G$ in the following way:
$$\langle v, w \rangle_a := \langle d(L_{a^{-1}})_a(v),\, d(L_{a^{-1}})_a(w) \rangle \quad \forall\, v, w \in T_a G, \tag{9}$$
where $d(L_{a^{-1}})_a$ is the differential of the left action by $a^{-1}$. Since we have now given the Lie group a Riemannian manifold structure, we can endow $G$ with a regular Borel measure $\nu$. Notice that, by construction of the metric, $\nu$ is a left-invariant measure; this is also called the left Haar measure. The left Haar measure is unique up to a scaling constant, determined by the choice of scalar product. The scalar product on the Lie algebra also induces a measure $\mu$ on $\mathfrak{g}$ that is invariant with respect to vector addition and unique up to a constant. (Footnote 15: It is sufficient to consider $\mathfrak{g}$ with the Riemannian metric given by "copying" the scalar product at each point. This could be formalized by considering $\mathfrak{g}$ itself as a Lie group with respect to vector addition and repeating the same argument used for $G$.)
The following Proposition gives a general formula for the change of variables in Riemannian manifolds:
(Proposition 1.3 Howe (1989)) Let $M$ and $N$ be Riemannian manifolds and $\Phi$ a diffeomorphism of $M$ onto $N$. For $p \in M$ let $|\det(d\Phi_p)|$ denote the absolute value of the determinant of the linear isomorphism $d\Phi_p: T_p M \to T_{\Phi(p)} N$ when expressed in terms of any orthonormal bases. Then given a function $f$ on $N$:
$$\int_N f(q)\, d\nu(q) = \int_M f(\Phi(p))\, |\det(d\Phi_p)|\, d\mu(p),$$
if $\mu$ and $\nu$ denote the Riemannian measures on $M$ and $N$, respectively.
In order to change variables we therefore need an orthonormal basis for the tangent space $T_a G$ at each one of the group elements $a \in G$. Similarly to how we built the Riemannian metric, this is given by the differential of the left group action. In fact, given a basis $\{v_i\}_{i=1}^{n}$ of the Lie algebra, a basis for $T_a G$ is given by $\{d(L_a)_e(v_i)\}_{i=1}^{n}$. If $\{v_i\}$ is orthonormal then $\{d(L_a)_e(v_i)\}$ is an orthonormal basis for $T_a G$, considering $G$ endowed with the Riemannian metric defined in Equation 9:
$$\langle d(L_a)_e(v_i),\, d(L_a)_e(v_j) \rangle_a = \langle d(L_{a^{-1}})_a\, d(L_a)_e(v_i),\, d(L_{a^{-1}})_a\, d(L_a)_e(v_j) \rangle = \langle v_i, v_j \rangle = \delta_{ij}.$$
Then, with respect to this basis, the matrix representation $U$ of the differential of the exponential at $x \in \mathfrak{g}$ has entries:
$$U_{ij} = \langle d(L_{\exp x})_e(v_i),\, d\exp_x(v_j) \rangle_{\exp x} = \langle v_i,\, d(L_{\exp(-x)})_{\exp x}\, d\exp_x(v_j) \rangle,$$
where the equality follows from (9). (Footnote 16: Notice that here and in the following derivations we identify the tangent space at a point of the Lie algebra with the Lie algebra itself.) From this equality it is clear that $U$ is equal to the matrix representation of the endomorphism $d(L_{\exp(-x)})_{\exp x} \circ d\exp_x: \mathfrak{g} \to \mathfrak{g}$ with respect to the basis $\{v_i\}$. Since the determinant of an endomorphism is defined independently of the choice of basis, the volume change term $|\det(U)|$ is independent of the choice of scalar product and metric, and it is given by the determinant of this endomorphism, which can be computed with respect to any basis of $\mathfrak{g}$. (Footnote 17: Notice that even if the formal construction uses an explicit choice of scalar product and basis, the induced measures $\nu$ and $\mu$ are independent of this choice up to a scalar multiplicative constant. Moreover, since the choice of the constant for one measure automatically fixes the constant for the other, the change of volume term is completely independent of the choice of scalar product and basis, as shown above. Regardless of these considerations, the density of the pushforward measure will in general depend on the choice of basis and scalar product; an in-depth discussion of this behaviour is given in Appendix G.)
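Theorem 1.7 of Hermann (1980) then gives a general expression for this endomorphism for every Lie group:
(Theorem 1.7 of Hermann (1980)) Let $G$ be a Lie group with Lie algebra $\mathfrak{g}$. The exponential mapping of the manifold $\mathfrak{g}$ into $G$ has the differential:
$$d\exp_x = d(L_{\exp x})_e \circ \frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}, \tag{12}$$
where $\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}$ is a formal expression to indicate the infinite power series $\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} (\mathrm{ad}_x)^k$.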
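The power series in this theorem can be evaluated numerically. The sketch below does so for the rotation algebra so(3), chosen here purely as an illustrative assumption: in the standard basis, the matrix of $\mathrm{ad}_x$ is the skew-symmetric "hat" matrix of $x$, and the series can be summed in closed form using $K^3 = -\theta^2 K$ with $\theta = \|x\|$. All function names are hypothetical.

```python
import math
import numpy as np

def hat(x):
    """Skew-symmetric matrix of x; for so(3) this is also ad_x in the standard basis."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def series_operator(ad, n_terms=30):
    """Truncated power series sum_{k>=0} (-1)^k / (k+1)! * ad^k."""
    result = np.zeros_like(ad)
    power = np.eye(ad.shape[0])
    for k in range(n_terms):
        result += ((-1.0) ** k / math.factorial(k + 1)) * power
        power = power @ ad
    return result

def closed_form_so3(x):
    """Closed form of the same series for so(3), obtained from K^3 = -theta^2 K."""
    theta = np.linalg.norm(x)
    K = hat(x)
    if theta < 1e-8:
        return np.eye(3) - 0.5 * K   # first-order expansion near zero
    a = (1.0 - np.cos(theta)) / theta ** 2
    b = (theta - np.sin(theta)) / theta ** 3
    return np.eye(3) - a * K + b * (K @ K)

x = np.array([0.3, -1.2, 0.8])
print(np.allclose(series_operator(hat(x)), closed_form_so3(x)))
```

The truncation at 30 terms is more than enough for moderate $\|x\|$; for large norms a closed form or a scaling strategy would be preferable.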
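Now, simply by composing each side of (12) on the left with $d(L_{\exp(-x)})_{\exp x}$, we have that:
$$d(L_{\exp(-x)})_{\exp x} \circ d\exp_x = \frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}.$$
Combining this expression with Proposition D we have the general expression for the change of variables in Lie groups:
Let $G$, $\mathfrak{g}$, $\nu$ and $\mu$ be defined as above. Let $V \subseteq \mathfrak{g}$ be an open set on which $\exp$ is a diffeomorphism. Let $f$ be a measurable function on $G$ and $g$ a measurable function on $\mathfrak{g}$. Then we have:
$$\int_{\exp(V)} f(a)\, d\nu(a) = \int_{V} f(\exp x)\, \left|\det\left(\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}\right)\right| d\mu(x),$$
$$\int_{V} g(x)\, d\mu(x) = \int_{\exp(V)} g(\exp^{-1} a)\, \left|\det\left(\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}\right)\right|^{-1}_{x = \exp^{-1} a} d\nu(a).$$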
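When we can find all eigenvalues of $\mathrm{ad}_x$, the following theorem gives a closed form for $\det\left(\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}\right)$.
Let $G$ be a Lie group and $\mathfrak{g}$ its Lie algebra; then for $x \in \mathfrak{g}$ the following expression holds:
$$\det\left(\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}\right) = \prod_{\lambda \in Sp(\mathrm{ad}_x),\ \lambda \neq 0} \frac{1 - e^{-\lambda}}{\lambda},$$
where $Sp(\cdot)$ is the spectrum of the operator, i.e. the multiset of its (complex) eigenvalues: the roots of the characteristic polynomial of the operator (over the complex field), each repeated as many times as its algebraic multiplicity.
Proof. Let $A$ be a matrix representation on a given basis of the endomorphism $\mathrm{ad}_x$. Then we have:
$$\det\left(\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}\right) = \det\left(\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} A^k\right) = \det\nolimits_{\mathbb{C}}\left(\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} A^k\right),$$
where $\det_{\mathbb{C}}$ is the determinant over the complex field. Formally, this is the determinant applied to the complexification of the endomorphism. Now let $P$ be an invertible complex matrix such that $A = P J P^{-1}$, where $J = D + N$ is the Jordan normal form of $A$, $D$ is the diagonal matrix whose entries are the elements of the spectrum of $A$, and $N$ is a nilpotent matrix. Then we have:
$$\det\nolimits_{\mathbb{C}}\left(\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} A^k\right) = \det\nolimits_{\mathbb{C}}\left(\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} (D + N)^k\right) = \det\nolimits_{\mathbb{C}}\left(\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} D^k + N'\right),$$
where the last equality follows from the fact that $(D + N)^k = D^k + N_k$, where $N_k$ is another nilpotent (strictly upper triangular) matrix, so that $N'$ is strictly upper triangular. Since the determinant of a triangular matrix depends only on the diagonal entries, using the definition of $Sp(\mathrm{ad}_x)$ we can then write:
$$\det\left(\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}\right) = \prod_{\lambda \in Sp(\mathrm{ad}_x)} \sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} \lambda^k.$$
Now if $\lambda = 0$ then $\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} \lambda^k = 1$. Else, if $\lambda \neq 0$ then $\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!} \lambda^k = \frac{1 - e^{-\lambda}}{\lambda}$. ∎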
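The eigenvalue formula can be checked numerically. The following sketch again uses so(3) as an assumed example, where the eigenvalues of $\mathrm{ad}_x$ are $0$ and $\pm i\theta$ with $\theta = \|x\|$, so the product reduces to $(2 - 2\cos\theta)/\theta^2$. The function names and the zero-eigenvalue tolerance are illustrative choices.

```python
import math
import numpy as np

def hat(x):
    """Skew-symmetric matrix of x; for so(3) this is also ad_x in the standard basis."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def series_operator(ad, n_terms=30):
    """Truncated power series sum_{k>=0} (-1)^k / (k+1)! * ad^k."""
    result = np.zeros_like(ad)
    power = np.eye(ad.shape[0])
    for k in range(n_terms):
        result += ((-1.0) ** k / math.factorial(k + 1)) * power
        power = power @ ad
    return result

def det_via_spectrum(ad, zero_tol=1e-8):
    """Product over eigenvalues: factor 1 for (numerically) zero ones, else (1 - e^-lam)/lam."""
    factors = []
    for lam in np.linalg.eigvals(ad):
        if abs(lam) < zero_tol:
            factors.append(1.0)
        else:
            factors.append((1.0 - np.exp(-lam)) / lam)
    return np.prod(factors).real   # imaginary parts cancel for real operators

x = np.array([0.3, -1.2, 0.8])
theta = np.linalg.norm(x)
expected = (2.0 - 2.0 * np.cos(theta)) / theta ** 2

print(det_via_spectrum(hat(x)), np.linalg.det(series_operator(hat(x))), expected)
```

The explicit treatment of near-zero eigenvalues matters in floating point, since evaluating $(1 - e^{-\lambda})/\lambda$ directly at a tiny $\lambda$ suffers from cancellation.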
d.1 Matrix Lie Groups
In the case of a matrix Lie group $G \subseteq GL(m, \mathbb{R})$ we can exploit the fact that our group is embedded in $GL(m, \mathbb{R})$ to give an alternative way to compute a matrix representation of $d(L_{\exp(-x)})_{\exp x} \circ d\exp_x$. This corresponds to what is known in the literature as the left Jacobian.
Here we show how we can derive the expression of the left Jacobian from the formal framework described in the previous sections, using the additional information given by the fact that we are in a matrix Lie group. This is done using the fact that at each point $a \in G$ the tangent space $T_a G$ can be identified with a subspace of the real $m \times m$ matrices.
In fact, let $a \in GL(m, \mathbb{R})$. Considering $GL(m, \mathbb{R})$ as an open subset of $\mathbb{R}^{m \times m}$, the canonical basis of $\mathbb{R}^{m \times m}$ induces the isomorphism $T_a GL(m, \mathbb{R}) \cong \mathbb{R}^{m \times m}$. With this identification the differential of the exponential is a map from $\mathbb{R}^{m \times m}$ to $\mathbb{R}^{m \times m}$ and can be directly computed by taking derivatives. The same holds for the differential of the left group action. Moreover, the following Lemma shows that the differential of left multiplication corresponds exactly to left matrix multiplication:
Let $a, b \in GL(m, \mathbb{R})$ and let $L_a$ be the left action of $a$. Then, identifying both tangent spaces $T_b GL(m, \mathbb{R})$ and $T_{ab} GL(m, \mathbb{R})$ with $\mathbb{R}^{m \times m}$ using the isomorphisms above, $d(L_a)_b$ is the following function:
$$d(L_a)_b: \mathbb{R}^{m \times m} \to \mathbb{R}^{m \times m}, \qquad V \mapsto aV.$$
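These considerations lead to the following result:
Let $G$ be a matrix Lie group and $\{v_i\}_{i=1}^{n}$ a basis of the Lie algebra. Then the Lie algebra endomorphism $d(L_{\exp(-x)})_{\exp x} \circ d\exp_x$ has, with respect to $\{v_i\}$, the matrix representation $J(x)$ whose $i$-th column is:
$$J(x)_{\cdot i} = \phi^{-1}\left(\exp(-x)\, \frac{\partial \exp(x)}{\partial x_i}\right), \qquad x = \sum_{i=1}^{n} x_i v_i,$$
which is called the left Jacobian, where $\phi: \mathbb{R}^n \to \mathfrak{g}$ is the isomorphism given by the basis $\{v_i\}$.
Proof. Considering $G$ as embedded in $GL(m, \mathbb{R})$, the tangent space at each point can be identified with a vector subspace of $\mathbb{R}^{m \times m}$.
Given this identification, the quantities $d\exp_x(v_i) = \frac{\partial \exp(x)}{\partial x_i}$ are real-valued matrices and can be obtained simply by differentiating the expression of the exponential in each entry. Moreover, we have $d(L_{\exp(-x)})_{\exp x}\left(d\exp_x(v_i)\right) = \exp(-x)\, \frac{\partial \exp(x)}{\partial x_i}$, where the equality is given by considering the left group action as the restriction of $L_{\exp(-x)}$ to $G$ and applying Lemma D.1. This gives an explicit description of how the endomorphism acts on each vector of the basis. From this we can build its matrix representation $J(x)$, which proves the claim. ∎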
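The recipe in this proof (differentiate the matrix exponential entrywise, then multiply on the left by the inverse group element) can be tried out numerically. The sketch below is an illustration for SO(3) under several assumptions that are not from the paper: the standard basis of so(3), central finite differences in place of analytic derivatives, and a truncated Taylor series for the matrix exponential. It compares the result with the power series of the adjoint operator.

```python
import math
import numpy as np

def hat(x):
    """Skew-symmetric matrix of x; for so(3) this is also ad_x in the standard basis."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def vee(m):
    """Inverse of hat: coordinates of a skew-symmetric matrix in the standard basis."""
    return np.array([m[2, 1], m[0, 2], m[1, 0]])

def expm_taylor(a, n_terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small norms)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, n_terms):
        term = term @ a / k
        result = result + term
    return result

def series_operator(ad, n_terms=30):
    """Truncated power series sum_{k>=0} (-1)^k / (k+1)! * ad^k."""
    result = np.zeros_like(ad)
    power = np.eye(ad.shape[0])
    for k in range(n_terms):
        result += ((-1.0) ** k / math.factorial(k + 1)) * power
        power = power @ ad
    return result

def jacobian_numeric(x, eps=1e-6):
    """Columns vee(exp(-x) d exp(x)/dx_i), with central finite differences."""
    exp_inv = expm_taylor(-hat(x))
    columns = []
    for i in range(3):
        step = np.zeros(3)
        step[i] = eps
        d_exp = (expm_taylor(hat(x + step)) - expm_taylor(hat(x - step))) / (2.0 * eps)
        columns.append(vee(exp_inv @ d_exp))   # tangent vector pulled back to the algebra
    return np.stack(columns, axis=1)

x = np.array([0.3, -1.2, 0.8])
print(np.allclose(jacobian_numeric(x), series_operator(hat(x)), atol=1e-6))
```

In practice one would replace the finite differences with automatic differentiation or a closed form; the point here is only that the two constructions agree.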
Appendix E Pushforward Density
e.1 Preliminary Lemmata
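[See (Duistermaat and Kolk, 2000) Cor. 1.5.4] For a Lie group $G$ with algebra $\mathfrak{g}$ and exponential map $\exp: \mathfrak{g} \to G$, the set of singular points $\Sigma$ is the set:
$$\Sigma = \bigcup_{k \in \mathbb{Z} \setminus \{0\}} \Sigma_k, \qquad \Sigma_k = \{x \in \mathfrak{g} : \det\nolimits_{\mathbb{C}}(\mathrm{ad}_x - 2\pi i k\, I) = 0\},$$
where $\mathrm{ad}_x$ denotes the adjoint representation of the real Lie algebra as a linear operator on the complex vector space $\mathfrak{g}_{\mathbb{C}}$.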
Let $p$ be a complex polynomial viewed as a function on the real vector space $\mathbb{R}^n$:
$$p: \mathbb{R}^n \to \mathbb{C}.$$
Then either $p$ is identically zero or the set of roots $\{x \in \mathbb{R}^n : p(x) = 0\}$ has Lebesgue measure zero in $\mathbb{R}^n$.
Proof. The problem is reduced to the real polynomial $q$ defined by
$$q(x) = |p(x)|^2 = \mathrm{Re}(p(x))^2 + \mathrm{Im}(p(x))^2.$$
It has the same set of (real) roots as $p$ and is identically zero if and only if $p$ is. The statement then follows from the theorem of Okamoto. A simple proof can be found in (Caron and Traynor, 2005). ∎
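For a Lie group $G$ with algebra $\mathfrak{g}$ and exponential map $\exp$, the set of singular points $\Sigma$ is closed and has Lebesgue measure 0.
Proof. $\Sigma$ is closed because it is the preimage of the closed set $\{0\}$ under the continuous function $x \mapsto \det\left(\frac{1 - e^{-\mathrm{ad}_x}}{\mathrm{ad}_x}\right)$.
For $k \in \mathbb{Z} \setminus \{0\}$ let $p_k(x) = \det_{\mathbb{C}}(\mathrm{ad}_x - 2\pi i k\, I)$. $p_k$ is a polynomial in $x$, because $\mathrm{ad}$ is linear and $\det$ is polynomial. $p_k$ cannot be identically zero, as $p_k(0) \neq 0$, because $\exp$ is a diffeomorphism in a neighbourhood of $0$ (see (Duistermaat and Kolk, 2000) 1.3.4). Thus, the set of roots of $p_k$, namely $\Sigma_k$, has Lebesgue measure zero. It follows that $\Sigma = \bigcup_{k \neq 0} \Sigma_k$, being a countable union of such sets, also has Lebesgue measure zero. ∎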
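A concrete instance may help here. Assuming the group is SO(3) (an example chosen for illustration, not taken from this lemma), the eigenvalues of $\mathrm{ad}_x$ are $0$ and $\pm i\theta$ with $\theta = \|x\|$, so the volume-change determinant equals $(2 - 2\cos\theta)/\theta^2$ and the singular set consists of the spheres $\theta = 2\pi k$, $k \neq 0$, which have measure zero in $\mathbb{R}^3$. The function names and tolerance below are hypothetical.

```python
import math

def so3_volume_change(theta):
    """det of (1 - e^{-ad_x}) / ad_x for so(3), as a function of theta = |x|."""
    if abs(theta) < 1e-8:
        return 1.0   # limit as theta -> 0
    return (2.0 - 2.0 * math.cos(theta)) / theta ** 2

def is_singular(theta, tol=1e-9):
    """True when the differential of exp degenerates at rotation angle theta."""
    return abs(so3_volume_change(theta)) < tol

# Singular exactly on the spheres of radius 2*pi*k with k != 0.
for theta in (0.0, math.pi, 2.0 * math.pi, 3.0, 4.0 * math.pi):
    print(round(theta, 4), is_singular(theta))
```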
(Sets of Lebesgue measure 0 on a Manifold) If $M$ is a smooth $n$-manifold we say that a subset $A \subseteq M$ has measure zero in $M$ if for every smooth chart $(U, \varphi)$ for $M$ the subset $\varphi(A \cap U) \subseteq \mathbb{R}^n$ has $n$-dimensional measure zero.
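Let $M$ be a smooth manifold. Then for every $p \in M$ and every open neighbourhood $U$ of $p$ there exists an open neighbourhood $V \subseteq U$ of $p$ such that $\partial V$ has Lebesgue measure $0$.
Proof. Take a smooth chart $(W, \varphi)$ such that $p \in W$. Let $U$ be an open set containing $p$. Then $\varphi(U \cap W)$ is an open set in $\mathbb{R}^n$ such that $\varphi(p) \in \varphi(U \cap W)$. Take then an open ball $B$ with centre $\varphi(p)$ such that $\overline{B} \subseteq \varphi(U \cap W)$. If we define $V := \varphi^{-1}(B)$ we have that $V$ is an open neighbourhood of $p$ and that $\varphi(\partial V) = \partial B$ has measure $0$ in $\mathbb{R}^n$. Then Lemma 6.6 of Lee (2012) implies that $\partial V$ has measure 0 in $M$. ∎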
Let and smooth manifolds of the same dimension and a smooth map. Let . Then can be partitioned in such that has Lebesgue measure 0 and for every is an open set such that is a diffeomorphism.
We first show that is open: since is a local diffeomorphism at there exists a neighbourhood such that is a diffeomorphism. Then . This shows that thus is open. Therefore inherits a manifold structure from as a sub-manifold, meaning that is second countable, implying is Lindelöf (see (Lee, 2010), Thm. 2.50). This means that every open cover has a countable subcover.
For every consider , neighbourhood of such that