Statistics on the (compact) Stiefel manifold: Theory and Applications

07/31/2017 ∙ by Rudrasis Chakraborty, et al. ∙ 0

A Stiefel manifold of the compact type is often encountered in many fields of Engineering including, signal and image processing, machine learning, numerical optimization and others. The Stiefel manifold is a Riemannian homogeneous space but not a symmetric space. In previous work, researchers have defined probability distributions on symmetric spaces and performed statistical analysis of data residing in these spaces. In this paper, we present original work involving definition of Gaussian distributions on a homogeneous space and show that the maximum-likelihood estimate of the location parameter of a Gaussian distribution on the homogeneous space yields the Fréchet mean (FM) of the samples drawn from this distribution. Further, we present an algorithm to sample from the Gaussian distribution on the Stiefel manifold and recursively compute the FM of these samples. We also prove the weak consistency of this recursive FM estimator. Several synthetic and real data experiments are then presented, demonstrating the superior computational performance of this estimator over the gradient descent based non-recursive counter part as well as the stochastic gradient descent based method prevalent in literature.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Manifold-valued data have gained much importance in recent times due to their expressiveness and ready availability of machines with powerful CPUs and large storage. For example, these data arise as

rank-2 tensors

(manifold of symmetric positive definite matrices) [36, 39], linear subspaces (the Grassmann manifold) [50, 26, 23, 37], column orthogonal matrices (the Stiefel manifold) [50, 28, 14], directional data and probability densities (the hypersphere) [34, 47, 49, 25] and others. A useful method of analyzing manifold valued data is to compute statistics on the underlying manifold. The most popular statistic is a summary of the data, i.e., the Riemannian barycenter (Fréchet mean (FM)) [22, 31, 2], Fréchet median [4, 11] etc. However, in order to compute statistics of manifold-valued data, the first step involves defining a distribution on the manifold. Recently, authors in [44] have defined a Gaussian distribution on Riemannian symmetric spaces (or symmetric spaces). Some typical examples of symmetric spaces include the Grassmannian, the hypersphere etc. Several other researchers [13, 43] have defined a Gaussian distribution on the space of symmetric positive definite matrices. They called the distribution a “generalized Gaussian distribution” [13] and “Riemannian Gaussian distribution” [43] respectively.

In this work, we define a Gaussian distribution on a homogeneous space (a more general class than symmetric spaces). A key difficulty in defining the Gaussian distribution on a non-Euclidean space is to show that the normalizing factor in the expression for the distribution is a constant. In this work, we show that the normalizing factor in our definition of the Gaussian distribution on a homogeneous space is indeed a constant. Note that a symmetric space is a homogeneous space but not all homogeneous spaces are symmetric and thus, our definition of Gaussian distribution is on a more generalized topological space than the symmetric space. Given a well-defined Gaussian distribution, the next step is to estimate the parameters of the distribution. In this work, we prove that the maximum likelihood estimate (MLE) of the mean of the Gaussian distribution is the Fréchet mean (FM) of the samples drawn from the distribution.

Data with values in the space of column orthogonal matrices have become popular in many applications of Computer Vision and Medical Image analysis

[50, 9, 33, 8, 40]. The space of column orthogonal matrices is a topological space, and moreover one can equip this space with a Riemannian metric which in turn makes this space a Riemannian manifold, known as the Stiefel manifold. The Stiefel manifold is a homogeneous space and here we extend the definition of the Gaussian distribution to the Stiefel manifold. In this work, we restrict ourselves to the Stiefel manifold of the compact type, which is quite commonly encountered in most applications mentioned earlier.

We now motivate the need for a recursive FM estimator. In this age of massive and continuous streaming data, samples are often acquired incrementally. Hence, from an applications perspective, the desired algorithm should be recursive/inductive in order to maximize computational efficiency and account for availability of data, requirements that are seldom addressed in more theoretically oriented fields. We propose an inductive FM computation algorithm and prove the weak consistency of our proposed estimator. FM computation on Riemannian manifolds has been an active area of research for the past few decades. Several researchers have addressed this problem and we refer the reader to [6, 2, 24, 38, 3, 35, 5, 41, 19, 4, 48, 29, 10, 45].

1.1 Key Contributions

In summary, the key contributions of this paper are: (i) A novel generalization of Gaussian distributions to homogeneous spaces. (ii) A proof that the MLE of the location parameter of this distribution is “the” FM. (iii) A sampling technique for drawing samples from this generalized Gaussian distribution defined on a compact Stiefel manifold (which is a homogeneous space), and an inductive/recursive FM estimator from the drawn samples along with a proof of its weak consistency. Several examples of FM estimates computed from real and synthetic data are shown to illustrate the power of the proposed methods.

Though researchers have defined Gaussian distributions on other manifolds in the past, see [44, 13], their generalization of the Gaussian distribution is restricted to symmetric spaces of non-compact types. In this work, we define a Gaussian distribution on a homogeneous space, which is a more general topological space than the symmetric space. A few others in literature have generalized the Gaussian distribution to all Riemannian manifolds, for instance, in [52], authors defined the Gaussian distribution on a Riemannian manifold without a proof to show that the normalizing factor is a constant. Whereas, in [38]

, the author defined the normal law on Riemannian manifolds using the concept of entropy maximization for distributions with known mean and covariance. Under certain assumptions, the author shows that this definition amounts to using the Riemannian exponential map on a truncated Gaussian distribution defined in the tangent space at the known intrinsic mean. This approach of deriving the normal distribution yields a normalizing factor that is dependent on the location parameter of the distribution and hence is not a constant with respect to the FM. To the best of our knowledge, we are the first to define a Gaussian distribution on a homogeneous space with a constant normalizing factor, i.e., the normalizing factor does not depend on the location parameter (FM).

We then move our focus to the Stiefel manifold (which is a homogeneous space) and propose a simple algorithm to draw samples from the Gaussian distribution on the Stiefel manifold. In order to achieve this, we develop a simple but non-trivial way to extend the sampling algorithm in [44] to get samples on the Stiefel manifold. Once we have the samples from a Gaussian distribution on the Stiefel, we propose a novel estimator of the sample FM and prove the weak consistency of this estimator. The proposed FM estimator is inductive in nature and is motivated by the inductive FM algorithm on the Euclidean space. But, unlike Euclidean space, due to the presence of non-zero curvature, it is necessary to prove the consistency of our proposed estimator, which is presented subsequently. Further, we experimentally validate the superior performance of our proposed FM estimator over the gradient descent based techniques. Moreover, we also show that the MLE of the location parameter of the Gaussian distribution on the Stiefel manifold asymptotically achieves the Cramér-Rao lower bound [15, 42]

, hence in turn, the MLE of the location parameter is efficient. This implies that our proposed consistent FM estimator, asymptotically, has a variance lower bounded by that of the MLE.

The rest of the paper is organized as follows. In section 2, we present the necessary mathematical background. In section 3, we define a Gaussian distribution on a homogeneous space. More specifically, define a generalized Gaussian distribution on the Stiefel manifold and prove that the normalizing factor is indeed a constant with respect to the location parameter of the distribution. Then, we propose a sampling algorithm to draw samples from this generalized Gaussian distribution in section 3.1 and in section 3.2, show that the MLE of the location parameter of this Gaussian distribution is the FM of the samples drawn from the distribution. In section 4, we propose an inductive FM estimator and prove its weak consistency. Finally we present a set of synthetic and real data experiments in section 5 and draw conclusions in section 6.

2 Mathematical Background: Homogeneous spaces and the Riemannian symmetric space

In this section, we present a brief note on the differential geometry background required in the rest of the paper. For a detailed exposition on these concepts, we refer the reader to a comprehensive and excellent treatise on this topic by Helgason [27]. Several propositions and lemmas that are needed to prove the results in the rest of the paper are stated and proved here. Some of these might have been presented in the vast differential geometry literature but are unknown to us and hence the proofs presented in this background section are original.

Let be a Riemannian manifold with a Riemannian metric , i.e., is a bi-linear symmetric positive definite map, where is the tangent space of at . Let be the metric (distance) induced by the Riemannian metric . Let be the set of all isometries of , i.e., given , , for all . It is clear that forms a group (henceforth, we will denote by ) and thus, for a given and , , for some is a group action. Consider , and let , i.e., is the Stabilizer of . We say that acts transitively on , iff, given , there exists a such that .

Definition 2.1.

Let act transitively on and , (called the “origin” of ) be a subgroup of . Then, is a homogeneous space and can be identified with the quotient space under the diffeomorphic mapping [27].

In fact, if is a homogeneous space, then is a Lie group. A Stiefel manifold, (definition of the Stiefel manifold is given in next section) is a homogeneous space and can be identified with , where is the group of orthogonal matrices. Now, we will list some of the important properties of Homogeneous spaces that will be used throughout the rest of the paper.

Properties of Homogeneous spaces: Let be a Homogeneous space. Let be the corresponding volume form and be any integrable function. Let , s.t. , . Then, the following facts are true:

  1. .

  2. , for all .

Definition 2.2.

A Riemannian symmetric space is a Riemannian manifold with the following property: such that and . is called symmetry at [27].

Proposition 2.1.

[27] A symmetric space is a homogeneous space with a symmetry, , at . For the other point , by transitivity of , there exists such that and .

Proposition 2.2.

[27] Any symmetric space is geodesically complete.

Some examples of symmetric spaces include, (the hypersphere), (the hyperbolic space) and (the Grassmannian). It is evident from the definition that symmetric space is a homogeneous space but the converse is not true. For example, the Stiefel manifold is not a symmetric space.

Proposition 2.3.

[27] The mapping is an involutive automorphism of and the stabilizer of , i.e., , is contained in the group of fixed points of .

Clearly, , as is an automorphism, is the identity element. Recall, is a Lie group, hence, differentiating at , we get an involutive automorphism of the Lie algebra of (also denoted by ). Henceforth, we will use to denote the automorphism of . Since is involutive, i.e., , has two eigen values, and let (Lie algebra of ) and

be the corresponding eigenspaces, then

(direct sum).

Proposition 2.4.

[27] , and

Hence, is a Lie subalgebra of . Henceforth, we will assume to be semisimple. We can define a symmetric, bilinear form, on as follows , where is the adjoint endomorphism of defined by . is called the Killing form on .

Definition 2.3.

The decomposition of as is called the Cartan decomposition of associated with the involution . Furthermore, is negative definite on , positive definite on and and are orthogonal complement of each other with respect to on .

Recall, a symmetric space, , can be identified with . Note that, , the “origin” of can be written as , is the identity element. Since, can be identified with , the Riemannian metric on corresponds to the Killing form on [27], which is a -invariant form. Without loss of generality, we will assume that is over and be semisimple (equivalently, the Killing form on is non-degenerate). The symmetric space is said to be compact (noncompact) iff the sectional curvature is strictly positive (negative), equivalently iff is compact (noncompact).

Duality: Given a semisimple Lie algebra with the Cartan decomposition , construct another Lie algebra from as follows: , where is a complex structure of (real Lie algebra). From the definition of complex structure, , is an automorphism on s.t., . satisfies the following equality: , for all . We will call the dual Lie algebra of . It is easy to see that if corresponds to a symmetric space of noncompact type, is a symmetric space of compact type and vice-versa. This duality property is very useful and is a key ingredient of this paper.

Now, we will briefly describe the geometry of two Riemannian manifolds, namely the Stiefel manifold and the Grassmannian. We need the geometry of Stiefel manifold throughout the rest of the paper. Furthermore, observe that, the Stiefel and the Grassmannian form a fiber bundle. In order to draw samples from a distribution on the Stiefel, we will use the samples drawn from a distribution on the Grassmannian by exploiting the fiber bundle structure. Hence, we will require the geometry of the Grassmannian as well, which we will briefly present below.

Differential Geometry of the Stiefel manifold: The set of all full column rank dimensional real matrices form a Stiefel manifold, , where . A compact Stiefel manifold is the set of all column orthonormal real matrices. When , can be identified with , where is special orthogonal group. Note that, when we consider the quotient space, , we assume that is a subgroup of , where, defined by is an isomorphism from to .

Proposition 2.5.

is a closed Lie-subgroup of . Moreover, the quotient space together with the projection map, is a principal bundle with as the fiber.

Proof.

is a compact Lie-subgroup of , hence is a closed subgroup. The fiber bundle structure of follows directly from the closedness of . As is a principal homogeneous space (because and acts on it freely), hence the principal bundle structure. ∎

With a slight abuse of notation, henceforth, we denote the compact Stiefel manifold by . Hence, , where is the identity matrix. The compact Stiefel manifold has dimension . At any , the tangent space is defined as follows . Now, given , the canonical Riemannian metric on is defined as follows:

(2.1)

With this metric, the compact Stiefel manifold has non-negative sectional curvature [53].

Given , we can define the Riemannian retraction and lifting map within an open neighbourhood of . We will use an efficient Cayley type retraction and lifting maps respectively on as defined in [21, 30]. It should be mentioned that though the domain of retraction is a subset of the domain of inverse-Exponential map, on retraction/ lifting is a useful alternative since, there are no closed form expressions for both the Exponential and the inverse-Exponential maps on Stiefel manifold. Recently, a fast iterative algorithm to compute Riemannian inverse-Exponential map has been proposed in [54], which can be used instead of retraction/ lifting maps to compute FM in our algorithm.

In the neighborhood of ( matrix with upper-right block is identity and rest are zeros), given , we define the lifting map by where, is a skew-symmetric matrix and is a matrix defined as follows: and where, , and with , and , provided that is nonsingular. is defined as and, .

Furthermore, in the neighborhood of , the retraction map defined above is a diffeomorphism (since it is a chart map) from to .

Proposition 2.6.

The projection map is a covering map on the neighborhood of in .

Proof.

First, note that under the identification of with , the neighborhood of in can be identified with the neighborhood of in . Now, the retraction map defined above is a (local) diffeomorphism from to . Also, the Cayley map is a diffeomorphism from to the neighborhood of in . Thus, the map is a diffeomorphism to the neighborhood of in (using the fact that the composition of two diffeomorphisms is a diffeomorphism). Now, since is compact and is surjective, is a covering map on the neighborhood of in using the following Lemma. ∎

Lemma 2.1.

Under the hypothesis in Proposition 2.6, is a covering map in the neighborhood from the neighborhood of in to the neighborhood of of .

Proof.

In Proposition 2.6, we have shown that is local diffeomorphism in the neighborhood specified in the hypothesis. Let be a neighborhood around in and be a neighborhood around in on which is a diffeomorphism. Let , as is a space, hence, is closed, thus, is closed and since is compact, hence is compact. For each , let be a open neighborhood around where restricts to a diffeomorphism (and hence homeomorphism). Then, is an open cover of , thus as a finite subcover , where is finite. We chose to be disjoint as is a space. Let which is an open neighborhood of . Then, is a disjoint collection of open neighborhoods each of which maps homeomorphically to . Hence, is a covering map in the local neighborhood. ∎

Given , the Cayley map is a conformal mapping, defined by . Using the Cayley mapping, we can define the Riemannian retraction map by . Hence, given within a regular geodesic ball (the geodesic ball does not include the cut locus) of appropriate radius (henceforth, we will assume the geodesic ball to be regular), we can define the unique geodesic from to , denoted by as

(2.2)

Also, we can define the distance between and as

(2.3)

Differential Geometry of the Grassmannian : The Grassmann manifold (or the Grassmannian) is defined as the set of all -dimensional linear subspaces in and is denoted by , where , , . Grassmannian is a symmetric space and can be identified with the quotient space , where is the set of all matrices whose top left and bottom right submatrices are orthogonal and all other entries are , and overall the determinant is . A point can be specified by a basis, . We say that if is a basis of , where is the column span operator. It is easy to see that the general linear group acts isometrically, freely and properly on . Moreover, can be identified with the quotient space . Hence, the projection map is a Riemannian submersion, where . Moreover, the triplet is a fiber bundle.

At every point , we can define the vertical space, to be . Further, given , we define the horizontal space, to be the -orthogonal complement of

. Now, from the theory of principal bundles, for every vector field

on , we define the horizontal lift of to be the unique vector field on for which and , . As, is a Riemannian submersion, the isomorphism is an isometry from to . So, is defined as:

(2.4)

where, and , , and .

3 Gaussian distribution on Homogeneous spaces

In this section, we define the Gaussian distribution, on a Homogeneous space, , (location parameter), (scale parameter), and then propose a sampling algorithm to draw samples from the Gaussian distribution on . Furthermore, we will show that the maximum likelihood estimator (MLE) of is the Fréchet mean (FM) [22] of the samples.

We define the probability density function,

with respect to (the volume form) of the Gaussian distribution on as:

(3.1)

The above is a valid probability density function, provided the normalizing factor, is a constant, i.e., does not depend on which we will prove next.

Proposition 3.1.

Let us define . Then, , where is the origin.

Proof.

As the group action on is transitive, there exists s.t., .

Hence, , i.e., does not depend on . ∎

Now that we have a valid definition of a Gaussian distribution, on a Homogeneous space, we propose a sampling algorithm for drawing samples from on (which is a homogeneous space), .

3.1 Sampling algorithm

In order to draw samples from on , it is sufficient to draw samples from where is the origin. Then, using group operation, we can draw samples from for any . We will assume, ( matrix with the upper-right block being the identity and the rest being zeros). We will first draw samples from on , where and use this sample to get a sample on using . Note that is a symmetric space and hence a homogeneous space and thus we have a valid Gaussian density on using Eq. 3.1.

Proposition 3.2.

Let where , . Then, , with , where, and .

Proof.

It is sufficient to show that . Recall, that forms a fiber bundle. Moreover, the isomorphism, is an isometry from to , for all . From, [1], we know that . So,

Using the Proposition 3.2, we can generate a sample from on , using a sample from on . We will now propose an algorithm to draw samples from on . Recall that can be identified as which is a semisimple symmetric space of compact type (let it be denoted by ). Also, recall from section 2 that, every compact semisimple symmetric space has a dual semsimple symmetric space of non-compact type (denoted by ). Here, , , where , , , . Then, , and the corresponding Lie group, denoted by (with a slight abuse of notation we use to denote the identity component). Here, is the special pseudo-orthogonal group, i.e.,

,

. Thus, the dual non-compact type symmetric space of (identified with ) is . Recently, in [44], an algorithm to draw samples from a Gaussian distribution on symmetric spaces of non-compact type was presented. We will use the following proposition to get a sample from a Gaussian distribution on the dual compact symmetric space.

Proposition 3.3.

Let . Let , where Ad is the adjoint representation, , . Then, , where and .

Proof.

Observe that . So, it suffices to show that .

Note that, the mapping given by is a diffeomorphism and is used to construct from . The mapping is called the polar coordinate transform. Now, using Propositions 3.2 and 3.3, starting with a sample drawn from a Gaussian distribution on , we get a sample from Gaussian distribution on . We would like to point out that, we do not have to compute the normalizing constant explicitly in order to draw samples, because, in order to get samples on , we can draw samples using the Algorithm 1 in [44], which draws samples from the kernel of the density.

3.2 Maximum likelihood estimation (MLE) of

Let, be i.i.d. samples drawn from with bounded support (described subsequently) on , for some . Then, by proposition 3.5, the MLE of is the Fréchet mean (FM) [22] of . Fréchet mean (FM) [22] of is defined as follows:

(3.2)

We define an (open) “geodesic ball” of radius to be s.t., there exists a length minimizing geodesic between to any . A “geodesic ball” is said to be “regular” iff , where is the maximum sectional curvature. The existence and uniqueness of the Fréchet mean (FM) is ensured iff the support of the distribution is within a regular geodesic ball [2, 32].

Proposition 3.4.

Let , , then,

Proof.

Let, . Then, there exists a unique s.t. , . [51]. Now, using O’Neil’s formula [12], we know that

where, is the orthogonal projection operator on . Clearly as the second term in the above summation is non-negative and is non-negative (as is of compact type), the result follows. ∎

Observe that, the support of as defined in proposition 3.2 is a subset of , is an arbitrary union of open sets and hence is open. Thus, we can give a manifold structure and using the proposition 3.4, we can say that if the support of is within a geodesic ball , FM exists and is unique. For the rest of the paper, we assume this condition to ensure the existence and uniqueness of FM.

Proposition 3.5.

Let, be i.i.d. samples drawn from on (support of is within a geodesic ball ), . Then the MLE of is the FM of .

Proof.

The likelihood of given the i.i.d. samples is given by

(3.3)

where is defined as in Eq. 3.1. Now, maximizing log-likelihood function with respect to is equivalent to minimizing with respect to . This gives the MLE of to be the FM of as can be verified using Eq. 3.2. ∎

4 Inductive Fréchet mean on the Stiefel manifold

In this section, we present an inductive formulation for computing the Fréchet mean (FM) [22, 31] on Stiefel manifold. We also prove the Weak Consistency of our FM estimator on the Stiefel manifold.

Algorithm for Inductive Fréchet Mean Estimator

Let , , be i.i.d. samples drawn from (whose support is within a geodesic ball ) on . Then, we define the inductive FM estimator (StiFME) by the recursion in Eqs. 4.1, 4.2.

(4.1) (4.2)
 
 
where, is the geodesic from to defined as and . Eq. 4.2 simply means that the estimator lies on the geodesic between the estimate and the sample point. This simple inductive estimator can be shown to converge to the Fréchet expectation, i.e., , as stated in Theorem 4.1.

Theorem 4.1.

Let, be i.i.d. samples drawn from a Gaussian distribution on (with a support inside a regular geodesic ball of radius ). Then the inductive FM estimator (StiFME) of these samples, i.e., converges to as .

Proof.

We will start by first stating the following propositions.

Proposition 4.1.

Using Proposition 2.5., we know that is a principal bundle and moreover using Proposition 2.6, we know that this map is a covering map in the neighborhood of in . Let, be the Riemannian metric on and be the metric on the quotient space . Then, .

Proposition 4.2.

Let, , where and . Let, is an defined in Eq. 3.2, then, , where .

Proof.

Let, , for some . Then, observe that,

Thus the claim holds. ∎

By the Proposition 4.2, we can see that in order to prove Theorem 4.1, it is sufficient to show weak consistency on . We will state and prove the weak consistency on in the next theorem. ∎

Theorem 4.2.

Using the hypothesis in Theorem 4.1, let be the corresponding i.i.d. samples drawn from the (induced) Gaussian distribution on where () (it is easy to show using Proposition 4.2 that this (induced) distribution on is indeed a Gaussian distribution on ). Then the inductive FM estimator (StiFME) of these samples, i.e., converges to as .

Proof.

Since is a special case of the (compact) Stiefel manifold, i.e., when (as can be identified with ), we will use instead of for notational simplicity. Let . Any point in can be written as a product of planar rotation matrices by the following claim.

Proposition 4.3.

Any arbitrary element of can be written as the composition of planar rotations in the planes generated by the standard orthogonal basis vectors of .

Proof.

The proof is straightforward. Moreover, each element of is a product of planar rotations. ∎

By virtue of the Proposition 4.3, we can express as a product of planar rotation matrices. Each planar rotation matrix can be mapped onto , hence . Let’s denote this product space of hyperspheres by . Then, is a diffeomorphism from to . Let be a Riemannian metric on