# Geometrical and statistical properties of M-estimates of scatter on Grassmann manifolds

We consider data from the Grassmann manifold G(m,r) of all vector subspaces of dimension r of R^m, and focus on the Grassmannian statistical model, which is of common use in signal processing and statistics. Canonical Grassmannian distributions G_Σ on G(m,r) are indexed by parameters Σ from the manifold M = Pos_sym^1(m) of positive definite symmetric matrices of determinant 1. Robust M-estimates of scatter (GE) for general probability measures P on G(m,r) are studied. Such estimators are defined to be the maximizers of the Grassmannian log-likelihood −ℓ_P(Σ) as a function of Σ. One of the novel features of this work is a strong use of the fact that M is a CAT(0) space with known visual boundary at infinity ∂M. We also recall that the sample space G(m,r) is a part of ∂M, show the distributions G_Σ are SL(m,R)–quasi-invariant, and that ℓ_P(Σ) is a weighted Busemann function. Let P_n = (δ_U_1 + ... + δ_U_n)/n be the empirical probability measure for n-samples of i.i.d. random subspaces U_i ∈ G(m,r) of common distribution P, whose support spans R^m. For Σ_n and Σ_P the GEs of P_n and P, we show the almost sure convergence of Σ_n towards Σ_P as n → ∞ using methods from geometry, and provide a central limit theorem for the rescaled process C_n = (m/tr(Σ_P^{-1}Σ_n)) g^{-1}Σ_n g^{-1}, where Σ_P = gg with g ∈ SL(m,R) the unique symmetric positive-definite square root of Σ_P.


## 1 Introduction and summary

The deluge of current data in science, social sciences and technology is remarkable for the proliferation of new data types, and practitioners are increasingly faced with the geometries they induce. We focus on data from the Grassmann manifold G(m,r) of all vector subspaces of dimension r of R^m (1 ≤ r ≤ m−1). Such data arise for example in signal processing (see, e.g., [Nokleby], [Zhang]). Let Y_1, ..., Y_r be i.i.d. random vectors in R^m with central normal distribution of positive definite self-adjoint covariance matrix Σ. The density of the normal law is exp(−x^TΣ^{−1}x/2) up to a constant factor. We define the Grassmannian distribution G_Σ of parameter Σ as the law of the linear span of these vectors in R^m. It is a Borel probability measure on G(m,r). The parameter Σ of a Grassmannian distribution is defined up to a positive factor only. This indeterminacy is removed by requiring the determinant of Σ to be 1. So, we parametrize the Grassmannian distributions by the space M = Pos_sym^1(m) of positive definite self-adjoint matrices of determinant 1.
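To make the construction concrete, here is a minimal numpy sketch (our own illustration, with our own function names, not code from the paper) that draws one subspace from G_Σ by sampling r i.i.d. N(0,Σ) vectors and recording an orthonormal basis of their span:

```python
import numpy as np

def sample_grassmann(Sigma, r, rng):
    """Draw one subspace from the Grassmannian distribution G_Sigma on G(m, r):
    the linear span of r i.i.d. N(0, Sigma) vectors (almost surely of rank r)."""
    m = Sigma.shape[0]
    Y = rng.multivariate_normal(np.zeros(m), Sigma, size=r).T  # m x r sample matrix
    Q, _ = np.linalg.qr(Y)   # orthonormal basis of the span <Y>
    return Q

rng = np.random.default_rng(0)
m, r = 4, 2
# a parameter Sigma in M = Pos_sym^1(m): SPD, normalized to det(Sigma) = 1
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)
Sigma /= np.linalg.det(Sigma) ** (1.0 / m)
U = sample_grassmann(Sigma, r, rng)
```

The determinant-one normalization reflects the parametrization by Pos_sym^1(m): rescaling Σ by a positive factor leaves the law of the span unchanged.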

Given a regular matrix A ∈ GL(m,R), the random vectors AY_1, ..., AY_r are i.i.d. with central normal law of covariance matrix AΣA^T. Hence, the image measure of G_Σ under the transformation of G(m,r) given by U ↦ AU is

 A⋆G_Σ = G_{AΣA^T}. (1)

Let us represent a point of G(m,r) as the linear span U = ⟨X⟩ of the columns of an m×r matrix X of rank r or, equivalently, as the range of X. Then, a computation shows that the density, or Radon–Nikodym derivative, of the Grassmannian distribution G_Σ (Σ ∈ M) with respect to the uniform distribution G_{Id_m} on G(m,r) (Id_m = identity matrix) is given by

 dG_Σ/dG_{Id_m}(⟨X⟩) = (det(X^TX)/det(X^TΣ^{−1}X))^{m/2} (2)

(see [chi]).
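The density (2) is straightforward to evaluate numerically, and one can check that it depends only on the subspace ⟨X⟩, not on the chosen basis: replacing X by XB for invertible B multiplies numerator and denominator by det(B)², which cancels. A small sketch (function names are ours):

```python
import numpy as np

def grassmann_density(Sigma, X):
    """Radon-Nikodym derivative dG_Sigma/dG_Id at <X>, equation (2):
    (det(X^T X) / det(X^T Sigma^{-1} X))^(m/2)."""
    m = Sigma.shape[0]
    num = np.linalg.det(X.T @ X)
    den = np.linalg.det(X.T @ np.linalg.solve(Sigma, X))
    return (num / den) ** (m / 2.0)

rng = np.random.default_rng(1)
m, r = 5, 2
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)
Sigma /= np.linalg.det(Sigma) ** (1.0 / m)
X = rng.standard_normal((m, r))
B = rng.standard_normal((r, r)) + 2 * np.eye(r)  # a change of basis of <X>
d1 = grassmann_density(Sigma, X)
d2 = grassmann_density(Sigma, X @ B)  # same subspace, hence same density
```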

When r = 1, the Grassmannian distribution is known as the angular Gaussian distribution of parameter Σ on the projective space P(R^m) (see [AMR2005]). For any r, the Grassmann manifold G(m,r) can be viewed as the space of projective subspaces of dimension r−1 of P(R^m) by identifying a vector r-subspace U of R^m with the projective subspace P(U). In this projective interpretation, the Grassmannian distribution on G(m,r) is the law of the projective span of r i.i.d. random points of P(R^m) with angular Gaussian distribution of parameter Σ.

Let P be a Borel probability measure on G(m,r). Typically, we think of P as being the empirical measure of a sample in G(m,r). A parameter Σ ∈ M is called a Grassmannian M-estimate of scatter —abbreviated GE in the sequel— of P if it maximizes the log-likelihood −ℓ_P(Σ). It is called a GE of a sample when P is the sample empirical measure.

For convenience, we shall rather work with the following negative version of the log-likelihood:

 ℓ_P(Σ) = −(1/m) ∫_{G(m,r)} log(dG_Σ/dG_{Id_m}) dP = ∫_{G(m,r)} ℓ_U(Σ) dP(U), (3)

where the (negative) log-density ℓ_U is defined by

 ℓ_U(Σ) = −(1/m) log (dG_Σ/dG_{Id_m})(U) = (1/2) log (det(X^TΣ^{−1}X)/det(X^TX)), for U = ⟨X⟩ ∈ G(m,r). (4)

With this notation, a GE of P minimizes ℓ_P.
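Equations (3)–(4) translate directly into code; in particular ℓ_U(Id_m) = 0 for every U, so the empirical M-functional vanishes at the identity. A short numpy sketch (notation and names are ours):

```python
import numpy as np

def ell_U(Sigma, X):
    """Negative log-density ell_U(Sigma) of equation (4):
    (1/2) log( det(X^T Sigma^{-1} X) / det(X^T X) )."""
    s1 = np.linalg.slogdet(X.T @ np.linalg.solve(Sigma, X))[1]
    s2 = np.linalg.slogdet(X.T @ X)[1]
    return 0.5 * (s1 - s2)

def ell_P(Sigma, sample):
    """Empirical M-functional ell_{P_n}(Sigma): the average of ell_U over the
    sample, i.e. equation (3) for P = P_n."""
    return float(np.mean([ell_U(Sigma, X) for X in sample]))

rng = np.random.default_rng(2)
m, r, n = 4, 2, 30
sample = [rng.standard_normal((m, r)) for _ in range(n)]
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)
Sigma /= np.linalg.det(Sigma) ** (1.0 / m)
value = ell_P(Sigma, sample)
```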

When r = 1, the GE is known as Tyler's M-estimator of scatter and is a special case of Maronna's affine invariant M-estimators [Maronna, Ty]. It has the desirable property of being robust to outliers in the Huber sense [Hub], and is thus particularly suitable for handling big data. Tyler [Ty] proved that it is the most robust estimator of covariance for elliptical distributions. The existence and uniqueness of the GE have been studied in [Ty] and [Kent]; see also [Lutz] for more results on such M-estimators. The authors of [AMR2005] studied such questions using the geometry of the parameter space M, which is a classical Riemannian manifold when endowed with a statistically meaningful Riemannian metric. Within this framework, it turns out that the M-functional ℓ_P evaluated on geodesics of M is convex. The authors of [AMR2005] then obtained precise results on existence and uniqueness of the GE by studying the behaviour at infinity of ℓ_P. Similar approaches were then developed within the signal processing literature [Wiesel, ZWG], and in statistics [DT], where a full treatment of the role of the Riemannian manifold structure of M is given.

For general r, the authors of [AMR] studied the existence and uniqueness of the GE using the sample space G(m,r) as a subset of the boundary of the parameter space M, as observed by [FR] when r = 1. We exploit this relation more deeply here; it is a consequence of the fact that M is a CAT(0) space with well-known visual boundary at infinity ∂M.

The article is structured as follows. Section 2 recalls the basic notions from the Riemannian geometry of the parameter space M. Section 3 shows that Grassmannian distributions correspond to quasi-invariant distributions on SL(m,R)/P_r, where P_r is a maximal parabolic subgroup of SL(m,R). Using these notions, Section 3.4.2 computes the so-called rho-functions to determine quasi-invariant distributions on subsets of ∂M. This geometrical viewpoint extends old results on invariant measures on Stiefel and Grassmann manifolds, see, e.g., [James, Herz, Muir, chi], to quasi-invariant measures. Furthermore, it is shown that ℓ_P is a weighted Busemann function. Section 4 studies the gradient and covariant derivative of ℓ_P, and Section 5 proves that under suitable conditions ℓ_P is a strictly geodesically convex function. Then, general results of Kapovich–Leeb–Millson [KLM] on Hadamard manifolds yield that a Borel probability measure P on the Grassmannian G(m,r) has a unique GE if and only if

 ∫_{G(m,r)} dim(U∩V) dP(U) < (r/m) dim(V) (5)

for all nontrivial linear subspaces V of R^m (0 ≠ V ≠ R^m). For a sample of size n, let P_n be its corresponding empirical probability measure. Then by [AMR, Theorem 4], for almost all samples of sufficiently large size the corresponding map ℓ_{P_n} has a unique GE. When a probability measure or a sample does not have a unique GE, it may have several GEs or none. Section 6 considers geodesic coercivity using direct geometrical methods, and treats in detail the existence and uniqueness of GEs. Related to this, Corollary 5.10 shows that a parameter Σ ∈ M is a GE of a Borel probability measure P on G(m,r) if and only if it satisfies the M-equation

 ∫_{G(m,r)} Pr(U,Σ) dP(U) = (r/m) Id_m, (6)

where Pr(U,Σ) is the Σ^{−1}-orthogonal projector on U, i.e., the orthogonal projector on U for the scalar product ⟨x,y⟩_{Σ^{−1}} := x^TΣ^{−1}y, x, y ∈ R^m.
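A GE of an empirical measure can be computed by iterating the M-equation as a fixed point, in the spirit of Tyler's algorithm for r = 1: multiplying (6) on the right by Σ shows that a GE satisfies Σ = (m/(rn)) Σ_i X_i(X_i^TΣ^{−1}X_i)^{−1}X_i^T. The following sketch is our own illustration of this scheme (not an algorithm from the paper), together with a residual check of (6):

```python
import numpy as np

def ge_fixed_point(sample, m, r, n_iter=500, tol=1e-13):
    """Tyler-type fixed-point iteration for the Grassmannian M-estimate of scatter:
    Sigma <- (m/(r n)) * sum_i X_i (X_i^T Sigma^{-1} X_i)^{-1} X_i^T,
    renormalized to det = 1 after each step (illustrative sketch)."""
    n = len(sample)
    Sigma = np.eye(m)
    for _ in range(n_iter):
        S = np.zeros((m, m))
        for X in sample:
            G = X.T @ np.linalg.solve(Sigma, X)   # X^T Sigma^{-1} X
            S += X @ np.linalg.solve(G, X.T)
        S *= m / (r * n)
        S /= np.linalg.det(S) ** (1.0 / m)        # back to Pos_sym^1(m)
        if np.max(np.abs(S - Sigma)) < tol:
            return S
        Sigma = S
    return Sigma

def m_equation_residual(Sigma, sample, m, r):
    """Max-norm distance between the sample average of the Sigma^{-1}-orthogonal
    projectors Pr(U, Sigma) = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}
    and (r/m) Id_m, i.e. the defect in the M-equation (6)."""
    Sinv = np.linalg.inv(Sigma)
    P = np.zeros((m, m))
    for X in sample:
        P += X @ np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv)
    P /= len(sample)
    return np.max(np.abs(P - (r / m) * np.eye(m)))

rng = np.random.default_rng(3)
m, r, n = 4, 2, 60
sample = [rng.standard_normal((m, r)) for _ in range(n)]
Sigma_hat = ge_fixed_point(sample, m, r)
residual = m_equation_residual(Sigma_hat, sample, m, r)
```

For generic samples satisfying the uniqueness condition (5) this iteration converges; the residual then certifies that Σ_hat solves (6) up to numerical precision.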

Section 7 focuses on the almost sure convergence of GEs. More precisely, let U_1, ..., U_n be a random sample of i.i.d. random subspaces of R^m, distributed according to a continuous Borel probability measure P on G(m,r) whose support spans R^m. Let P_n be the corresponding empirical probability measures. Suppose the GEs Σ_n, Σ_P for P_n, P, respectively, exist. Theorem 7.1 shows that for every n large enough the GE Σ_n is almost surely unique, Σ_P is unique, and Σ_n converges almost surely to Σ_P.

Section 8 considers the asymptotic normality of the GE. With the above notation, let Σ_P = gg with g ∈ SL(m,R) the unique symmetric positive-definite square root of Σ_P, and set C_n = (m/tr(Σ_P^{−1}Σ_n)) g^{−1}Σ_n g^{−1}. If the support of the continuous Borel probability measure P spans R^m, then Theorem 8.1 shows that

 √n (Vec(C_n − Id_m)) → N(0, σ_∞²) in distribution as n → ∞,

where the limiting covariance matrix σ_∞² is obtained using geometrical arguments.

## 2 The geometry of the parameter space M = Pos_sym^1(m)

### 2.1 The symmetric space of SL(m,R)

Let m ≥ 2 and consider the semi-simple real Lie group SL(m,R). It is a locally compact topological group that is also connected and has finite centre. Its associated symmetric space is SL(m,R)/SO(m), where the special orthogonal group SO(m) is the maximal compact subgroup of SL(m,R). The symmetric space SL(m,R)/SO(m) is a Riemannian manifold and it is a CAT(0) space (see [BH99, Chapter II.10]); in this section we want to identify it precisely. For more details and the proofs of this section, one can consult [BH99, Chapter II.10].

Let

 Sym(m) := {M an m×m real matrix | M = M^T}.

Notice Sym(m) is a vector space isomorphic to R^{m(m+1)/2}, a basis being the matrices F_{ij} (i ≤ j) having 1 on the entries (i,j) and (j,i) and 0 otherwise. In particular, F_{ii} is the matrix having 1 on the entry (i,i) and zero otherwise. On Sym(m) we consider the max-norm, denoted ‖·‖_∞ in what follows, i.e., ‖M‖_∞ := max_{i,j} |M_{ij}|, for every M ∈ Sym(m).

Because Sym(m) is isomorphic to R^{m(m+1)/2}, thus a smooth manifold, the tangent space at every matrix is of dimension m(m+1)/2. The max-norm topology then agrees with the manifold topology on Sym(m).

Moreover,

 Pos_sym(m) := {M ∈ Sym(m) | xMx^T > 0, ∀ x ≠ 0 ∈ R^m}

is an open subset of Sym(m), and so every matrix in Pos_sym(m) has a tangent space of dimension m(m+1)/2.

It is a fact that

 Pos_sym^1(m) := {M ∈ Pos_sym(m) | det(M) = 1}

is a totally geodesic submanifold of Pos_sym(m) and the tangent space at every point is of dimension m(m+1)/2 − 1. Moreover, one has Pos_sym^1(m) ⊂ SL(m,R).

###### Lemma 2.1.

The group SL(m,R) acts on Pos_sym^1(m) by g · Σ := gΣg^T, for every g ∈ SL(m,R) and every Σ ∈ Pos_sym^1(m). Moreover, SL(m,R) acts transitively on Pos_sym^1(m).

Now we want to define a Riemannian metric on Pos_sym^1(m) such that SL(m,R) acts on Pos_sym^1(m) by isometries. To do that it is enough to define a scalar product on the tangent space at the identity Id_m. Then, by the transitivity of SL(m,R) on Pos_sym^1(m) we transport that scalar product to the tangent space at every point of Pos_sym^1(m).

###### Lemma 2.2.

The tangent space at the identity matrix Id_m

is given by

 T_{Id_m}Pos_sym^1(m) := {M ∈ Sym(m) | tr(M) = 0}

and has dimension m(m+1)/2 − 1.

###### Definition 2.3.

On T_{Id_m}Pos_sym^1(m) we define the scalar product

 ⟨,⟩_{Id_m} : T_{Id_m}Pos_sym^1(m) × T_{Id_m}Pos_sym^1(m) → R

by ⟨A,C⟩_{Id_m} := tr(AC), for every A, C ∈ T_{Id_m}Pos_sym^1(m).

###### Remark 2.4.

Let c be a curve in Pos_sym^1(m) with c(0) = Id_m. Then for every g ∈ SL(m,R), gcg^T is a curve in Pos_sym^1(m) with (gcg^T)(0) = gg^T. So

 (gcg^T)′(0) = gc′(0)g^T,

where c′(0) ∈ T_{Id_m}Pos_sym^1(m). In particular, notice T_{gg^T}Pos_sym^1(m) = g T_{Id_m}Pos_sym^1(m) g^T.

For what follows we want to define a scalar product ⟨,⟩_Σ on every tangent space T_Σ Pos_sym^1(m), with Σ ∈ Pos_sym^1(m), and with the property that every g ∈ SL(m,R) acts as an isometry on Pos_sym^1(m):

1. d_g : T_Σ Pos_sym^1(m) → T_{gΣg^T}Pos_sym^1(m), A ↦ gAg^T, is an isomorphism of vector spaces,

2. ⟨d_g(A), d_g(C)⟩_{gΣg^T} = ⟨A,C⟩_Σ, for every A, C ∈ T_Σ Pos_sym^1(m).

Notice, for every Σ ∈ Pos_sym^1(m) the matrix Σ is an element of SL(m,R). Moreover, every Σ ∈ Pos_sym^1(m) admits a square root g ∈ SL(m,R), not necessarily unique, i.e., Σ = gg^T.

###### Definition 2.5.

Let Σ ∈ Pos_sym^1(m) and take g ∈ SL(m,R) with Σ = gg^T. We define the isomorphism

 d_g : T_{Id_m}Pos_sym^1(m) → T_Σ Pos_sym^1(m)

by d_g(M) := gMg^T, for every M ∈ T_{Id_m}Pos_sym^1(m). Then we define

 ⟨A,C⟩_Σ := tr(Σ^{−1}AΣ^{−1}C),

for every A, C ∈ T_Σ Pos_sym^1(m).

###### Lemma 2.6.

Let Σ ∈ Pos_sym^1(m) and take g ∈ SL(m,R) with Σ = gg^T. Then d_g preserves the scalar product. Moreover, g acts as an isometry of Pos_sym^1(m), and any geodesic in Pos_sym^1(m) passing through Id_m is of the form t ↦ exp(tM), where M ∈ T_{Id_m}Pos_sym^1(m).

From the above lemmas one can isometrically identify Pos_sym^1(m) with SL(m,R)/SO(m), and, as announced in the beginning of this section, we take M := Pos_sym^1(m).
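Both the invariance of the scalar product of Definition 2.5 and the shape of the geodesics through Id_m are easy to verify numerically: ⟨A,C⟩_Σ = tr(Σ^{−1}AΣ^{−1}C) is unchanged under Σ ↦ gΣg^T, A ↦ gAg^T, and exp(tM) stays in Pos_sym^1(m) because det exp(tM) = e^{t·tr(M)} = 1 for traceless M. A self-contained sketch (our own helper names; the matrix exponential of a symmetric matrix is computed via its eigendecomposition):

```python
import numpy as np

def sym_expm(M):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

def inner(Sigma, A, C):
    """Riemannian scalar product <A, C>_Sigma = tr(Sigma^{-1} A Sigma^{-1} C)."""
    Si = np.linalg.inv(Sigma)
    return np.trace(Si @ A @ Si @ C)

rng = np.random.default_rng(4)
m = 4
def rand_sym():
    S = rng.standard_normal((m, m))
    return S + S.T

# traceless symmetric tangent vector at Id_m and a point on the geodesic exp(tM)
M = rand_sym()
M -= (np.trace(M) / m) * np.eye(m)
point = sym_expm(0.7 * M)          # det = exp(0.7 * tr M) = 1

# invariance of the metric under g . Sigma = g Sigma g^T (the computation
# works for any invertible g; here we scale g so that |det g| = 1)
g = rng.standard_normal((m, m))
g /= abs(np.linalg.det(g)) ** (1.0 / m)
Sigma = sym_expm(rand_sym() / 4)   # an SPD base point
A1, C1 = rand_sym(), rand_sym()
lhs = inner(g @ Sigma @ g.T, g @ A1 @ g.T, g @ C1 @ g.T)
rhs = inner(Sigma, A1, C1)
```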

###### Remark 2.7.

It is a fact that the topology on Pos_sym^1(m) given by the distance induced from the Riemannian metric agrees with the topology of Pos_sym^1(m) as a submanifold of Pos_sym(m), and thus with the max-norm topology induced from Sym(m).

## 3 Grassmannian distributions as quasi-invariant measures on ∂M

As the symmetric space M of SL(m,R) is a CAT(0) space, it has an associated visual boundary at infinity ∂M. ∂M is a spherical building of type A_{m−1} in the sense of Tits, i.e., it has the structure of a simplicial complex that is expressed as the union of sub-complexes that are called apartments and satisfy three axioms (see [BH99, Chapter II.10]). Each apartment is tessellated with (similar) maximal simplices that are all of the same type (thus shape). Such a maximal simplex is called a chamber of ∂M or a spherical chamber at infinity of M; it has dimension m−2 and has m−1 vertices. If we start coloring the vertices of a spherical chamber with m−1 colors, we can color the vertices of the spherical building in such a way that the vertices of each chamber of ∂M are differently colored, but using the same set of colors. In fact, each color represents a different type of vertex and each chamber has only one vertex-representative for each color (type). The spherical building ∂M is compact with respect to the cone topology induced from M. If m = 2 then the model chamber is just a point.

### 3.1 Parabolic subgroups

The group SL(m,R) acts by isometries on M and continuously on ∂M. Moreover, SL(m,R) acts transitively on the set of all chambers of ∂M by preserving the type of the vertices of the chambers. Fix for what follows a chamber Δ of ∂M. The stabiliser in SL(m,R) of a face τ of the chamber Δ is called a parabolic subgroup of SL(m,R), i.e., for a face τ of Δ the parabolic subgroup is P_τ := Stab_{SL(m,R)}(τ). Note P_τ is a closed subgroup of SL(m,R). Because SL(m,R) acts transitively on the set of all chambers of ∂M, the parabolic subgroups corresponding to other chambers are conjugate to those of the chamber Δ.

For τ = Δ, the subgroup P_Δ is called the Borel subgroup of SL(m,R) and it is the minimal parabolic, i.e., contained in all the other parabolic subgroups P_τ with τ a face of Δ. When τ is just a vertex of Δ its corresponding parabolic is maximal, i.e., it is not contained in any larger parabolic subgroup. As the simplex Δ has m−1 vertices, there are exactly m−1 maximal parabolic subgroups. It is well known that, up to conjugacy, the maximal parabolics are of the form

 P_r = {(A1 B; 0 A2) ∈ SL(m,R) | A1 ∈ GL(r,R), A2 ∈ GL(m−r,R), B ∈ R^{r×(m−r)}},

where 1 ≤ r ≤ m−1. Notice, the blocks A1, A2 are on the diagonal of the elements of P_r.

A similar form is known for each parabolic P_τ, but here we are only interested in maximal parabolics.

### 3.2 The sample space G(m,r) as a subset of ∂M

It is easy to see that the parabolic subgroup P_r is the stabiliser in SL(m,R) of the r-dimensional vector subspace U_0 of R^m that is generated by the first r vectors of the canonical base of R^m; U_0 is a point of G(m,r). We can take the corresponding matrix to be X_0 := (Id_r; 0) ∈ R^{m×r}, whose columns are those first r canonical vectors.

As SL(m,R) acts transitively on the set of all r-dimensional vector subspaces of R^m that contain the origin 0, one can conclude the Grassmannian G(m,r) equals as a set the quotient SL(m,R)/P_r, for every 1 ≤ r ≤ m−1. Then using Section 3.1 the Grassmannian G(m,r) equals the set of all vertices of ∂M that are of the same type. Using this identification, the action of an element g ∈ SL(m,R) on a vertex ⟨X⟩ of ∂M can be interpreted as the matrix multiplication between g and the corresponding matrix X introduced above.
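The stabiliser property can be checked numerically: a block upper-triangular matrix p fixes U_0 = ⟨X_0⟩ because pX_0 = X_0A_1, while a generic g ∈ SL(m,R) moves it. A sketch (helper names are ours):

```python
import numpy as np

def same_subspace(X, Y, tol=1e-9):
    """True iff the columns of X and Y span the same subspace, tested by
    comparing the orthogonal projectors onto the two column spans."""
    QX, _ = np.linalg.qr(X)
    QY, _ = np.linalg.qr(Y)
    return bool(np.max(np.abs(QX @ QX.T - QY @ QY.T)) < tol)

m, r = 5, 2
X0 = np.eye(m)[:, :r]    # basis of U_0 = span(e_1, ..., e_r)
rng = np.random.default_rng(5)

# an element of the maximal parabolic P_r (block upper triangular, scaled so
# that |det p| = 1; only invertibility matters for the stabiliser property)
A1 = rng.standard_normal((r, r)) + 2 * np.eye(r)
A2 = rng.standard_normal((m - r, m - r)) + 2 * np.eye(m - r)
B = rng.standard_normal((r, m - r))
p = np.block([[A1, B], [np.zeros((m - r, r)), A2]])
p /= abs(np.linalg.det(p)) ** (1.0 / m)

g = rng.standard_normal((m, m))   # a generic matrix: almost surely moves U_0
```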

### 3.3 Levi decompositions of maximal parabolics

It is a well known fact that every maximal parabolic P_r has the semidirect product decomposition P_r = M_r ⋉ U_r, called the Levi decomposition, where

 M_r := {(A1 0; 0 A2) ∈ SL(m,R) | A1 ∈ GL(r,R), A2 ∈ GL(m−r,R)}

and

 U_r := {(Id_r B; 0 Id_{m−r}) | B ∈ R^{r×(m−r)}}.

Moreover, U_r is normal in P_r and is called the unipotent radical of P_r. M_r is called the Levi factor of P_r; it is a reductive group and is not connected.

Further we want to decompose the subgroup M_r. By considering the subgroups below we can define a canonical group homomorphism T_r × G_r → M_r, where

 T_r := {(λ1 Id_r 0; 0 λ2 Id_{m−r}) | λ1, λ2 ∈ R^*, λ1^r λ2^{m−r} = 1}

and

 G_r := {(A1 0; 0 A2) ∈ SL(m,R) | det(A1) = ±1 = det(A2)}.

Notice T_r and G_r are subgroups of M_r, but one can see M_r does not decompose as the direct product T_r × G_r. The subgroup G_r is known to be semi-simple. Furthermore, the subgroup G_r is obviously normal in M_r. We consider

 P_r^0 := {(A1 B; 0 A2) ∈ SL(m,R) | det(A1) = ±1 = det(A2), B ∈ R^{r×(m−r)}},

which is a subgroup of P_r. It is a known fact that the P_r^0-orbits in M are the horospheres corresponding to the point at infinity stabilised by P_r.

Similar decompositions of the corresponding Levi factor are known for each parabolic P_τ.

### 3.4 Measures on SL(m,R)/Pr

Let G be a locally compact group and H a closed subgroup of G. As we are working in the setting of locally compact groups, all Haar measures that are used in this paper are considered to be left-invariant. We denote by dx, respectively dh, the Haar measure on G, respectively on H.

Endow G/H with the quotient topology, meaning that the canonical projection G → G/H is continuous and open. In practice, one needs to put a measure on G/H; this is explained below.

###### Definition 3.1.

(See Bekka–de la Harpe–Valette [BHV, Appendix B]) A rho-function of (G,H) is a continuous function ρ : G → (0,∞) satisfying the equality

 ρ(xh) = (Δ_H(h)/Δ_G(h)) ρ(x) for all x ∈ G, h ∈ H, (7)

where Δ_G, Δ_H are the modular functions on G, respectively on H.

We have the following relation between rho-functions of (G,H) and G–quasi-invariant regular Borel measures on G/H. For the definition of a G–quasi-invariant regular Borel measure see [BHV, Appendix A.3]. To briefly give the idea, suppose the topological space G/H is endowed with a measure μ. As G acts on G/H by left multiplication, one can ask how the action of G on G/H can modify the measure μ. More precisely, for each g ∈ G, what is the relation between μ and g⋆μ, where g⋆μ is the pushforward measure on G/H induced from the map xH ↦ gxH? In general, the passage from g⋆μ to μ is given by a function on G/H, called the Radon–Nikodym derivative between g⋆μ and μ. If this Radon–Nikodym derivative is just one, then g⋆μ = μ, and μ is called g–invariant. If μ is g–invariant for every g ∈ G, then μ is called G–invariant; otherwise μ is called G–quasi-invariant.

###### Theorem 3.2.

([BHV, Thm. B.1.4]) Let G be a locally compact group and H be a closed subgroup of G. Then there exists a rho-function of (G,H).

Moreover, with a rho-function ρ of (G,H) there is associated a G–quasi-invariant regular Borel measure μ on G/H whose corresponding Radon–Nikodym derivative satisfies the relation d(g⋆μ)/dμ(xH) = ρ(gx)/ρ(x), for every g, x ∈ G, and such that

 ∫_G f(x)ρ(x) dx = ∫_{G/H} ∫_H f(xh) dh dμ(xH) (8)

for every f ∈ C_c(G), the continuous complex-valued functions on G with compact support.

Conversely, with a continuous G–quasi-invariant regular Borel measure μ on G/H there is associated a rho-function of (G,H), where continuous means the Radon–Nikodym derivative of μ is a continuous map.

As by Theorem 3.2 rho-functions of (G,H) always exist, we have that G–quasi-invariant regular Borel measures always exist on G/H. We do not intend to clarify here all the terminology (e.g., the rho-function associated with a measure) used in Theorem 3.2. Those are explained in [BHV] and the references therein.

###### Remark 3.3.

Given a rho-function ρ of (G,H), its corresponding G–quasi-invariant regular Borel measure μ on G/H (given by relation (8)) is obtained by applying the Riesz–Markov–Kakutani representation theorem to a specific positive linear functional on C_c(G/H). More precisely, by [BHV, Lem. B.1.2] the linear mapping

 T_H : C_c(G) → C_c(G/H), f ↦ T_H(f), given by (T_H(f))(xH) := ∫_H f(xh) dh,

is surjective. Moreover, by [BHV, Lem. B.1.3 (iii)] and because T_H is surjective, the mapping

 T_H(f) ↦ ∫_G f(x)ρ(x) dx, for f ∈ C_c(G),

is a well-defined positive linear functional on C_c(G/H). Then apply the Riesz–Markov–Kakutani representation theorem to obtain the regular Borel measure μ on G/H that is also G–quasi-invariant.

For the rest of the article we apply Theorem 3.2 to the case

 G := SL(m,R) and H := P_r,

a maximal parabolic subgroup of SL(m,R). Recall SL(m,R) is unimodular, thus Δ_G(g) = 1 for every g ∈ G. Also, as G = KP_r, where K := SO(m), one example of a rho-function for (G,P_r) is the left K-invariant function ρ such that ρ(kp) := Δ_{P_r}(p) for every k ∈ K and every p ∈ P_r. This function gives the G–quasi-invariant regular Borel measure μ on G/P_r that is K-invariant and whose corresponding Radon–Nikodym derivative satisfies the relation

 d(g⋆μ)/dμ(xP_r) = ρ(gx)/ρ(x) for every g, x ∈ G. (9)

We claim the family of Grassmannian distributions on G(m,r) of parameter Σ ∈ M is in fact the family of G–quasi-invariant regular Borel measures g⋆μ on G/P_r of parameter g ∈ SL(m,R). This is the goal of the next two Sections 3.4.1 and 3.4.2.

#### 3.4.1 The modular function Δ_{P_r} of P_r

In order to have an explicit formula for the function ρ defined above, and so an explicit description of the Radon–Nikodym derivative, we need to compute the modular function Δ_{P_r} of P_r.

By [Knpp, Proposition 8.27] the modular function of the real Lie group P_r is given by

 Δ_{P_r}(p) = |det(Ad(p))|,

where Ad is the adjoint representation of P_r on its Lie algebra p_r. Recall Ad(p)X = pXp^{−1}, for every p ∈ P_r and X ∈ p_r, and

 p_r = {(A1 B; 0 A2) | A1 ∈ Mat(r,R), A2 ∈ Mat(m−r,R), tr(A1) + tr(A2) = 0, B ∈ R^{r×(m−r)}}.
###### Lemma 3.4.

For every u ∈ U_r and every g ∈ G_r we have Δ_{P_r}(u) = 1 = Δ_{P_r}(g).

###### Proof.

This is because U_r and G_r are generated by unipotent elements, and those have modular function 1 in P_r.

Therefore, in order to compute the modular function of P_r it is enough to compute the modular function on the T_r part of P_r.

###### Lemma 3.5.

Let t := (λ1 Id_r 0; 0 λ2 Id_{m−r}) ∈ T_r, such that λ1^r λ2^{m−r} = 1. Then Δ_{P_r}(t) = |λ1/λ2|^{r(m−r)}.

###### Proof.

By applying Ad(t) to the elements of p_r and computing, one obtains that Ad(t) is diagonal as a matrix on p_r: it fixes the diagonal blocks and multiplies the off-diagonal block B by λ1/λ2, whence det(Ad(t)) = (λ1/λ2)^{r(m−r)}.
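The computation behind Lemma 3.5 can be checked numerically: conjugation by t ∈ T_r fixes the diagonal blocks of p_r and multiplies the off-diagonal block B, an r(m−r)-dimensional eigenspace, by λ1/λ2. A sketch of this check (how the resulting determinant relates to Δ_{P_r} depends on the sign convention chosen for the modular function):

```python
import numpy as np

m, r = 5, 2
lam1 = 1.7
lam2 = lam1 ** (-r / (m - r))       # enforce lam1^r * lam2^(m-r) = 1
t = np.diag([lam1] * r + [lam2] * (m - r))

rng = np.random.default_rng(6)
B = rng.standard_normal((r, m - r))
N = np.zeros((m, m))
N[:r, r:] = B                        # the element (0 B; 0 0) of p_r
AdN = t @ N @ np.linalg.inv(t)       # Ad(t) N = t N t^{-1}
scale = lam1 / lam2                  # factor multiplying the B-block
```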

#### 3.4.2 Rho-function, Grassmannian distributions and Busemann functions

As mentioned before, P_r is the stabiliser in SL(m,R) of the r-dimensional vector subspace U_0 of R^m that is generated by the first r vectors of the canonical base of R^m; U_0 is a point of G(m,r). We can take the corresponding matrix to be X_0 := (Id_r; 0) ∈ R^{m×r}. This section deals with Grassmannian and quasi-invariant distributions, and one of its main results states the following.

###### Proposition 3.6.

Let g ∈ SL(m,R) and Σ := gg^T ∈ M. Then the family of Grassmannian distributions on G(m,r) of parameter Σ is the same as the family of G–quasi-invariant regular Borel measures g⋆μ on G/P_r of parameter g. More precisely, equality (2) from the Introduction can be written

 d(g⋆μ)/dμ(hP_r) = ρ(gh)/ρ(h) = dG_Σ/dG_{Id_m}(⟨X⟩) = (det(X^TX)/det(X^TΣ^{−1}X))^{m/2}, (10)

for every h ∈ SL(m,R), where X := hX_0.

###### Proof.

This proposition is a direct consequence of equality (9) above, and of Lemma 3.8 and equality (11) below. Indeed, by equality (11) below and because X = hX_0, one has ρ(h) = (det(X_0^TX_0)/det(X_0^T(h^T)^{−1}h^{−1}X_0))^{m/2}. By the same equality (11) below, and as X_0^TX_0 = Id_r, we also have

 ρ(gh) = (det(X_0^TX_0)/det(X_0^T(h^Tg^T)^{−1}(gh)^{−1}X_0))^{m/2} = (1/det(X_0^T(g^T)^{−1}(h^T)^{−1}h^{−1}g^{−1}X_0))^{m/2}