
Riemannian statistics meets random matrix theory: towards learning from high-dimensional covariance matrices

03/01/2022
by   Salem Said, et al.

Riemannian Gaussian distributions were initially introduced as basic building blocks for learning models which aim to capture the intrinsic structure of statistical populations of positive-definite matrices (here called covariance matrices). While the potential applications of such models have attracted significant attention, a major obstacle still stands in the way of these applications: there seems to exist no practical method of computing the normalising factors associated with Riemannian Gaussian distributions on spaces of high-dimensional covariance matrices. The present paper shows that this missing method comes from an unexpected new connection with random matrix theory. Its main contribution is to prove that Riemannian Gaussian distributions of real, complex, or quaternion covariance matrices are equivalent to orthogonal, unitary, or symplectic log-normal matrix ensembles. This equivalence yields a highly efficient approximation of the normalising factors, in terms of a rather simple analytic expression. The error due to this approximation decreases like the inverse square of dimension. Numerical experiments are conducted which demonstrate how this new approximation can overcome the difficulties which have impeded applications to real-world datasets of high-dimensional covariance matrices. The paper then turns to Riemannian Gaussian distributions of block-Toeplitz covariance matrices. These are equivalent to yet another kind of random matrix ensemble, here called "acosh-normal" ensembles. Orthogonal and unitary "acosh-normal" ensembles correspond to the cases of block-Toeplitz with Toeplitz blocks, and block-Toeplitz (with general blocks) covariance matrices, respectively.



I Introduction

The present paper aims to bring together the tools of Riemannian statistics and those of random matrix theory, in order to unlock the computational issues which have stood in the way of applying Riemannian learning models to datasets of high-dimensional covariance matrices.

Over the past few years, Riemannian Gaussian distributions, and mixtures of these distributions, were introduced as a means of modeling the intrinsic structure of statistical populations of covariance matrices [Cheng2013, Sa16, Sa17]. They have found successful applications in fields such as brain-computer interface analysis and artificial intelligence [zanini2018][mathieu2019]. However, such applications could not be pursued for covariance matrices of relatively larger dimension. Indeed, there seemed to exist no practical method of computing the normalising factors associated with Riemannian Gaussian distributions on spaces of high-dimensional covariance matrices.

The theoretical contribution of this paper is to show that this missing method arises quite naturally, as soon as one realises that a strong connection exists between Riemannian Gaussian distributions and random matrix theory. Roughly speaking, Riemannian Gaussian distributions on spaces of real, complex, or quaternion covariance matrices correspond to log-normal orthogonal, unitary or symplectic matrix ensembles. These are similar to the classical, widely-known Gaussian orthogonal, unitary and symplectic ensembles, but with the normal weight function replaced with a log-normal weight function. Thanks to this new connection, the powerful tools of random matrix theory can be employed to uncover several original properties of Riemannian Gaussian distributions, especially for higher-dimensional covariance matrices.

The present paper also extends the equivalence between Riemannian Gaussian distributions and random matrix ensembles to the case of block-Toeplitz matrices. Instead of a log-normal weight function, the corresponding weight function is “acosh-normal", where acosh is the inverse hyperbolic cosine. No attempt is made to develop these additional “acosh-normal" ensembles, at least not for now. In fact, there are already plenty of results to derive and apply with the log-normal ensembles, which cover the important cases of real and complex covariance matrices. This is in part because log-normal ensembles (also called Stieltjes-Wigert ensembles) are amenable to a direct analytic treatment, as will be seen below.

It is also because of a rather surprising connection, also noted in [SH21]. The log-normal unitary ensemble had appeared in the theoretical physics literature about twenty years ago, as a random matrix model for the Chern-Simons quantum field theory [Ma05]. As seen in the present paper, this ensemble corresponds to Riemannian Gaussian distributions on the space of complex covariance matrices. Of course, such Riemannian Gaussian distributions have nothing to do with quantum field theory, but some of their valuable properties can be obtained by carefully re-adapting already existing results, found within this theory.

The present paper was developed independently from a theoretical physics paper, published only very recently [Ti20]. Both papers focus on the connection between Riemannian Gaussian distributions and log-normal matrix ensembles, but significantly differ in terms of their contribution and focus, as will be discussed below. The reader is also referred to [Forrester1989] for previous results on asymptotic computations of log-partition functions from the random matrix theory literature, as well as the recent paper [Forrester2021], which considers global and local scaling limits for the Stieltjes-Wigert random matrix ensemble and associated physical interpretations.

The paper is organised as follows. Section II recalls basic background material on Riemannian Gaussian distributions. Section III presents the main original results, obtained by applying random matrix theory to the study of Riemannian Gaussian distributions. Section IV illustrates the importance of these original results to learning from datasets of high-dimensional covariance matrices. Finally, Section V considers Riemannian Gaussian distributions on the space of block-Toeplitz covariance matrices, and introduces the corresponding “acosh-normal" ensembles. Proofs of the propositions stated in Sections III and V are provided in Appendix A.

II Notation and background

Let P_β denote the space of n × n covariance matrices, which are either real (β = 1), complex (β = 2) or quaternion (β = 4). Precisely, the elements of P_1 are real, symmetric positive-definite matrices, while those of P_2 and P_4 are respectively complex and quaternion, Hermitian positive-definite matrices.

Of course, P_β is an open convex cone, sitting inside the real vector space of real, complex or quaternion self-adjoint matrices (the adjoint of a matrix being its conjugate transpose). Therefore, P_β is a real differentiable manifold, whose tangent space at any point is isomorphic to this vector space of self-adjoint matrices. In particular, the dimension of P_β is equal to n + βn(n−1)/2, that is n(n+1)/2, n², or n(2n−1) in the real, complex and quaternion cases.

The elements of P_β are in one-to-one correspondence with the centred (zero-mean) n-variate normal distributions (real, complex circular, or quaternion circular, according to the value of β). Thus, P_β can be equipped with the Rao-Fisher information metric of the centred n-variate normal model [amaribook]. This is identical to the so-called affine-invariant metric introduced in [Pennec2006],

\langle u, v \rangle_Y \;=\; \mathrm{Re}\,\mathrm{tr}\!\left( Y^{-1} u \, Y^{-1} v \right)    (1)

where u and v are tangent vectors at the point Y (that is, self-adjoint matrices), Re denotes the real part and tr denotes the trace. The Riemannian geometry of the metric (1) is quite well-known in the geometric information science community (see the recent book [Pennec2019]). Recall here the associated geodesic distance

d^2(Y, Z) \;=\; \mathrm{tr}\!\left[ \log^2\!\left( Y^{-1/2} Z \, Y^{-1/2} \right) \right]    (2)

where matrix logarithms and powers are understood as self-adjoint matrix functions, obtained by taking logarithms and powers of eigenvalues. The main advantage of the metric (1) is its invariance under the action of the group of invertible matrices with real, complex or quaternion entries (according to the value of β). For example, if A is such an invertible matrix, and Y is replaced by A Y A^† while Z is replaced by A Z A^†, then the distance d(Y, Z) remains unchanged (here, A^† denotes the adjoint, or conjugate transpose, of A).
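The distance (2) and its invariance are easy to check numerically. The following is a minimal sketch (Python with NumPy and SciPy, illustrative only and not part of the original paper; the function name is ours, and the real case β = 1 is used).

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def affine_invariant_distance(Y, Z):
    """Geodesic distance (2): Frobenius norm of log(Y^{-1/2} Z Y^{-1/2})."""
    Y_inv_sqrt = np.linalg.inv(sqrtm(Y))
    M = Y_inv_sqrt @ Z @ Y_inv_sqrt
    M = (M + M.conj().T) / 2            # remove numerical asymmetry
    return np.linalg.norm(logm(M), "fro")

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)); Y = A @ A.T + n * np.eye(n)   # random SPD matrix
B = rng.standard_normal((n, n)); Z = B @ B.T + n * np.eye(n)   # random SPD matrix
C = rng.standard_normal((n, n))                                # invertible (almost surely)
d1 = affine_invariant_distance(Y, Z)
d2 = affine_invariant_distance(C @ Y @ C.T, C @ Z @ C.T)       # congruence by C
print(d1, d2)   # equal up to numerical error: the distance is affine-invariant
```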

In terms of the distance (2), we define Riemannian Gaussian distributions on the space P_β to be given by the parameterised probability density function p(Y | Ȳ, σ), as in [Sa16][Sa17],

p(Y \,|\, \bar{Y}, \sigma) \;=\; \frac{1}{Z(\sigma)}\, \exp\!\left( -\, \frac{d^2(Y, \bar{Y})}{2\sigma^2} \right)    (3)

where Ȳ ∈ P_β is the centre-of-mass parameter, and σ > 0 the dispersion parameter. The normalising factor Z(σ), which does not depend on Ȳ thanks to the invariance of the distance (2), is given by the integral

Z(\sigma) \;=\; \int_{P_\beta} \exp\!\left( -\, \frac{d^2(Y, \bar{Y})}{2\sigma^2} \right) dv(Y)    (4)

with respect to the Riemannian volume dv(Y) of the metric (1), whose expression involves the real and imaginary parts of the entries of Y (the imaginary units being complex or quaternion, according to the value of β).

Both formulae (3) and (4) are greatly simplified by introducing “polar coordinates". Each Y ∈ P_β can be diagonalised as Y = U D U^†, where D is a diagonal matrix, with diagonal elements e^{r_a} for a = 1, …, n, and where U U^† = I (the identity matrix). Note that the group of such matrices U is the orthogonal group O(n) in the real case, the unitary group U(n) in the complex case, and the symplectic (quaternion unitary) group Sp(n) in the quaternion case (for deeper insight, see the recent review [edelman]).
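As a quick numerical sanity check on these polar coordinates (a minimal Python/NumPy/SciPy sketch, not part of the original paper; real case, with Ȳ = I), the distance (2) from the identity reduces to the Euclidean norm of (r_1, …, r_n), since d²(Y, I) = tr[log² Y].

```python
import numpy as np
from scipy.linalg import logm
from scipy.stats import ortho_group

rng = np.random.default_rng(1)
n = 4
r = rng.standard_normal(n)                         # radial coordinates
U = ortho_group.rvs(n, random_state=1)             # Haar-distributed orthogonal matrix
Y = U @ np.diag(np.exp(r)) @ U.T                   # polar coordinates (U, r) of Y
d_squared = np.linalg.norm(logm(Y), "fro") ** 2    # d^2(Y, I), from (2)
print(np.isclose(d_squared, np.sum(r ** 2)))       # True: d^2(Y, I) = r_1^2 + ... + r_n^2
```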

Now, as in [Sa17], let Y follow the Riemannian Gaussian density (3), and write Y = U D U^† as above. If Ȳ = I, then U is uniformly distributed on the corresponding compact group (for a rigorous definition of this “uniform distribution", see [meckes]), while r_1, …, r_n have joint probability density function

p(r_1, \ldots, r_n) \;\propto\; \prod_{a<b} \sinh^{\beta}\!\left( \frac{|r_a - r_b|}{2} \right)\, \prod_{a=1}^{n} \exp\!\left( -\, \frac{r_a^2}{2\sigma^2} \right)    (5)

where ∝ indicates proportionality and sinh the hyperbolic sine. In addition, the normalising factor Z(σ) in (4) reduces to a certain multiple integral, here denoted I_{n,β}(σ). Specifically,

Z(\sigma) \;=\; C_{n,\beta}\; I_{n,\beta}(\sigma)    (6)

where C_{n,β} is a numerical constant, which appears after the uniformly distributed matrix U is integrated out of (4), and where

I_{n,\beta}(\sigma) \;=\; \int_{\mathbb{R}^n} \prod_{a<b} \sinh^{\beta}\!\left( \frac{|r_a - r_b|}{2} \right)\, \prod_{a=1}^{n} e^{-r_a^2/2\sigma^2}\; dr_1 \cdots dr_n    (7)

One of the main issues addressed in the present paper is the efficient approximation of the multiple integral I_{n,β}(σ). Until now (in [Sa16][Sa17]), this was done using a Monte Carlo technique which involved a smoothing method (containing certain arbitrarily fixed parameters) and which failed to produce coherent results when the dimension n increased beyond a rather modest value.
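For concreteness, a naive Monte Carlo estimate of this multiple integral can be written directly from (7): sample each r_a independently from N(0, σ²) and average the pair-interaction term. The sketch below (Python/NumPy/SciPy, illustrative only; this is not the smoothing-based technique of [Sa16][Sa17], and it assumes the sinh-times-Gaussian form of (5) reconstructed above) works in log-space to limit overflow, but its variance still grows rapidly with n, which is precisely the difficulty addressed by the results of Section III.

```python
import numpy as np
from scipy.special import logsumexp

def log_sinh(x):
    """Numerically stable log(sinh(x)) for x > 0."""
    return x - np.log(2.0) + np.log1p(-np.exp(-2.0 * x))

def log_multiple_integral_mc(n, sigma, beta=1, num_samples=200_000, seed=0):
    """Naive Monte Carlo estimate of log I_{n,beta}(sigma) in (7), written as
    (2*pi*sigma^2)^(n/2) times an expectation over r ~ N(0, sigma^2 I_n)."""
    rng = np.random.default_rng(seed)
    r = sigma * rng.standard_normal((num_samples, n))
    iu, ju = np.triu_indices(n, k=1)
    gaps = np.abs(r[:, iu] - r[:, ju]) / 2.0       # |r_a - r_b| / 2 for all pairs a < b
    log_w = beta * log_sinh(gaps).sum(axis=1)      # log prod_{a<b} sinh^beta(.)
    log_mean = logsumexp(log_w) - np.log(num_samples)
    return 0.5 * n * np.log(2.0 * np.pi * sigma ** 2) + log_mean

print(log_multiple_integral_mc(n=5, sigma=0.5))
```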

Before proceeding to present our main results, we should briefly note the existence of several alternative proposals for the extension of Gaussian distributions to Riemannian manifolds, including constructions based on heat flows and diffusion processes [Sommer2015]. Further discussion of these alternative formulations is beyond the scope of this paper and the interested reader is referred to the literature on the topic for further information. See [Pennec2019] (Section 3.4.3 and Chapter 10) and the references therein for a recent and comprehensive account.

III Main results

The main results of the present paper stem from the equivalence between Riemannian Gaussian distributions on the spaces of real, complex, or quaternion covariance matrices and log-normal orthogonal, unitary, and symplectic matrix ensembles.

Let Y be a random matrix in P_β which follows the Riemannian Gaussian density of (3). It is always possible to assume that Ȳ = I, since this holds after replacing Y by Ȳ^{-1/2} Y Ȳ^{-1/2}. Then, the probability distribution of Y is described by the following proposition.

Proposition 1.

Let follow the Riemannian Gaussian density (3) with . If , then the probability distribution of is given by

(8)

for any measurable subset of . Here, and the notation was introduced after (4).

In plain words, this proposition states that Y follows a log-normal matrix ensemble. If Y is diagonalised as Y = U diag(x_1, …, x_n) U^†, then U is uniformly distributed on the corresponding compact group, and the eigenvalues x_1, …, x_n, which are all positive, have joint probability density function

p(x_1, \ldots, x_n) \;\propto\; \prod_{a<b} |x_a - x_b|^{\beta}\, \prod_{a=1}^{n} w(x_a)    (9)

where the first product is the Vandermonde determinant (raised to the power β, in absolute value), and w is the log-normal weight function.

In essence, Proposition 1 is already contained in [Ti20]. As a consequence of this proposition, the multiple integral I_{n,β}(σ) of (7) may be expressed as follows

(10)

where (product over ) with . In [Ti20], integrals such as the one in (10) are expressed using the Andreev or De Bruijn identities, often employed in random matrix theory. These yield somewhat cumbersome formulae, involving determinants of size . On the other hand, the present paper focuses on an alternative approach, which turns out to be more suitable from a practical point of view. Instead of expressing I_{n,β}(σ) exactly, by means of complicated formulae, the aim is to use a highly efficient approximation, which involves a single analytic expression. This is indicated by the following proposition.

Proposition 2.

In the limit where and , while the product remains constant,

(11)

where is the trilogarithm function, .

In [Ti20], the limit in (11) is only mentioned in passing, and not stated under the same form. Here, this limit will be given centre stage. Proposition 2 states that (11) is valid in the “double-scaling regime" ( and ), but numerical experiments have shown that can be replaced by the expression afforded by (11) without notable loss of accuracy, whenever is small in comparison with (this is further illustrated below).

At least informally, this can be justified by appealing to arguments originating in theoretical physics [Ma05]. Considered as a function of , is called the free energy (the logarithm of the partition function). This free energy can be expanded in an asymptotic series (see Section 1.3 of [Ma05]),

(12)

which is obtained by summing over Feynman diagrams, or so-called fatgraphs. Each coefficient is itself a series , where counts fatgraphs which are said to have holes and genus (this means that one thinks of a fatgraph as a graph with loops, drawn on a surface of genus , such as a sphere or torus, etc). Now, accepting (12), it follows that for each fixed value of ,

(13)

so that approximates the left-hand side up to an error of the order of . Finally, recalling that , it is clear that is the right-hand side of (11) — this follows by uniqueness of asymptotic expansions.

In addition to the asymptotic form of , another quantity of interest is the asymptotic empirical distribution of the eigenvalues of a random matrix which follows the Riemannian Gaussian density with . Let denote the eigenvalues of , and consider their empirical distribution

(14)

for any open interval , where denotes expectation, and the number of which belong to . In [Ti20], the probability density function of was expressed as a weighted sum of Gaussian distributions, by a direct application of the Christoffel-Darboux formula, as in [deift]. It is possible to do so for any value of , only because the case can be studied using a well-known family of orthogonal polynomials, called Stieltjes-Wigert polynomials. No such analytic tool is available when or .

To address this issue, the following proposition provides an asymptotic expression of the distribution , valid for all values of β.

Proposition 3.

In the limit where and , while the product remains constant, the empirical distribution converges weakly to a distribution with probability density function , where

(15)

on the interval , where and , with .

One hopes that, similar to the situation discussed after Proposition 2, the asymptotic density (15) approximates the finite-n empirical distribution with such good accuracy that it can be used in its place, for many practical purposes. This possibility will not be further investigated in the present paper.

IV Towards learning applications

Riemannian Gaussian distributions were initially proposed as basic building blocks for learning models which aim to capture the intrinsic structure of statistical populations of covariance matrices. These include the mixture models and hidden Markov models, introduced in [Sa16][mtnshmm] and further developed in [QT21]. In order to make use of these models in real-world applications, it is indispensable to know how to effectively compute the logarithm of the multiple integral I_{n,β}(σ) of (7), for a dimension n which may be in the tens or hundreds.

Knowledge of Z(σ) is already crucial in the simplest situation, where one tries to fit a single Riemannian Gaussian density (rather than a whole mixture) to data Y_1, …, Y_N. Indeed, this will require setting Ȳ and σ equal to their maximum-likelihood estimates, given in [Sa16][Sa17]. Specifically, the estimate of Ȳ,

\hat{Y} \;=\; \operatorname{argmin}_{Y \in P_\beta}\; \sum_{i=1}^{N} d^2(Y_i, Y)    (16)

is the Fréchet mean of the data Y_1, …, Y_N, with respect to the distance (2), and the estimate σ̂ of σ is the solution of the nonlinear equation

\frac{1}{N} \sum_{i=1}^{N} d^2\!\left( Y_i, \hat{Y} \right) \;=\; \left. \sigma^3\, \frac{d}{d\sigma} \log Z(\sigma) \right|_{\sigma = \hat{\sigma}}    (17)

Therefore, it is already impossible to solve a toy problem, with a single Riemannian Gaussian density, without having some kind of hold on Z(σ).
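For reference, the Fréchet mean (16) has no closed form in general, but it can be computed by the standard fixed-point (Riemannian gradient descent) iteration for the metric (1). The following is a minimal sketch (Python/NumPy/SciPy, illustrative only; the step size, iteration count and real-case simplifications are our choices, not taken from [Sa16][Sa17]).

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm

def frechet_mean(samples, num_iter=100, step=1.0, tol=1e-10):
    """Fixed-point iteration for the Frechet mean (16) under the metric (1), real case."""
    Y = np.mean(samples, axis=0)                   # initial guess: arithmetic mean
    for _ in range(num_iter):
        Y_sqrt = sqrtm(Y)
        Y_inv_sqrt = np.linalg.inv(Y_sqrt)
        # descent direction: average of the data logarithms taken at the current iterate
        G = np.mean([logm(Y_inv_sqrt @ S @ Y_inv_sqrt) for S in samples], axis=0)
        Y = Y_sqrt @ expm(step * G) @ Y_sqrt
        Y = np.real(Y + Y.conj().T) / 2            # discard numerical noise (real case)
        if np.linalg.norm(G, "fro") < tol:
            break
    return Y
```

With `samples` a list of SPD arrays Y_1, …, Y_N, the output approximates the minimiser in (16); averaging the squared distances (2) between the samples and this output then provides the left-hand side of (17).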

Until now (in [Sa16][Sa17]), Z(σ) was approximated using an ad hoc Monte Carlo technique, which failed to produce coherent results already at moderate dimensions. In the present section, the aim will be to show that significantly improved results can be obtained by solving Equation (17) after approximating its right-hand side with the expression afforded by (11), according to Proposition 2.
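Once an approximation of log Z(σ) is available (for instance the expression afforded by (11), or the Monte Carlo sketch given after (7), which differs from log Z(σ) only by an additive constant and therefore has the same derivative), Equation (17) becomes a scalar root-finding problem. A minimal sketch follows (Python/SciPy, illustrative only; `log_Z_approx` is a placeholder for whichever approximation is used, the derivative is taken by finite differences, and a bracketing solver is used in place of the Newton iteration of the paper).

```python
import numpy as np
from scipy.optimize import brentq

def solve_dispersion(mean_sq_dist, log_Z_approx, sigma_min=1e-2, sigma_max=20.0):
    """Solve (17): find sigma with sigma^3 * d/dsigma log Z(sigma) = mean_sq_dist.
    `log_Z_approx` is any callable approximating log Z(sigma) (up to an additive
    constant); its derivative is taken by central finite differences."""
    def d_log_Z(sigma, h=1e-4):
        return (log_Z_approx(sigma + h) - log_Z_approx(sigma - h)) / (2.0 * h)
    def residual(sigma):
        return sigma ** 3 * d_log_Z(sigma) - mean_sq_dist
    return brentq(residual, sigma_min, sigma_max)
```

Here `mean_sq_dist` is the average of d²(Y_i, Ŷ) over the data, with Ŷ the Fréchet mean; the bracketing interval must of course contain the solution.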

First, numerical experiments were conducted to verify the validity of this new approximation. According to (13), the right-hand side of (11) should approximate up to an error of the order of . To see that this is correct, the right-hand side of (11) was compared to certain exact expressions of . Namely, for the case, one has the following expression, obtained in [habilitation],

(18)

and for the case, when the dimension is even, the following expression, based on [Ti20],

(19)

where denotes the Pfaffian, equal to the square root of the determinant, and the matrix has entries

(20)

with the error function.

Numerical evaluation of (18) for large values of or (up to ) is quite straightforward. Moreover, it immediately shows that (11) and (18) agree very closely when is smaller than , and then gradually diverge from one another as increases. Figure 1 provides a graphical illustration for and . Still larger values of yield an even stronger match between (11) and (18).

Fig. 1: Proposition 2: (11) in dashed red and (18) in solid blue (two panels, for different parameter values).

It is not equally straightforward to numerically evaluate (19). Even at , we begin encountering overflow problems for moderate values of when performing the computations in MATLAB. Still, as long as these overflow problems do not appear, it is possible to observe a close agreement between (11) and (19), as in the case. This is shown in Figure 2 for and .

Fig. 2: Proposition 2: (11) in dashed red and (19) in solid blue (two panels, for different parameter values).

Based on the numerical results summarised in Figures 1 and 2, it seems possible to use the right-hand side of (11), multiplied by , as a substitute for . While this is only an approximation, it is a highly efficient one, and has the advantage of being given by a rather simple analytic expression. In the case, direct numerical evaluation of is unstable for larger values of , and (11) offers a practical way out of this problem.

This can be verified by using symbolic computation (Mathematica), which allows us to extend the evaluation of (19) to larger values of the parameters. It is nonetheless associated with the drawback that the resulting curves tend to be artificially non-smooth, as in Figure 2(a), due to numerical artifacts. These non-smooth features also exist in the curve depicted in Figure 2(b), but are not visible at the given resolution. This behaviour is particularly problematic in the context of Equation (17), due to the presence of the derivative of log Z(σ). Furthermore, symbolic computation of (19) becomes exceedingly slow for sufficiently large , and this can only be overcome by relying on approximations such as (11).

Fig. 3: Proposition 2: (11) in dashed red and (19) in solid blue, over an extended range (two panels).

The second set of experiments directly addressed Equation (17), for a range of dimensions n. This equation was solved using the Newton method, after its right-hand side was approximated according to (11). The solution obtained in this way is an approximation of the maximum-likelihood estimate σ̂ (the exact solution of (17)). If this approximation is any good, it should approach the true value of σ for a sufficiently large number N of data points. This can already be observed in Table I. Each entry in this table gives the average value and standard deviation of the solution, calculated over independent trials (average ± standard deviation). It is clear that increasing the number of data points only reduces the standard deviation, without really affecting the average.

TABLE I: The solution of (17), for true σ = 1, …, 7 (r.h.s. approximated using (11)).

When the dimension is not too large, the right-hand side of (17) can still be approximated using the Monte Carlo technique mentioned in [Sa16][Sa17]. The solution obtained with this approximation is reported in Table II, which shows that its estimation error is clearly larger than the one observed in Table I. Moreover, it seems to systematically overestimate the true value of σ.

TABLE II: The solution of (17), for true σ = 1, …, 7 (Monte Carlo approximation of r.h.s.).

When the dimension is larger, Monte Carlo approximation of the right-hand side of (17) is no longer feasible, and one is left only with the approximation using (11). The solution obtained from this approximation is shown in Table III. Here (contrary to Table II), it systematically underestimates the true value of σ. In fact, identical behaviour was observed over a whole range of dimensions, along with similar values.

A practical means of overcoming this issue would be to include a penalty term into Equation (17), in order to enforce greater values of its solution. Then, the approximation using (11) can be successfully implemented for larger dimension , where Monte Carlo approximation becomes useless (the present discussion stopped at , because the sampling algorithms used to generate the data points could not be taken any further).

TABLE III: The solution of (17), for true σ = 1, …, 7 (r.h.s. approximated using (11)).
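For completeness, synthetic data of the kind used in these experiments can be generated, for Ȳ = I and the real case, by combining a Haar-distributed orthogonal matrix with radial coordinates drawn from the density (5). The following is a minimal random-walk Metropolis sketch (Python/NumPy/SciPy, illustrative only; it is not the sampling algorithm used in the paper, it assumes the sinh-times-Gaussian form of (5) reconstructed in Section II, and the proposal scale and chain length are arbitrary choices).

```python
import numpy as np
from scipy.stats import ortho_group

def log_radial_density(r, sigma, beta=1):
    """Unnormalised logarithm of the radial density (5)."""
    iu, ju = np.triu_indices(len(r), k=1)
    gaps = np.abs(r[iu] - r[ju]) / 2.0
    return beta * np.sum(np.log(np.sinh(gaps))) - np.sum(r ** 2) / (2.0 * sigma ** 2)

def sample_riemannian_gaussian(n, sigma, num_steps=20_000, prop_scale=0.1, seed=0):
    """Approximate sample Y from (3) with Y_bar = I (real case), obtained by
    random-walk Metropolis on the radial coordinates r, combined with a
    Haar-distributed orthogonal matrix U."""
    rng = np.random.default_rng(seed)
    r = sigma * rng.standard_normal(n)
    log_p = log_radial_density(r, sigma)
    for _ in range(num_steps):
        r_new = r + prop_scale * rng.standard_normal(n)
        log_p_new = log_radial_density(r_new, sigma)
        if np.log(rng.uniform()) < log_p_new - log_p:     # Metropolis accept/reject
            r, log_p = r_new, log_p_new
    U = ortho_group.rvs(n, random_state=seed)
    return U @ np.diag(np.exp(r)) @ U.T
```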

In conclusion, the present section has demonstrated that the new approximation of Z(σ), based on (11), significantly improves on the existing Monte Carlo approximation, and also extends to higher dimensions. In upcoming work, the experiments conducted in this section will be generalised to more realistic learning models which may be applied to real-world data (for example, the mixture models considered in [zanini2018]).

V Block-Toeplitz covariance matrices

Similar to the spaces introduced in Section II, consider the spaces , defined as follows.

  • the first is the space of matrices with complex entries, whose operator norm is strictly less than 1.

  • the second is the space of symmetric matrices with complex entries, whose operator norm is strictly less than 1.

Here, operator norm means the largest singular value. The first space will be called the Hermitian Siegel domain, and the second the symmetric Siegel domain. The focus will be restricted to these two cases, since they are closely related to time-series analysis and signal processing (the remaining case is similar, but involves quaternion matrices).

Specifically, assume a wide-sense stationary -variate time series of length is described by its autocovariance matrix . Wide-sense stationarity implies has a block-Toeplitz structure, with blocks of size . When solving an optimal prediction problem, one may apply a multidimensional Szegö-Levinson algorithm to the autocovariance  [jeuris1][yannthesis], and obtain a family of matrices , where is the zero-lag autocovariance of the original time series (this is a complex covariance matrix, ), and each is a so-called matrix reflection coefficient, which belongs to . If the autocovariance has Toeplitz blocks, then each moreover belongs to (which is a subspace of ).
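For reference, the block-Toeplitz structure described above is easy to make explicit: the autocovariance of the stacked observation vector has the lag covariance in position (i, j) depending only on i − j. A minimal assembly sketch follows (Python/NumPy, illustrative only; the Szegö-Levinson recursion which produces the reflection coefficients is not reproduced here).

```python
import numpy as np

def block_toeplitz(blocks):
    """Assemble a block-Toeplitz matrix from lag blocks R_0, ..., R_{T-1}:
    the (i, j) block equals R_{i-j}, with R_{-k} = R_k^H (conjugate transpose)."""
    T = len(blocks)
    q = blocks[0].shape[0]
    A = np.zeros((T * q, T * q), dtype=complex)
    for i in range(T):
        for j in range(T):
            R = blocks[i - j] if i >= j else blocks[j - i].conj().T
            A[i * q:(i + 1) * q, j * q:(j + 1) * q] = R
    return A
```

With `blocks[k]` the lag-k covariance of a q-variate series, the output is the Tq × Tq autocovariance of the stacked vector; wide-sense stationarity is exactly the statement that this matrix has the above block-Toeplitz structure.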

Riemannian geometry enters this picture in the following way [jeuris1][yannthesis]. Consider an information metric on the space of block-Toeplitz autocovariance matrices , equal to the Hessian of the entropy function . In terms of the new coordinates , this information metric is a direct product of the affine-invariant metric (1) on the first coordinate , and of a scaled copy of the Siegel domain metric on each of the remaining coordinates (see formula (3.11) in [jeuris1]). Dropping the subscript , this Siegel domain metric is given by,

(21)

where the tangent space is isomorphic to the space of complex matrices if , and to the space of symmetric complex matrices if (this isomorphism shows that the dimension of equals ). The geodesic distance associated with the Siegel metric (21) has the following expression [jeuris1][yannthesis],

(22)

where is called the matrix cross-ratio.

Riemannian Gaussian distributions on the space are given by their probability density function which is of the same form as in (3), but with the distance determined by (22), and the normalising factor

(23)

with respect to the Riemannian volume . Here, where and denote the real and imaginary parts (the product is over if and over all if ).

As shown in [Sa17], a Riemannian Gaussian distribution on the space of block-Toeplitz covariance matrices is just a product of independent Riemannian Gaussian distributions, one for each coordinate . For , this is a Riemannian Gaussian distribution on , already considered in Section II. Thus, to understand Riemannian Gaussian distributions of block-Toeplitz covariance matrices, it only remains to study Riemannian Gaussian distributions on the Siegel domain .

In Section III, it was seen that Riemannian Gaussian distributions on are equivalent to log-normal matrix ensembles. On the other hand, Riemannian Gaussian distributions on are equivalent to “acosh-normal" ensembles, which will be described in Proposition 4 below.

Note first that each matrix W in the Siegel domain can be factorised in the following way

W \;=\; U \, \Sigma \, V^{T}    (24)

where U and V are unitary, V^T denotes the transpose of V, and Σ is diagonal, with diagonal elements given by the singular values of W. The Hermitian case follows from the singular value decomposition of W, and the symmetric case follows from the Takagi decomposition of W (where one may take V = U) [edelman]. In either case, the diagonal elements of Σ are all strictly less than 1, because the singular values of W are all strictly less than 1.
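In the Hermitian case, the factorisation (24) is simply a relabelling of the singular value decomposition, which also provides the membership test for the Siegel domain (largest singular value strictly less than 1). A minimal sketch follows (Python/NumPy, illustrative only; the Takagi decomposition needed in the symmetric case is not reproduced).

```python
import numpy as np

def siegel_factorisation(W):
    """Factorise W = U Sigma V^T, as in (24), starting from the SVD W = U Sigma Vh."""
    U, s, Vh = np.linalg.svd(W)
    V = Vh.T                 # unitary, and V^T = Vh, so W = U @ diag(s) @ V.T
    return U, np.diag(s), V

rng = np.random.default_rng(0)
q = 4
G = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))
W = 0.9 * G / np.linalg.norm(G, 2)            # rescale: operator norm < 1
U, Sigma, V = siegel_factorisation(W)
print(np.linalg.norm(W - U @ Sigma @ V.T))    # reconstruction error (near zero)
print(np.max(np.diag(Sigma)) < 1.0)           # membership in the Siegel domain
```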

Proposition 4.

Let follow a Riemannian Gaussian distribution on the Siegel domain with (zero matrix). If is factorised as in (24), then and are uniformly distributed on the unitary group . Moreover, if then have joint probability density function

(25)

where , and where is the “acosh-normal" weight function.

Note that the assumption that the centre-of-mass parameter is equal to does not entail any loss of generality. The transformation [yannthesis]

(26)

maps to , while preserving the Siegel domain metric (21) and the associated distance and Riemannian volume. Thus, if follows a Riemannian Gaussian density , it is enough to replace by , which will have a Riemannian Gaussian density with and with the same .

Proposition 4 implies that the transformed singular values follow the classical eigenvalue distribution of an orthogonal () or unitary () matrix ensemble with “acosh-normal" weight function. Therefore, in particular, the normalising factor of (23) reduces to a multiple integral (compare to (6) and (10))

(27)

where , with for the weight function .

Thus, Riemannian Gaussian distributions on the Siegel domain can be studied by applying the tools of random matrix theory to new “acosh-normal" matrix ensembles. Hopefully, in the near future, this will lead to results similar to the ones obtained with log-normal ensembles in Section III, paving the way to learning from datasets of high-dimensional block-Toeplitz covariance matrices.

Appendix A Proofs of Propositions 1 to 4

A-A Proof of Proposition 1

If follows the Riemannian Gaussian density (3) with , then

(28)

Let denote the eigenvalues of . Using (2) and the fact that , (28) becomes

(29)

Recall that . Accordingly, if are the eigenvalues of , an elementary calculation yields

Therefore, (29) can be written

(30)

To conclude, it is enough to use once more the definition . This implies

Thus, using the fact that

and changing the variable of integration from to in (30), it follows that

as required in (8).

A-B Proof of Proposition 2

The case follows from (18), by an elementary calculation, after noting that

is a Riemann sum for the improper integral

For other values of , the result will be obtained from the reasoning presented in the proof of Proposition 3, based on the scaling equation (33).

A-C Proof of Proposition 3

The case has already been proved in [habilitation]. To deal with the general case, write (10) under the form , where

and is the multiple integral

Now, if is the empirical distribution of (where denotes the Dirac measure at ), then

(31)

where is the so-called energy functional

(32)

defined for any probability distribution on . This energy functional satisfies the following scaling equation

(33)

which is easily obtained after dividing (32) by . The proof can now be completed by applying to (31) and (33) the arguments in [deift]. First [deift] (Corollary 6.90, Page 155),

where is the so-called equilibrium distribution, the unique minimiser of the energy (32) among probability distributions on . From the scaling equation (33), it is now clear

(34)

Therefore, by adding the limit of ,

(35)

However, when , it is already known that this limit is , where is defined in (11). This provides , which can be replaced back into (35), yielding the general case of Proposition 2.

To prove Proposition 3, it is possible to use the scaling equation (33), once more. From [deift] (Section 6.4), if is defined as in (14), but with the instead of the , then converges weakly to the equilibrium distribution . On the other hand, the scaling equation (33) implies , and is already known to be the image of the distribution with density , defined as in (15), under the change of variables . This provides for any value of , and the Proposition then follows by changing the variables back from to .

A-D Proof of Proposition 4

The proof relies on the general theory of Gaussian distributions on Riemannian symmetric spaces, as outlined in [Sa17][habilitation]. Here, to keep the proof self-contained, it will be helpful to briefly recall certain aspects of this theory (for a more detailed, in-depth discussion, the reader is referred to [habilitation], Sections 1.9 and 3.3).

A Gaussian distribution on a Riemannian symmetric space is defined by its probability density function

(36)

with respect to the Riemannian volume on , with and .

It is always assumed is a Riemannian symmetric space of non-positive curvature, associated to a symmetric pair . Precisely, is a connected Lie group which acts transitively and isometrically on (this action is denoted , for and ), and is a compact subgroup of , made up of those elements which fix a certain point (that is, if and only if ).

For a concrete understanding of Gaussian distributions on , it is necessary to take a closer look at the Lie algebras of and , denoted and , respectively. These are related together by the so-called Cartan decomposition, (direct sum), where the subspace of can be identified with the tangent space (the tangent space to at ). In terms of this decomposition, the main construction needed for the proof can be described as follows.

Without any loss of generality, and are taken to be matrix Lie groups, and and their matrix Lie algebras. Let be a maximal abelian subspace of (that is, all the matrices commute with each other). Then, each matrix can be written under the form , where and (this factorisation is the general template for the fifty three matrix factorisations, outlined in [edelman]). Moreover, any admits a representation

(37)

where denotes the matrix exponential. Incidentally, this representation is not unique, but for almost all (for all , except a subset of zero volume) has exactly couples which satisfy (37), where is the number of elements of the Weyl group of the symmetric couple .

It is now possible to state the following Lemma 1 [Sa17][habilitation], which will yield the entire proof, by direct application.

Lemma 1.

Let follow the Gaussian density (36) on , with . If is represented as in (37), then is uniformly distributed on the compact group . Moreover, has the following probability density function on (recall that is a real vector space, so the density is with respect to the usual Lebesgue measure on )

(38)

where is the Riemannian norm of (since , it can be identified with a vector in ), and is a set of positive roots on , with respective multiplicities (each is a certain linear function ).

Recall that positive roots are any set of linear functions on such that, for any , the eigenvalues of the linear operator , given by (this is ), are equal to with respective multiplicities .

The proof may now begin in earnest. It merely consists in identifying , , , , , and , for each Siegel domain , and then writing down the corresponding version of Lemma 1, which directly yields Proposition 4. Fortunately, all the necessary information can be found in [piatetski][helgason].
The symmetric pair : is associated to the symmetric pair , where and are groups of complex matrices , defined in the following way (here, denotes the complex conjugate).

where and denote the following matrices

These groups and act on by matrix fractional transformations