Riemannian Gaussian distributions, random matrix ensembles and diffusion kernels

11/27/2020 · Leonardo Santilli, et al. · University of Lisbon

We show that the Riemannian Gaussian distributions on symmetric spaces, introduced in recent years, are of standard random matrix type. We exploit this to compute marginals of the probability density functions analytically. This can be done in full, using Stieltjes-Wigert orthogonal polynomials, for the case of the space of Hermitian matrices, where the distributions have already appeared in the physics literature. For the case when the symmetric space is the space of m × m symmetric positive definite matrices, we show how to carry out the computation efficiently by evaluating Pfaffians at specific values of m. Equivalently, the same result can be obtained by constructing specific skew-orthogonal polynomials with respect to the log-normal weight function (skew Stieltjes-Wigert polynomials). Other symmetric spaces are studied, and the same type of result is obtained for the quaternionic case. Moreover, we show that the probability density functions are a particular case of diffusion reproducing kernels of the Karlin-McGregor type, describing non-intersecting Brownian motions, which are also diffusion processes in the Weyl chamber of Lie groups.


I Introduction

The statistical analysis of a probability measure on a differentiable manifold is a topic attracting much attention, and it has developed considerably due to the wide range of interesting applications that require the analysis of data not living in ordinary Euclidean spaces. There is great interest, for example, in trying to process datasets which lie in the space P_m of m × m symmetric positive definite matrices.

This space can be equipped with a Riemannian metric, giving it the structure of a Riemannian homogeneous space of negative curvature. This Riemannian manifold has been thoroughly studied from many points of view, corresponding to different disciplines, such as harmonic analysis and number theory [1, 2], matrix analysis [3], and information geometry and multivariate statistics [4, 5].

This and other symmetric spaces have been studied endowed with metrics such as the Rao-Fisher metric or the log-Euclidean metric [6], in order to develop the tools to properly carry out statistical inference on data living on such spaces. The landscape of applications is vast, as it includes, for example, medical imaging [7, 8], continuum mechanics [9], radar signal processing [10, 11] and computer vision [12, 13, 14, 15, 16], among many other references that could be quoted here.

In the present paper, the Riemannian metric will be the Rao-Fisher metric, also known as the affine-invariant metric, which is the subject of Section I-A. A specific probabilistic model to represent the statistical variability of data in non-Euclidean spaces, such as P_m, has been developed by introducing the so-called Gaussian probability distributions on Riemannian symmetric spaces [17, 18, 19, 20].

The Riemannian Gaussian distribution will depend on two parameters, Ȳ ∈ P_m and σ > 0. The form of its probability density function, generalising that of a Gaussian distribution on the Euclidean space ℝ^m, is given by

p(Y | Ȳ, σ) = (1/Z(σ)) exp[ −d²(Y, Ȳ) / (2σ²) ]    (1)

where d(·,·) is Rao's Riemannian distance, defined in (5) below, and the density is with respect to the Riemannian volume element of P_m, denoted by dv(Y). In comparison to a Gaussian distribution on ℝ^m, the normalising factor Z(σ) plays the role of the factor (2πσ²)^{m/2}.

Among other possible applications, the distributions provide a statistical foundation for the concept of Riemannian centre of mass, which plays a major role in many contexts [21, 22]. They also play a role in machine learning when the dataset is inherently structured [23, 24, 25].

In addition to the relationship between Riemannian Gaussian distributions and the concept of Riemannian centre of mass and the potential for applications, discussed in [17, 18, 19, 20], we shall show here that one of the hallmarks of these distributions is analytical tractability. In particular, we will show that the obtained distributions are of standard random matrix type, and that the analytical tools associated with the study of random matrices can be used effectively to further characterize the distributions analytically [26, 27, 28].

More specifically, we shall use Pfaffian and determinant descriptions of the random matrix ensembles and the classical orthogonal polynomial method in random matrix theory [26, 27, 28]. We do this in a setting which includes skew-orthogonal polynomials, necessary to solve models which do not correspond to the case of Hermitian matrices, such as the case of real symmetric matrices, which is central for statistical applications.

We now briefly summarize the derivation of the model for the case of real symmetric matrices, along the lines of [17, 18, 19, 20], to set up the context. Further details can be found in these references, while from Section II onwards we focus on the analytical characterization of the joint probability density functions using random matrix theory.

I-A The Rao-Fisher metric: distance, geodesics and volume

A Riemannian metric on P_m is a quadratic form ds²(Y) which measures the squared length of a small displacement dY, separating two elements Y and Y + dY. The Rao-Fisher metric is [5, 2]

ds²(Y) = tr[ (Y⁻¹ dY)² ]    (2)

where tr denotes the matrix trace. If there is no ambiguity, this can also be written ds².

The Rao-Fisher metric defines a Riemannian distance on P_m. This is Rao's distance, defined as follows [1, 2]: let Y, Z ∈ P_m and let c : [0, 1] → P_m be a differentiable curve with c(0) = Y and c(1) = Z. The length of c is defined by

L(c) = ∫₀¹ (ds/dt) dt    (3)

where (ds/dt)² = tr[ (c⁻¹(t) ċ(t))² ]. Rao's distance d(Y, Z) is the infimum of L(c) taken over all differentiable curves c as above.

When equipped with the Rao-Fisher metric, the space P_m is a Riemannian manifold of negative sectional curvature [1, 2]. One implication of this property, since P_m is also complete and simply connected, is that the infimum of L(c) is realised by a unique curve γ(t), known as the geodesic connecting Y and Z. The equation of this curve is

γ(t) = Y^{1/2} ( Y^{−1/2} Z Y^{−1/2} )^t Y^{1/2}    (4)

Given expression (4), it is possible to compute L(γ) from (3). This is precisely Rao's distance d(Y, Z). It turns out [2],

d²(Y, Z) = tr[ ( log( Y^{−1/2} Z Y^{−1/2} ) )² ]    (5)

All matrix functions appearing in (4) and (5) (square root, elevation to the power t, and logarithm) should be understood as symmetric matrix functions. For this, see [29].
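As a concrete numerical illustration of (4) and (5), the following sketch (in Python, using SciPy's matrix functions; the helper names are ours) checks that the geodesic (4) traverses Rao's distance at constant speed, i.e., d(Y, γ(t)) = t · d(Y, Z):

```python
import numpy as np
from scipy.linalg import logm, sqrtm, fractional_matrix_power, inv

def rao_distance(Y, Z):
    """Rao's distance (5): Frobenius norm of log(Y^{-1/2} Z Y^{-1/2})."""
    Yh = inv(sqrtm(Y))
    return np.linalg.norm(logm(Yh @ Z @ Yh), 'fro')

def geodesic(Y, Z, t):
    """Geodesic (4): gamma(t) = Y^{1/2} (Y^{-1/2} Z Y^{-1/2})^t Y^{1/2}."""
    Ys, Yh = sqrtm(Y), inv(sqrtm(Y))
    return Ys @ fractional_matrix_power(Yh @ Z @ Yh, t) @ Ys

rng = np.random.default_rng(0)
m = 4
A, B = rng.standard_normal((m, m)), rng.standard_normal((m, m))
Y = A @ A.T + m * np.eye(m)   # random symmetric positive definite matrices
Z = B @ B.T + m * np.eye(m)

d = rao_distance(Y, Z)
for t in (0.25, 0.5, 0.75):
    print(t, rao_distance(Y, geodesic(Y, Z, t)) / d)  # ratio is approximately t
```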

The Rao-Fisher metric measures lengths, and therefore provides a definition of volume. The Riemannian volume element associated to the Rao-Fisher metric is [2]

dv(Y) = det(Y)^{−(m+1)/2} ∏_{i ≤ j} dY_{ij}    (6)

where the indices i, j denote matrix entries.

One can then consider the expressions of ds²(Y) and dv(Y), given by (2) and (6), in terms of polar coordinates (r, U), writing Y = U diag(e^{r_1}, …, e^{r_m}) U^T with U ∈ O(m) and r ∈ ℝ^m. Then, for any function f,

∫_{P_m} f(Y) dv(Y) = c_m ∫_{ℝ^m} ∫_{O(m)} f( U diag(e^{r_1}, …, e^{r_m}) U^T ) ∏_{i<j} sinh( |r_i − r_j| / 2 ) θ(dU) ∏_{i=1}^m dr_i    (7)

where c_m is a constant and θ(dU) is the exterior product [19]

θ(dU) = ⋀_{i<j} u_j^T du_i    (8)

with u_1, …, u_m the columns of U. For a class function, the U dependence can be integrated out (see [30, Page 71]),

ω_m = ∫_{O(m)} θ(dU) = 2^m π^{m²/2} / Γ_m(m/2)    (9)

with Γ_m the multivariate Gamma function, given in [30]. Formula (7) follows directly from [2, Proposition 3, Page 43] but has been written following the notation in [19].
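For numerical purposes, the multivariate Gamma function entering (9) is directly available in SciPy as multigammaln; a minimal sketch evaluating the constant ω_m of (9):

```python
import numpy as np
from scipy.special import multigammaln  # log of the multivariate Gamma function

def log_omega(m):
    """log of (9): omega_m = 2^m * pi^{m^2/2} / Gamma_m(m/2)."""
    return m * np.log(2.0) + 0.5 * m**2 * np.log(np.pi) - multigammaln(m / 2.0, m)

for m in (2, 3, 4):
    print(m, np.exp(log_omega(m)))
```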

Thus, for Ȳ ∈ P_m and σ > 0, let p(Y | Ȳ, σ) be given by,

p(Y | Ȳ, σ) = (1/Z(σ)) exp[ −d²(Y, Ȳ) / (2σ²) ]    (10)

The normalization constant Z(σ) is then defined to be the integral

Z(σ) = ∫_{P_m} exp[ −d²(Y, Ȳ) / (2σ²) ] dv(Y)    (11)

where dv(Y) is the Riemannian volume element (6). For any Ȳ ∈ P_m and σ > 0,

Z(σ) = ∫_{P_m} exp[ −d²(Y, I) / (2σ²) ] dv(Y)    (12)

where I is the m × m identity matrix, so that Z(σ) does not depend on Ȳ. Let r = (r_1, …, r_m), where the e^{r_i} are the eigenvalues of Y, so that d²(Y, I) = Σ_i r_i². Then,

Z(σ) = c_m ω_m ∫_{ℝ^m} e^{−Σ_i r_i²/(2σ²)} ∏_{i<j} sinh( |r_i − r_j| / 2 ) ∏_{i=1}^m dr_i    (13)

with ω_m defined in (9).

We show how to use random matrix tools, in particular the evaluation of Pfaffians, to compute Z(σ) analytically for several values of m and also in the large m limit. This extends some of the results in [17, 18] and the other works discussing Gaussian distributions on Riemannian symmetric spaces, where only low-dimensional cases are evaluated analytically and the rest is carried out numerically, with Monte Carlo methods. Before that, we note that the corresponding model for Hermitian matrices, also discussed in [19], which is characterized by an analogous expression but with the sinh term in (13) squared, is fully solvable with orthogonal polynomials. It has already been studied in detail in the physics literature, but we summarize and develop the results in the next Section; the cases that are more difficult to treat analytically, such as (13), are studied in Section III and in the Appendix. Finally, Section IV will contain a discussion of these statistical distributions from the point of view of diffusion processes.
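Before turning to the exact evaluations, a plain Monte Carlo estimate of the radial integral in (13) is straightforward and serves as a numerical baseline; a minimal sketch (the overall constant c_m ω_m is left out, and the function name is ours):

```python
import numpy as np

def mc_radial_integral(m, sigma, n_samples=200_000, seed=1):
    """Monte Carlo estimate of the integral in (13), without the constant prefactor:
    (2*pi*sigma^2)^{m/2} * E[ prod_{i<j} sinh(|r_i - r_j|/2) ],  r ~ N(0, sigma^2 I_m).
    """
    rng = np.random.default_rng(seed)
    r = sigma * rng.standard_normal((n_samples, m))
    diffs = np.abs(r[:, :, None] - r[:, None, :])       # pairwise |r_i - r_j|
    iu = np.triu_indices(m, k=1)
    interaction = np.prod(np.sinh(diffs[:, iu[0], iu[1]] / 2.0), axis=1)
    return (2 * np.pi * sigma**2) ** (m / 2) * interaction.mean()

print(mc_radial_integral(3, 0.5))
```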

II Full solution of the case of Hermitian matrices

Let us define the function Z_N^{(β)}(σ) as

Z_N^{(β)}(σ) = (1/N!) ∫_{ℝ^N} ∏_{1≤i<j≤N} |2 sinh( (x_i − x_j)/2 )|^β ∏_{i=1}^N e^{−x_i²/(2κ)} dx_i    (14)

that we call the partition function, adopting the language of random matrix theory. While the definition (14) makes sense for every β > 0, with a suitable choice of the coefficient κ, we will be mainly interested in β = 1, 2, 4, corresponding to random matrix ensembles with orthogonal, unitary and symplectic symmetry, respectively. To avoid cluttering overall factors of β in what follows, we adopt the normalization of [31] and choose

κ = βσ²    (15)

but we stress that other suitable choices exist, as for example κ = σ² independently of β, and the present discussion does not depend on such a choice. Note that the variance with this normalization is βσ², so in particular it is 2σ² for β = 2 but it is σ² for β = 1. For β = 1 in (14) we find, comparing with (13),

Z(σ) = c_m ω_m m! 2^{−m(m−1)/2} Z_m^{(1)}(σ)    (16)

The crucial fact we want to exploit is that Z_N^{(β)}(σ) can be written in fully standard random matrix form, meaning in terms of a standard Vandermonde determinant. This was done in [32], in the context of the study of vicious walkers (we detail further the connection with diffusion in the last Section) and, later on, it was also crucial in the study of certain gauge theories of topological type [33]. One can therefore use all the power of the traditional random matrix tools, such as determinantal expressions and the method of orthogonal polynomials. Denoting u_i = e^{x_i} and using

∏_{i<j} 2 sinh( (x_i − x_j)/2 ) = ∏_{i=1}^N e^{−(N−1)x_i/2} ∏_{i<j} ( e^{x_i} − e^{x_j} )    (17)

we see that

Z_N^{(β)}(σ) = (1/N!) ∫_{(0,∞)^N} ∏_{i<j} |u_i − u_j|^β ∏_{i=1}^N u_i^{−β(N−1)/2 − 1} e^{−(log u_i)²/(2κ)} du_i    (18)

Changing variables to complete the square, it follows that:

Z_N^{(β)}(σ) = e^{−κN(β(N−1)+2)²/8} (1/N!) ∫_{(0,∞)^N} ∏_{i<j} |v_i − v_j|^β ∏_{i=1}^N e^{−(log v_i)²/(2κ)} dv_i    (19)

which describes the partition function of the Stieltjes-Wigert random matrix model. Note that this model is of the standard random matrix form but with a weight function of log-normal type: w(x) = e^{−(log x)²/(2κ)}. Random matrix theory gives formulas (for the case β = 2; the other cases are more complicated, as they require skew-orthogonal polynomials) for any marginal of the joint probability density function of the eigenvalues, including the normalization constant, in terms of the polynomials orthogonal with respect to the log-normal weight above. These polynomials are the Stieltjes-Wigert polynomials [34], which are essentially a type of q-deformed Hermite polynomials.
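The identity (17), which underlies this rewriting, is elementary to verify numerically; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
x = rng.standard_normal(N)

pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
lhs = np.prod([2 * np.sinh((x[i] - x[j]) / 2) for i, j in pairs])
rhs = (np.prod(np.exp(-(N - 1) * x / 2))
       * np.prod([np.exp(x[i]) - np.exp(x[j]) for i, j in pairs]))
print(lhs, rhs)  # agree to machine precision
```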

A Riemannian log-normal probability distribution was already introduced in [35] (see [36] for a review), but it is not of the random matrix type and is not comparable to (19), due to an extra factor in the Vandermonde part.

Another way to see this q-deformed structure at play is to define the q-parameter

(20)

and introduce the q-number

(21)

which reduces to an ordinary real number, [n]_q → n, in the limit q → 1. Then we have

(22)

This latter form shows that the partition function is, up to an explicitly known overall q-dependent factor, the q-deformation of the Gaussian β-ensemble, for every β, with q as in (20).

We now study the partition function Z_N^{(β)}(σ). We discuss the case β = 2 in the rest of the present Section. The other two cases, β = 1 and β = 4, are studied in Section III.

II-A Moments of the log-normal distribution and the Stieltjes-Wigert orthogonal polynomials

In what follows we exploit the rewriting (19), which allows us to directly apply standard results from random matrix theory. Define the inner product

⟨f, g⟩ = ∫₀^∞ f(x) g(x) e^{−(log x)²/(4σ²)} dx    (23)

for analytic real functions f and g. We keep the dependence on σ implicit in the notation. The inner product of two monomials is

⟨x^j, x^k⟩ = √(4πσ²) e^{σ²(j+k+1)²}    (24)

(The same moments can be obtained working on the unit circle, if the weight function is instead Jacobi's third theta function, upon a formal replacement of the variables. This leads to a unitary matrix model with analogous properties to the models discussed here; see [37, 38, 39, 40] for details.)
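The moment formula (24) (here written for the normalization κ = 2σ² of the β = 2 case) is easy to confirm by quadrature after the substitution x = e^s; a minimal sketch:

```python
import numpy as np
from scipy.integrate import quad

def moment_quad(n, kappa):
    # integral of x^n * exp(-log(x)^2 / (2*kappa)) over (0, inf), with x = e^s
    f = lambda s: np.exp((n + 1) * s - s**2 / (2 * kappa))
    return quad(f, -np.inf, np.inf)[0]

def moment_exact(n, kappa):
    return np.sqrt(2 * np.pi * kappa) * np.exp(kappa * (n + 1) ** 2 / 2)

kappa = 2 * 0.4**2
for n in range(4):
    print(n, moment_quad(n, kappa), moment_exact(n, kappa))  # the two columns match
```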

For the cases β = 1 and β = 4 we will need to define skew-symmetric products, but for β = 2, Z_N^{(2)}(σ) can be evaluated exactly [32], thanks to the Stieltjes-Wigert polynomials, which are orthonormal polynomials with respect to the inner product defined in (23), that is

⟨S_j, S_k⟩ = δ_{jk}    (25)

This family of polynomials is given by

S_n(x) = (−1)^n ( q^{n/2 + 1/4} / √( (q; q)_n ) ) Σ_{k=0}^n [n over k]_q q^{k²} ( −q^{1/2} x )^k    (26)

where the q-parameter is

q = e^{−2σ²}    (27)

(not to be confused with the q-parameter in (20)), and

[n over k]_q = (q; q)_n / ( (q; q)_k (q; q)_{n−k} ),   (q; q)_n = ∏_{i=1}^n (1 − q^i)    (28)

is the q-binomial.

is the -binomial. Let us denote by the leading coefficient, that is , and define the monic orthogonal polynomials , which satisfy the orthogonality relation

(29)

with and

(30)

for [32].
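Since all the quantities above reduce to the moments (24), the norms h_n can also be generated numerically from Hankel determinants of the moment matrix, h_n = D_{n+1}/D_n, providing a check of (30); a minimal sketch in multiprecision arithmetic (mpmath), since the moments grow very fast:

```python
import mpmath as mp
mp.mp.dps = 50  # high precision: the moments grow super-exponentially

def moments(nmax, kappa):
    # mu_n = sqrt(2*pi*kappa) * exp(kappa*(n+1)^2/2), cf. (24)
    return [mp.sqrt(2 * mp.pi * kappa) * mp.e**(kappa * (n + 1)**2 / 2) for n in range(nmax)]

def monic_norms(N, kappa):
    """h_n = D_{n+1}/D_n, with D_n the n-th Hankel determinant of the moments."""
    mu = moments(2 * N, kappa)
    D = [mp.mpf(1)] + [mp.det(mp.matrix([[mu[j + k] for k in range(n)] for j in range(n)]))
                       for n in range(1, N + 1)]
    return [D[n + 1] / D[n] for n in range(N)]

sigma = mp.mpf('0.3'); kappa = 2 * sigma**2; q = mp.e**(-kappa)
qpoch = lambda n: mp.nprod(lambda i: 1 - q**i, [1, n]) if n > 0 else mp.mpf(1)
for n, h in enumerate(monic_norms(4, kappa)):
    h_closed = mp.sqrt(2 * mp.pi * kappa) * q**(-2*n**2 - 2*n - mp.mpf(1)/2) * qpoch(n)
    print(n, mp.nstr(h / h_closed, 10))  # ratio is 1 to working precision
```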

The main tool to evaluate Z_N^{(2)}(σ) exactly is the celebrated Andréief identity [41, 42], which allows us to rewrite the matrix model (19) as a determinant:

Z_N^{(2)}(σ) = e^{−σ²N³} det[ ⟨x^{j−1}, x^{k−1}⟩ ]_{j,k=1}^N    (31)

In turn, the inner product of monomials in the determinant can be replaced by ⟨π_{j−1}, π_{k−1}⟩, so that the matrix of which we take the determinant becomes diagonal in this basis, and we obtain [32]

Z_N^{(2)}(σ) = e^{−σ²N³} ∏_{n=0}^{N−1} h_n = e^{−σ²N³} (4πσ²)^{N/2} q^{−N(4N²−1)/6} (1 − q)^{N(N−1)/2} ∏_{n=1}^N Γ_q(n)    (32)

with q defined in (27) and the q-Gamma function

Γ_q(x) = (1 − q)^{1−x} (q; q)_∞ / (q^x; q)_∞    (33)
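For N = 2, the determinantal formula (31), stripped of the overall exponential prefactor, can be checked against direct quadrature of the Stieltjes-Wigert integral; a minimal sketch:

```python
import numpy as np
from scipy.integrate import dblquad

kappa = 2 * 0.35**2  # beta = 2 normalization, kappa = 2*sigma^2

def w_exp(s):  # log-normal weight in s = log x coordinates, including the Jacobian e^s
    return np.exp(s - s**2 / (2 * kappa))

# LHS for N = 2: (1/2!) * integral of (u - v)^2 w(u) w(v), with u = e^s, v = e^t
lhs, _ = dblquad(lambda s, t: 0.5 * (np.exp(s) - np.exp(t))**2 * w_exp(s) * w_exp(t),
                 -20, 20, -20, 20)

mu = lambda n: np.sqrt(2 * np.pi * kappa) * np.exp(kappa * (n + 1)**2 / 2)  # cf. (24)
rhs = mu(0) * mu(2) - mu(1)**2  # determinant of the 2x2 moment matrix
print(lhs, rhs)  # the two agree
```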

We plot log Z_N^{(2)} as a function of σ for various N in Figure 1, and as a function of N for various fixed σ in Figure 2.

Fig. 1: Plot of log Z_N^{(2)} as a function of σ, for various values of N.
Fig. 2: Plot of log Z_N^{(2)} as a function of N, for fixed values of σ.

The dependence on σ is better captured by log Z_N^{(β)} than by Z_N^{(β)} itself. For this reason, while our discussion will focus on Z_N^{(β)}, in the plots and numerical evaluations we will instead consider log Z_N^{(β)}.

II-B Eigenvalue density, β = 2

Let us introduce the eigenvalue density, which by definition is obtained by integrating the joint probability density over N − 1 of the eigenvalues [26]:

ρ(λ) = ∫ p(λ, λ_2, …, λ_N) dλ_2 ⋯ dλ_N    (34)

Clearly, from the invariance of the integral under permutations of the λ's, we can integrate over any N − 1 of the variables and obtain the same definition of ρ. Throughout this Subsection we consider β = 2. It is convenient to use the change of variables as in (19), and exploit the Stieltjes-Wigert polynomials [33]. We can use the Christoffel-Darboux formula to rewrite the eigenvalue density of any random matrix ensemble as [27, Eq. (5.13)]

ρ(x) = ( w(x) / (N h_{N−1}) ) [ π′_N(x) π_{N−1}(x) − π_N(x) π′_{N−1}(x) ]    (35)

where w(x) is the weight function, the π_n are the monic orthogonal polynomials with respect to the weight w, and the prime means derivative with respect to x. In our case, w(x) = e^{−(log x)²/(4σ²)} and the π_n are the monic Stieltjes-Wigert polynomials introduced in the previous Subsection. We find

(36)

Undoing the change of variables, using r = log x, and multiplying the last expression by the Gaussian prefactor coming from the weight function in the Christoffel-Darboux formula (35), we find that ρ takes the form

(37)

with the coefficients obtainable from the expressions above. The upshot is that ρ is a sum of Gaussian distributions in the variable r = log x, centered at the points

(38)

It is also easy to check that the coefficients have alternating signs, thus the Gaussians centered at the points with odd index interfere constructively with the other Gaussians at odd positions, and destructively with those at even positions. We therefore expect ρ to be described by peaks with valleys among them. Moreover, the support of the eigenvalue density grows linearly with the product σ²N. We plot the eigenvalue densities for various values of the parameters in Figure 3, and in the small and large σ regimes in Figure 4.
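The peaks-and-valleys structure can also be observed by directly sampling the joint eigenvalue density of (14) with a simple Metropolis algorithm and histogramming the coordinates x (the logarithms of the Stieltjes-Wigert eigenvalues); a minimal sketch for β = 2 (step sizes and sample counts are illustrative):

```python
import numpy as np

def log_density(x, beta, sigma):
    """log of the joint eigenvalue density in (14), up to an additive constant."""
    kappa = beta * sigma**2
    diffs = np.abs(x[:, None] - x[None, :])[np.triu_indices(len(x), 1)]
    return beta * np.sum(np.log(2 * np.sinh(diffs / 2))) - np.sum(x**2) / (2 * kappa)

def metropolis(N=8, beta=2, sigma=1.0, steps=200_000, seed=3):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N)
    logp = log_density(x, beta, sigma)
    samples = []
    for step in range(steps):
        prop = x.copy()
        prop[rng.integers(N)] += 0.5 * rng.standard_normal()  # single-coordinate move
        logp_new = log_density(prop, beta, sigma)
        if np.log(rng.random()) < logp_new - logp:
            x, logp = prop, logp_new
        if step > steps // 10 and step % 50 == 0:  # discard burn-in, thin the chain
            samples.append(x.copy())
    return np.concatenate(samples)

hist, edges = np.histogram(metropolis(), bins=80, density=True)
print(edges[np.argmax(hist)])  # location of the highest peak of the sampled density
```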

Fig. 3: Eigenvalue density at fixed σ, for two values of N (left and right panels).
Fig. 4: Eigenvalue density. Left: small σ regime. Right: large σ regime. Note that the height of the vertical axis in the left picture is twice the height in the other plots.

II-C Limits of Z_N^{(β)}(σ)

Before delving into a more detailed analysis of Z_N^{(β)}(σ) for β = 1, 4, it is instructive to analyze various limits for generic β.

II-C1 The σ → 0 limit

First, it is clear from (22) that sending q → 1, that is σ → 0, we recover the usual Gaussian β-ensemble, whose partition function is known exactly for every β, see [27, Eq. (1.160)]. In particular, with our normalization we have

(39)

for N even (and a similar expression for N odd), and

(40)
(41)

Comparing with formula (16), the limit (39) shows in particular that Z(σ) goes to zero as a power of σ in the small σ limit, that is, for small variance in the dataset.

II-C2 The large σ limit

Another possible limit is the σ → ∞ limit. We give the result taking first the large σ limit and then approximating for large N. It is convenient to write Z_N^{(1)}(σ) as

Z_N^{(1)}(σ) ≈ (σ^N / N!) ∫_{ℝ^N} exp[ −Σ_i y_i²/2 + (σ/2) Σ_{i<j} |y_i − y_j| ] ∏_{i=1}^N dy_i    (42)

where we have changed variables x_i = σ y_i and approximated 2 sinh( σ|y_i − y_j|/2 ) ≈ e^{σ|y_i − y_j|/2} at large σ. Making the ansatz that y grows as σ^a for some a > 0, with a coefficient of order 1, we see that a saddle point exists in the large σ limit if the two terms in the exponential in (42) are of the same order in σ. The first term goes as σ^{2a} and the second as σ^{a+1}, meaning that the large σ limit requires 2a = a + 1, that is a = 1. This implies that log Z_N^{(1)}(σ) grows as σ² at leading order at large N and large σ.

We can also foresee the form of the sub-leading correction. Indeed, it will come from integrating over fluctuations around the saddle point. The entries of the Hessian matrix from (42) are of order one, and, performing the Gaussian integrals and taking into account the coefficient σ^N in (42), we expect this sub-leading correction to be of order N log σ. Note, however, that (42) as it stands is not suitably written to study the saddle point equation. To get rid of the absolute values, we should restrict to the principal Weyl chamber y_1 > y_2 > ⋯ > y_N and look for a saddle point therein. We do not pursue this approach, and instead quote the result found in [43], where a rigorous approach to this limit has been undertaken, based on work by Baxter [44] (the extension of Baxter's result [44] to arbitrary β, as we need here, is straightforward). The statistics of extreme eigenvalues in this regime has been recently studied in [45, 46]. One gets:

(43)

II-C3 Double-scaling limit

Another useful limit is the scaled large N limit, with N → ∞ and σ → 0 keeping the product Nσ² fixed. In this limit, the leading contribution to Z_N^{(β)}(σ) comes from the saddle point configuration, and one finds

(44)

where the leading coefficient depends only on the fixed product Nσ², and is explicitly found by solving the saddle point equation. To obtain (44) we have used the normalization κ = σ²; the normalization (15) is recovered simply by a shift of the coupling in the β = 2 case.

In [32] this scaled limit was studied and subdivided into three cases, according to the value of the product Nσ² being finite, infinite or zero. Below, in the last Section, we will discuss the relationship between the probability densities in [17, 18, 19] and diffusion processes, and how they appear in the study of Brownian motion on Weyl chambers of Lie groups.

III The cases of real symmetric and quaternion Hermitian matrices

According to the results in Subsection II-C, we can extract the behaviour of Z_N^{(β)}(σ) in certain limits from the knowledge of the exactly solved β = 2 case. Nevertheless, we can do more, and evaluate Z_N^{(1)} and Z_N^{(4)} exactly using standard tools in random matrix theory.

III-A Skew-symmetric products and partition function

We discuss Z_N^{(1)}(σ), which is directly related to Z(σ) through (16). In contrast to the case β = 2, the partition functions Z_N^{(1)} and Z_N^{(4)} do not correspond to determinants, but rather to Pfaffians. It is convenient to discuss separately the two cases with N even or odd, starting with the former. Before that, we define the skew-symmetric products [31]

⟨f, g⟩₁ = ∫₀^∞ ∫₀^∞ f(x) g(y) sgn(y − x) w(x) w(y) dx dy    (45)

where w(x) = e^{−(log x)²/(2κ)}, and

(46)

Note that the coefficient κ in the exponent of the weight function is κ = βσ², as defined in (15).

Computing the skew-symmetric products of any pair of monomials gives

(47)

and

⟨x^j, x^k⟩₁ = 2πσ² e^{σ²((j+1)² + (k+1)²)/2} erf( σ(k − j)/2 )    (48)

where we have used the change of variables x = e^s. In (48), erf is the error function, and in the last line we have used the integration formula for the error function with a Gaussian weight. The error function is odd, erf(−z) = −erf(z), whence both products are manifestly skew-symmetric.

Assuming N even, we use the de Bruijn identity [47] to write

Z_N^{(1)}(σ) = e^{−σ²N(N+1)²/8} Pf[ ⟨x^{j−1}, x^{k−1}⟩₁ ]_{j,k=1}^N    (49)

with the skew-symmetric product given in (48). For N odd, instead, the de Bruijn identity [47] leads us to the Pfaffian of an (N+1) × (N+1) skew-symmetric matrix:

Z_N^{(1)}(σ) = e^{−σ²N(N+1)²/8} Pf[ M  ν ; −ν^T  0 ],   M_{jk} = ⟨x^{j−1}, x^{k−1}⟩₁,   ν_j = ∫₀^∞ x^{j−1} w(x) dx    (50)

In both cases, we can expand the Pfaffian and obtain Z_N^{(1)}(σ), and hence Z(σ), as a finite sum of terms, explicitly known from (48). The advantage of this approach is that the Pfaffians can be obtained explicitly for fixed N with the aid of a computer algebra system. We provide numerical results at various N and σ in Tables I and II, computed in Mathematica using the algorithm of [48].
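The same check can also be set up outside of computer algebra systems; a minimal Python sketch (function names ours; conventions as in (48)-(49), with the overall exponential prefactor of (19) omitted), which builds the skew-symmetric matrix of monomial products, evaluates its Pfaffian by recursive expansion along the first row, and compares the N = 2 case against direct quadrature:

```python
import numpy as np
from scipy.special import erf
from scipy.integrate import dblquad

def skew_moment(j, k, sigma):
    """<x^j, x^k>_1 from (48), with kappa = sigma^2 (beta = 1)."""
    return (2 * np.pi * sigma**2
            * np.exp(sigma**2 * ((j + 1)**2 + (k + 1)**2) / 2)
            * erf(sigma * (k - j) / 2))

def pfaffian(A):
    """Pfaffian of a skew-symmetric matrix, by expansion along the first row."""
    n = A.shape[0]
    if n == 0:
        return 1.0
    return sum((-1)**k * A[0, k + 1]
               * pfaffian(np.delete(np.delete(A, [0, k + 1], 0), [0, k + 1], 1))
               for k in range(n - 1))

sigma = 0.4
# N = 2: the Pfaffian reduces to the single entry <1, x>_1
M = np.array([[skew_moment(j, k, sigma) for k in range(2)] for j in range(2)])
w = lambda s: np.exp(s - s**2 / (2 * sigma**2))  # log-normal weight with Jacobian, x = e^s
direct, _ = dblquad(lambda s, t: 0.5 * abs(np.exp(s) - np.exp(t)) * w(s) * w(t),
                    -20, 20, -20, 20)
print(pfaffian(M), direct)  # the two agree

# larger (even) N is immediate, e.g. N = 4:
M4 = np.array([[skew_moment(j, k, sigma) for k in range(4)] for j in range(4)])
print(pfaffian(M4))
```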

N\σ     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0     1.1     1.2
2      -0.680   0.376   1.001   1.449   1.801   2.092   2.340   2.557   2.751   2.926   3.086   3.234
3      -2.776  -0.033   1.561   2.685   3.553   4.260   4.855   5.370   5.824   6.230   6.597   6.933
4      -2.173   1.446   3.628   5.223   6.495   7.566   8.497   9.328  10.080  10.773  11.416  12.020
5      -6.142   0.537   4.492   7.339   9.583  11.449  13.057  14.478  15.758  16.930  18.014  19.028
6      -4.374   3.418   8.200  11.755  14.641  17.108  19.287  21.259  23.073  24.765  26.359  27.874
7      -9.899   2.342   9.725  15.138  19.485  23.169  26.403  29.315  31.987  34.473  36.813  39.035
8      -7.182   6.511  15.056  21.513  26.837  31.456  35.596  39.390  42.927  46.264  49.444  52.496
9     -14.007   5.538  17.541  26.502  33.829  40.142  45.779  50.935  55.734  60.261  64.575  68.720
10    -10.489  10.945  24.546  34.983  43.719  51.401  58.374  64.842  70.936  76.742  82.323  87.725
11    -18.406  10.309  28.260  41.895  53.230  63.156  72.146  80.477  88.324  95.805 103.003 109.975
12    -14.188  16.951  37.033  52.678  65.954  77.778  88.632  98.803 108.472 117.760 126.751 135.506

N\σ     1.3     1.4     1.5     1.6     1.7     1.8     1.9     2.0     2.1     2.2     2.3     2.4
2       3.372   3.500   3.621   3.735   3.844   3.947   4.046   4.141   4.232   4.319   4.404   4.486
3       7.243   7.530   7.799   8.051   8.290   8.516   8.731   8.936   9.133   9.322   9.504   9.680
4      12.590  13.131  13.649  14.145  14.624  15.086  15.534  15.969  16.393  16.807  17.211  17.607
5      19.984  20.891  21.757  22.588  23.388  24.162  24.914  25.645  26.358  27.055  27.738  28.409
6      29.323  30.716  32.062  33.368  34.638  35.879  37.093  38.283  39.452  40.602  41.735  42.854
7      41.161  43.207  45.186  47.108  48.981  50.813  52.608  54.371  56.107  57.817  59.505  61.173
8      55.443  58.303  61.090  63.815  66.486  69.110  71.694  74.242  76.759  79.248  81.713  84.154
9      72.726  76.619  80.417  84.135  87.785  91.377  94.918  98.415 101.874 105.298 108.692 112.059
10     92.979  98.112 103.144 108.090 112.964 117.774 122.529 127.237 131.902 136.530 141.126 145.691
11    116.766 123.408 129.927 136.343 142.671 148.924 155.111 161.243 167.325 173.364 179.364 185.329
12    144.071 152.479 160.758 168.927 177.002 184.999 192.926 200.793 208.607 216.374 224.101 231.790
TABLE I: log Z_N^{(1)}(σ) for N from 2 to 12 and σ from 0.1 to 2.4. Evaluation time (whole table): a matter of seconds on a laptop.
N          2       4       6       8      10      12      14      16      18      20
log Z   4.150  13.822  29.007  49.694  71.487  93.286 113.424 135.129 156.057 175.890

N         22      24      26      28      30      32      34      36      38      40
log Z 199.322 219.185 244.263 267.156 295.836 319.471 349.711 378.587 410.486 444.106
TABLE II: log Z_N^{(1)}(σ) for even N from 2 to 40, at a fixed value of σ. Evaluation time (whole table): a matter of seconds on a laptop.

From Z_N^{(1)}(σ) we immediately obtain log Z_N^{(1)}(σ), which we show in Figure 5 as a function of σ for various fixed values of N, and in Figure 6 as a function of N at fixed σ. We also show the agreement of log Z_N^{(1)} with the theoretical predictions of Subsection II-C for small and large σ in Figure 7.

Fig. 5: log Z_N^{(1)} as a function of σ, for various fixed values of N.
Fig. 6: log Z_N^{(1)} as a function of N, at fixed σ. The dashed line shows the theoretical prediction in the large N regime.
Fig. 7: Left: log Z_N^{(1)} in the small σ regime. Right: log Z_N^{(1)} in the large σ regime. "sub." stands for the sub-leading corrections obtained in (43).

While the Pfaffian representation of Z_N^{(1)}(σ) is very useful for computational purposes, the simple example N = 2 can be easily solved by direct integration:

Z_2^{(1)}(σ) = e^{−9σ²/4} (1/2) ∫₀^∞ ∫₀^∞ |v_1 − v_2| w(v_1) w(v_2) dv_1 dv_2    (51)

We now observe that the integral of an error function with a Gaussian weight is again an error function and, after simplifications, we obtain

Z_2^{(1)}(σ) = 2πσ² e^{σ²/4} erf(σ/2)    (52)

This is a check of the result (49) for the smallest value of N. For higher values of N a direct computation becomes more involved. However, from the Pfaffian representation, we infer that the result is always a finite sum of products of error functions, weighted by explicitly known exponential prefactors. So, for example, de Bruijn's identity immediately gives

Z_4^{(1)}(σ) = e^{−25σ²/2} [ M_{01} M_{23} − M_{02} M_{13} + M_{03} M_{12} ],   M_{jk} = ⟨x^j, x^k⟩₁    (53)