 # Information geometry and asymptotic geodesics on the space of normal distributions

The family N of n-variate normal distributions is parameterized by the cone of positive definite symmetric n× n-matrices and the n-dimensional real vector space. Equipped with the Fisher information metric, N becomes a Riemannian manifold. As such, it is diffeomorphic, but not isometric, to the Riemannian symmetric space Pos_1(n+1,R) of unimodular positive definite symmetric (n+1)×(n+1)-matrices. As the computation of distances in the Fisher metric for n>1 presents some difficulties, Lovrič et al. (2000) proposed to use the Killing metric on Pos_1(n+1,R) as an alternative metric in which distances are easier to compute. In this work, we survey the geometric properties of the space N and provide a quantitative analysis of the defect of certain geodesics for the Killing metric to be geodesics for the Fisher metric. We find that for these geodesics the use of the Killing metric as an approximation for the Fisher metric is indeed justified for long distances.


## 1. Introduction and overview

A multivariate normal distribution is determined by its covariance matrix and its mean vector. So for a fixed $n$, the family $\mathcal{N}$ of $n$-variate normal distributions is a differentiable manifold. It can be identified with the product of the space $\mathrm{Pos}(n,\mathbb{R})$ of positive definite symmetric $n\times n$-matrices with the vector space $\mathbb{R}^n$. For various statistical purposes, it is desirable to have a measure of distance between the elements of $\mathcal{N}$. Such a distance measure is provided by the Fisher metric on $\mathcal{N}$, which is a Riemannian metric that appears naturally in a certain statistical framework. We briefly review some properties of the Fisher metric on the normal distributions in Section 2.

Computing distances on $\mathcal{N}$, however, turns out to be a non-trivial task. Explicit forms for the geodesics of the Fisher metric on $\mathcal{N}$ are known (due to Calvo and Oller), but these only yield explicit formulas for the distance in particular cases. So Lovrič, Min-Oo and Ruh proposed the use of a different metric in which distances are easier to compute. They map $\mathcal{N}$ diffeomorphically onto the Riemannian symmetric space $\mathrm{Pos}_1(n+1,\mathbb{R})$. This map is not an isometry between the Fisher metric and the metric of the symmetric space, which we call the Killing metric. Nevertheless, the two metrics are quite similar in appearance, so it is reasonable to ask how different they really are.

In Section 3 we describe the geometry of $\mathcal{N}$ with the Fisher metric as a Riemannian homogeneous but non-symmetric space. In Theorem A we show that $\mathcal{N}$ is a bundle whose base is the cone $\mathrm{Pos}(n,\mathbb{R})$ of symmetric positive definite $n\times n$-matrices and whose fiber is $\mathbb{R}^n$. This also gives rise to two pointwise mutually orthogonal foliations, one with leaves isometric to $\mathrm{Pos}(n,\mathbb{R})$, the other with leaves isometric to $\mathbb{R}^n$.

To make a case for using the Killing metric as a sensible approximation for the Fisher metric, we compare the geometry of the Fisher metric and the geometry of the Killing metric in Section 4. We find that the Levi-Civita connection for the Fisher metric on the leaves is affinely equivalent to the Levi-Civita connection of the Killing metric. So unparameterized geodesics in these leaves are the same for the two metrics. In Theorem B, we show that Killing geodesics orthogonal to a leaf at some point are asymptotically geodesic in the Fisher metric. That is, their defect from being a Fisher geodesic tends to zero as their curve parameter tends to infinity. So for two important classes of unparameterized geodesics, the Killing geodesics approximate or are identical to the corresponding Fisher geodesics. This is not an exhaustive comparison, but it provides some justification for regarding the easier-to-compute Killing metric as a good approximation of the Fisher metric.

### Notations and conventions

Throughout, we will assume matrices to be real-valued. For a matrix $A$, we let $A^\top$ denote its transpose. We also write $A^{-\top}=(A^{-1})^\top$. The identity matrix is denoted by $I_n$ or $I$. By $E_{ij}$ we denote the elementary matrix whose entry in row $i$, column $j$ is $1$, and all other entries are $0$. Its symmetrization is $\frac12(E_{ij}+E_{ji})$. The canonical basis vectors of $\mathbb{R}^n$ are denoted by $e_1,\dots,e_n$.

As usual,

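$$\begin{aligned}
\mathrm{GL}(n,\mathbb{R})&=\{A\in\mathbb{R}^{n\times n}\mid\det(A)\neq0\},\\
\mathrm{SL}(n,\mathbb{R})&=\{A\in\mathrm{GL}(n,\mathbb{R})\mid\det(A)=1\},\\
\mathrm{O}(n)&=\{A\in\mathrm{GL}(n,\mathbb{R})\mid A^\top=A^{-1}\},\\
\mathrm{SO}(n)&=\{A\in\mathrm{O}(n)\mid\det(A)=1\}
\end{aligned}$$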

denote the general linear, special linear, and (special) orthogonal groups, respectively. The subgroup of $\mathrm{GL}(n,\mathbb{R})$ of matrices with positive determinant is denoted by $\mathrm{GL}^+(n,\mathbb{R})$. The affine group is the semidirect product

$$\mathrm{Aff}(n,\mathbb{R})=\mathrm{GL}(n,\mathbb{R})\ltimes\mathbb{R}^n,$$

where the group multiplication is given by $(A,b)(A',b')=(AA',Ab'+b)$ for $(A,b),(A',b')\in\mathrm{Aff}(n,\mathbb{R})$. We also write .

By $\mathrm{Sym}(n,\mathbb{R})$ we denote the set of symmetric $n\times n$-matrices,

$$\mathrm{Sym}(n,\mathbb{R})=\{S\in\mathbb{R}^{n\times n}\mid S=S^\top\}.$$

We write for the corresponding subspaces of elements with trace $0$. The subset of diagonal matrices in $\mathrm{Sym}(n,\mathbb{R})$ is denoted by .

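The set of positive definite symmetric matrices in $\mathrm{Sym}(n,\mathbb{R})$ is denoted by $\mathrm{Pos}(n,\mathbb{R})$,

$$\mathrm{Pos}(n,\mathbb{R})=\{S\in\mathrm{Sym}(n,\mathbb{R})\mid x^\top Sx>0\text{ for all non-zero }x\in\mathbb{R}^n\}.$$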

Its subset of unimodular elements is

$$\mathrm{Pos}_1(n,\mathbb{R})=\{S\in\mathrm{Pos}(n,\mathbb{R})\mid\det(S)=1\}.$$

Recall that and .

## 2. Some background on information geometry

In this section we briefly review the concepts from information geometry that we use in the following. We mainly follow the presentation of Amari and Nagaoka [1].

### 2.1. The Fisher metric and dual connections

Information geometry provides a framework to study a class of probability distributions $p(x;\theta)$ defined on a sample space $\Omega$ and determined by finitely many parameters $\theta=(\theta^1,\dots,\theta^n)$, where we assume for simplicity that $p$ depends smoothly on $x$ and $\theta$. For example, the set of univariate normal distributions is parametrized by the mean $\mu\in\mathbb{R}$ and the variance $\sigma^2>0$.

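In general, the set $S$ of admissible values for $\theta$ can be viewed as an $n$-dimensional differentiable manifold, and we can define a positive semidefinite symmetric bilinear tensor $g$ on $S$ via

$$g_{ij}(\theta)=-\int_\Omega\frac{\partial^2\log p(x;\theta)}{\partial\theta^i\,\partial\theta^j}\,p(x;\theta)\,dx.\tag{2.1}$$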

In the following we assume that $g$ is positive definite everywhere, so that $(S,g)$ is a Riemannian manifold. Then $g$ is called the Fisher metric on $S$, and $(S,g)$ is called a statistical manifold.
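The following sketch illustrates (2.1) numerically. It assumes NumPy, and all function names, the step size `h` and the integration grid are illustrative choices rather than anything from the text. It evaluates the Fisher metric of the univariate normal family in the coordinates $\theta=(\mu,\sigma^2)$. Second derivatives of $\log p$ are approximated by central finite differences, and the integral over $\Omega=\mathbb{R}$ by the trapezoid rule on a truncated grid. For this family the exact answer is $\mathrm{diag}(1/\sigma^2,\,1/(2\sigma^4))$.

```python
# Illustrative sketch; helper names and numerical parameters are hypothetical.
import numpy as np


def log_normal_pdf(x, mu, var):
    """Log-density of the univariate normal distribution N(mu, var)."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var)


def trapezoid(f, x):
    """Trapezoid rule for samples f on the grid x."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)


def fisher_metric_quadrature(mu, var, h=1e-4, num=20001, width=12.0):
    """Approximate g_ij(theta) from (2.1) for theta = (mu, var).

    The integral over Omega = R is truncated to [mu - width*sigma, mu + width*sigma].
    """
    sigma = np.sqrt(var)
    x = np.linspace(mu - width * sigma, mu + width * sigma, num)
    theta = np.array([mu, var])
    steps = h * np.eye(2)

    def logp(t):
        return log_normal_pdf(x, t[0], t[1])

    p = np.exp(logp(theta))
    g = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            # Central difference for d^2 log p / (d theta_i d theta_j), pointwise in x.
            d2 = (logp(theta + steps[i] + steps[j]) - logp(theta + steps[i] - steps[j])
                  - logp(theta - steps[i] + steps[j]) + logp(theta - steps[i] - steps[j])) / (4.0 * h * h)
            g[i, j] = -trapezoid(d2 * p, x)
    return g


g = fisher_metric_quadrature(mu=0.3, var=2.0)
print(g)  # should be close to diag(1/var, 1/(2*var**2)) = diag(0.5, 0.125)
```

The same numbers follow from (3.1) below with $n=1$ and $\Sigma=\sigma^2$.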

In addition to the Fisher metric, there are two particular torsion-free affine connections defined on $S$, denoted by $\nabla^{(e)}$ and $\nabla^{(m)}$. These connections are dual to each other with respect to $g$. This means that for all vector fields $X,Y,Z$ on $S$,

$$Zg(X,Y)=g(\nabla^{(e)}_ZX,Y)+g(X,\nabla^{(m)}_ZY).\tag{2.2}$$

Moreover, the affine combination

$$\nabla=\tfrac12\nabla^{(e)}+\tfrac12\nabla^{(m)}$$

yields the Levi-Civita connection of the Fisher metric $g$.

The letters “e” and “m” stand for “exponential” and “mixture”, respectively, referring to two families of probability distributions in which these connections appear naturally. More generally, there is a whole family of affine connections $\nabla^{(\alpha)}$, $\alpha\in\mathbb{R}$, associated to $g$, with $\nabla^{(e)}=\nabla^{(1)}$ and $\nabla^{(m)}=\nabla^{(-1)}$. However, we are not concerned with other values of $\alpha$ here.

### 2.2. Exponential families

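An exponential family is a statistical manifold that consists of probability distributions of the form

$$p(x;\theta)=\exp\big(c(x)+\theta^1f_1(x)+\dots+\theta^nf_n(x)-\psi(\theta)\big)$$

for given functions $c,f_1,\dots,f_n$ on $\Omega$ and a function $\psi$ of $\theta$. The normalization $\int_\Omega p(x;\theta)\,dx=1$ implies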

$$\psi(\theta)=\log\Big(\int_\Omega\exp\big(c(x)+\theta^1f_1(x)+\dots+\theta^nf_n(x)\big)\,dx\Big).\tag{2.3}$$
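For the univariate normal family one may take $c(x)=0$, $f_1(x)=x$ and $f_2(x)=x^2$. Then $\theta^1=\mu/\sigma^2$ and $\theta^2=-1/(2\sigma^2)<0$, and the Gaussian integral gives $\psi(\theta)=-(\theta^1)^2/(4\theta^2)+\frac12\log(\pi/(-\theta^2))$. The sketch below assumes NumPy, uses hypothetical helper names, and compares this closed form with a trapezoid-rule evaluation of (2.3).

```python
# Illustrative sketch; helper names and grid parameters are hypothetical.
import numpy as np


def psi_closed_form(theta1, theta2):
    """Closed form of (2.3) for c(x) = 0, f1(x) = x, f2(x) = x**2 (needs theta2 < 0)."""
    return -theta1 ** 2 / (4.0 * theta2) + 0.5 * np.log(np.pi / (-theta2))


def psi_quadrature(theta1, theta2, num=40001, width=12.0):
    """Evaluate (2.3) numerically with the trapezoid rule on a truncated grid."""
    centre = -theta1 / (2.0 * theta2)       # maximiser of theta1*x + theta2*x**2
    scale = np.sqrt(-1.0 / (2.0 * theta2))  # standard deviation of the normalised density
    x = np.linspace(centre - width * scale, centre + width * scale, num)
    f = np.exp(theta1 * x + theta2 * x ** 2)
    integral = np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0
    return float(np.log(integral))


# Natural parameters of N(mu = 2, sigma^2 = 2): theta1 = mu/sigma^2, theta2 = -1/(2 sigma^2).
theta1, theta2 = 1.0, -0.25
print(psi_quadrature(theta1, theta2), psi_closed_form(theta1, theta2))
```

With $\mu=2$ and $\sigma^2=2$, both values should agree with $\frac{\mu^2}{2\sigma^2}+\frac12\log(2\pi\sigma^2)=1+\frac12\log(4\pi)$.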

The connections $\nabla^{(e)}$ and $\nabla^{(m)}$ are distinguished on an exponential family (see Amari and Nagaoka [1, Sections 2.3 and 3.3]).

###### Theorem 2.1.

Let $S$ be an exponential family. Then $\nabla^{(e)}$ and $\nabla^{(m)}$ are flat torsion-free affine connections on $S$.

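In fact, the $\theta^i$ form a flat coordinate system for $\nabla^{(e)}$ in the sense that $\nabla^{(e)}_{\partial_i}\partial_j=0$, $i,j=1,\dots,n$, for the coordinate vector fields $\partial_i=\partial/\partial\theta^i$. The flat coordinate system $(\eta_1,\dots,\eta_n)$ for $\nabla^{(m)}$ is obtained via a Legendre transform of $\psi$,

$$\frac{\partial\psi}{\partial\theta^i}=\eta_i,\qquad i=1,\dots,n.$$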

In the flat $\theta$-coordinates, the Fisher metric of an exponential family is given as a Hessian metric $g=\nabla^{(e)}d\psi$, or equivalently

$$g_{ij}(\theta)=\frac{\partial^2\psi(\theta)}{\partial\theta^i\,\partial\theta^j}.\tag{2.4}$$

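We call $\psi$ the potential of the Fisher metric. The dual potential is given by $\psi^*(\eta)=\sum_{i=1}^n\theta^i\eta_i-\psi(\theta)$. In the flat $\eta$-coordinates, the inverse $(g^{ij})$ of the matrix $(g_{ij})$ is given as a Hessian metric

$$g^{ij}(\eta)=\frac{\partial^2\psi^*(\eta)}{\partial\eta_i\,\partial\eta_j}.\tag{2.5}$$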
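The Legendre duality above can be checked numerically for the univariate normal family. The sketch assumes NumPy, and `psi`, `psi_star`, `gradient` and `hessian` are illustrative helpers that use finite-difference derivatives. Here $\psi(\theta)=-(\theta^1)^2/(4\theta^2)+\frac12\log(\pi/(-\theta^2))$, and in the expectation coordinates $\eta=(\mu,\sigma^2+\mu^2)$ the dual potential works out to $\psi^*(\eta)=-\frac12\big(1+\log(2\pi(\eta_2-\eta_1^2))\big)$. The gradient of $\psi$ should then reproduce $\eta$, and the Hessians (2.4) and (2.5) should be mutually inverse.

```python
# Illustrative sketch; helper names and step sizes are hypothetical.
import numpy as np


def psi(theta):
    """Potential of the univariate normal family in theta = (mu/var, -1/(2 var))."""
    t1, t2 = theta
    return -t1 ** 2 / (4.0 * t2) + 0.5 * np.log(np.pi / (-t2))


def psi_star(eta):
    """Dual potential in eta = (E[x], E[x^2]) = (mu, var + mu^2)."""
    e1, e2 = eta
    return -0.5 * (1.0 + np.log(2.0 * np.pi * (e2 - e1 ** 2)))


def gradient(f, z, h=1e-5):
    """Central-difference gradient of a scalar function f at z."""
    z = np.asarray(z, dtype=float)
    return np.array([(f(z + h * e) - f(z - h * e)) / (2.0 * h) for e in np.eye(z.size)])


def hessian(f, z, h=1e-4):
    """Central-difference Hessian of a scalar function f at z."""
    z = np.asarray(z, dtype=float)
    steps = h * np.eye(z.size)
    H = np.empty((z.size, z.size))
    for i in range(z.size):
        for j in range(z.size):
            H[i, j] = (f(z + steps[i] + steps[j]) - f(z + steps[i] - steps[j])
                       - f(z - steps[i] + steps[j]) + f(z - steps[i] - steps[j])) / (4.0 * h * h)
    return H


mu, var = 0.7, 1.5
theta = np.array([mu / var, -1.0 / (2.0 * var)])
eta = np.array([mu, var + mu ** 2])

print(gradient(psi, theta), eta)                     # dpsi/dtheta^i = eta_i
print(hessian(psi, theta) @ hessian(psi_star, eta))  # approximately the identity
```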

Another important property of exponential families is the following (see Amari and Nagaoka [1, Theorem 2.5]).

###### Theorem 2.2.

A submanifold $S'$ of an exponential family $S$ is totally geodesic in $S$ with respect to $\nabla^{(e)}$ if and only if $S'$ is an exponential family itself.

### 2.3. Normal distributions

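The most important exponential family is formed by the normal distributions. An $n$-variate normal distribution is determined by its covariance matrix $\Sigma\in\mathrm{Pos}(n,\mathbb{R})$ and its mean $\mu\in\mathbb{R}^n$ via the formula

$$p(x;\Sigma,\mu)=\frac{1}{\sqrt{(2\pi)^n\det(\Sigma)}}\exp\Big(-\frac12(x-\mu)^\top\Sigma^{-1}(x-\mu)\Big),$$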

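so the manifold we are considering is the space $\mathcal{N}=\mathrm{Pos}(n,\mathbb{R})\times\mathbb{R}^n$. The flat coordinates for the connection $\nabla^{(m)}$ are $(\Xi,\xi)$, where

$$\xi=\mu\in\mathbb{R}^n,\qquad\Xi=\Sigma+\mu\mu^\top\in\mathrm{Pos}(n,\mathbb{R}),$$

and the flat coordinates for the connection $\nabla^{(e)}$ are $(\Theta,\theta)$, where

$$\theta=\Sigma^{-1}\mu\in\mathbb{R}^n,\qquad\Theta=-\frac12\Sigma^{-1}\in-\mathrm{Pos}(n,\mathbb{R}).$$

Note that $\Theta$ is negative definite.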

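The potential $\psi$ in these coordinate systems is (compare (2.3))

$$\begin{aligned}
\psi(\Sigma,\mu)&=\frac12\mu^\top\Sigma^{-1}\mu+\frac12\log\det(2\pi\Sigma),\\
\psi(\Xi,\xi)&=\frac12\xi^\top(\Xi-\xi\xi^\top)^{-1}\xi+\frac12\log\det\big(2\pi(\Xi-\xi\xi^\top)\big),\\
\psi(\Theta,\theta)&=-\frac14\theta^\top\Theta^{-1}\theta-\frac12\log\det(-\pi^{-1}\Theta).
\end{aligned}$$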
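The three expressions for $\psi$ should give the same number once the coordinates are converted. This uses $\Xi-\xi\xi^\top=\Sigma$ in the second formula and the quadratic term $-\frac14\theta^\top\Theta^{-1}\theta$ in the third. Here is a minimal NumPy sketch with a random $3\times3$ covariance; the helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.
import numpy as np


def logdet(M):
    """log det(M) for a matrix with positive determinant."""
    sign, value = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("matrix must have positive determinant")
    return value


def psi_sigma_mu(Sigma, mu):
    return 0.5 * mu @ np.linalg.solve(Sigma, mu) + 0.5 * logdet(2 * np.pi * Sigma)


def psi_xi(Xi, xi):
    S = Xi - np.outer(xi, xi)  # recovers Sigma from the mixture coordinates
    return 0.5 * xi @ np.linalg.solve(S, xi) + 0.5 * logdet(2 * np.pi * S)


def psi_theta(Theta, theta):
    return -0.25 * theta @ np.linalg.solve(Theta, theta) - 0.5 * logdet(-Theta / np.pi)


rng = np.random.default_rng(0)
n = 3
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)  # a random positive definite covariance
mu = rng.standard_normal(n)

Xi, xi = Sigma + np.outer(mu, mu), mu
Theta, theta = -0.5 * np.linalg.inv(Sigma), np.linalg.solve(Sigma, mu)
values = [psi_sigma_mu(Sigma, mu), psi_xi(Xi, xi), psi_theta(Theta, theta)]
print(values)  # the three numbers should agree up to rounding
```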

## 3. Geometry of the family of normal distributions

In this section we take a closer look at the information geometry of the manifold $\mathcal{N}$. Note that $\mathcal{N}=\mathrm{Pos}(n,\mathbb{R})\times\mathbb{R}^n$ as a product of manifolds.

### 3.1. Basic geometric properties of $\mathcal{N}$

Here, we state the explicit form of the Fisher metric, its Levi-Civita connection and its curvature tensor in the $(\Sigma,\mu)$-coordinates. These were originally computed by Skovgaard [9, 10].

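Let $g$ be the Fisher metric on $\mathcal{N}$. Let $X,Y$ be two coordinate vector fields in the $\Sigma$-directions, and let $v,w$ be two coordinate vector fields in the $\mu$-directions. Then the metric tensor is

$$g_{(\Sigma,\mu)}\big((X,v),(Y,w)\big)=v^\top\Sigma^{-1}w+\frac12\mathrm{tr}(\Sigma^{-1}X\Sigma^{-1}Y),\tag{3.1}$$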
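Under the usual regularity conditions, (2.1) also equals the expected product of scores, $g_{ij}=E[\partial_i\log p\,\partial_j\log p]$. The sketch below uses this to compare a Monte Carlo estimate of the Fisher metric of a bivariate normal distribution with the closed form (3.1). It assumes NumPy; the helper names, sample size and test vectors are arbitrary choices. The score along a tangent vector $(X,v)$ is $-\frac12\mathrm{tr}(\Sigma^{-1}X)+\frac12r^\top\Sigma^{-1}X\Sigma^{-1}r+v^\top\Sigma^{-1}r$ with $r=x-\mu$.

```python
# Illustrative sketch; helper names, seed and sample size are hypothetical.
import numpy as np


def fisher_metric(Sigma, X, v, Y, w):
    """Closed form (3.1) of g_{(Sigma, mu)}((X, v), (Y, w))."""
    Si = np.linalg.inv(Sigma)
    return v @ Si @ w + 0.5 * np.trace(Si @ X @ Si @ Y)


def score(x, Sigma, mu, X, v):
    """Derivative of log p(x; Sigma, mu) along the tangent vector (X, v), for each row of x.

    From log p = -1/2 log det(2 pi Sigma) - 1/2 r^T Sigma^{-1} r with r = x - mu.
    """
    Si = np.linalg.inv(Sigma)
    r = x - mu
    rS = r @ Si
    quad = np.einsum("ni,ij,nj->n", rS, X, rS)  # r^T Si X Si r for each sample
    return -0.5 * np.trace(Si @ X) + 0.5 * quad + rS @ v


rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
mu = np.array([0.5, -1.0])
U = (np.array([[1.0, 0.4], [0.4, -0.5]]), np.array([0.3, 0.8]))
V = (np.array([[0.2, -0.7], [-0.7, 1.1]]), np.array([-0.6, 0.1]))

samples = rng.multivariate_normal(mu, Sigma, size=400_000)
scores = [score(samples, Sigma, mu, *T) for T in (U, V)]
monte_carlo = np.array([[np.mean(a * b) for b in scores] for a in scores])
exact = np.array([[fisher_metric(Sigma, *S, *T) for T in (U, V)] for S in (U, V)])
print(monte_carlo)
print(exact)
```

Agreement is only up to Monte Carlo error, which shrinks roughly like the inverse square root of the sample size.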

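and the Levi-Civita connection is determined by

$$\nabla_XY=\nabla_YX=-\frac12(X\Sigma^{-1}Y+Y\Sigma^{-1}X),\qquad\nabla_vw=\nabla_wv=\frac12(vw^\top+wv^\top),\qquad\nabla_Xv=\nabla_vX=-\frac12X\Sigma^{-1}v.\tag{3.2}$$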

Note that the symmetry in these equations is due to the fact that we are looking at coordinate vector fields.
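A torsion-free connection that is compatible with the metric is the Levi-Civita connection, so (3.2) can be sanity-checked numerically against (3.1). The sketch below assumes NumPy, and its function names are hypothetical. It extends (3.2) bilinearly to constant-coefficient vector fields $U=(X_1,v_1)$, $V$, $W$. It then checks that $Ug(V,W)=g(\nabla_UV,W)+g(V,\nabla_UW)$, using a central difference for the left-hand side, and that $\nabla_UV=\nabla_VU$.

```python
# Illustrative sketch; helper names are hypothetical.
import numpy as np


def metric(Sigma, U, V):
    """Fisher metric (3.1) on tangent vectors U = (X, v) and V = (Y, w)."""
    (X, v), (Y, w) = U, V
    Si = np.linalg.inv(Sigma)
    return v @ Si @ w + 0.5 * np.trace(Si @ X @ Si @ Y)


def connection(Sigma, U, V):
    """nabla_U V for constant-coefficient fields, expanded bilinearly from (3.2)."""
    (X1, v1), (X2, v2) = U, V
    Si = np.linalg.inv(Sigma)
    sigma_part = (-0.5 * (X1 @ Si @ X2 + X2 @ Si @ X1)            # nabla_X Y
                  + 0.5 * (np.outer(v1, v2) + np.outer(v2, v1)))  # nabla_v w
    mu_part = -0.5 * (X1 @ Si @ v2 + X2 @ Si @ v1)                # nabla_X v and nabla_v X
    return sigma_part, mu_part


def compatibility_defect(Sigma, U, V, W, h=1e-6):
    """U g(V, W) - g(nabla_U V, W) - g(V, nabla_U W); zero for a metric connection.

    g does not depend on mu, so U g(V, W) is a central difference along the Sigma-part of U.
    """
    X = U[0]
    derivative = (metric(Sigma + h * X, V, W) - metric(Sigma - h * X, V, W)) / (2.0 * h)
    return (derivative - metric(Sigma, connection(Sigma, U, V), W)
            - metric(Sigma, V, connection(Sigma, U, W)))


def random_tangent(rng, n):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2.0, rng.standard_normal(n)


rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)
U, V, W = (random_tangent(rng, n) for _ in range(3))
print(compatibility_defect(Sigma, U, V, W))  # of the order of the finite-difference error
```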

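If $v_1,\dots,v_4$ and $X_1,\dots,X_4$ are coordinate vector fields in the $\mu$- and $\Sigma$-directions, respectively, then the curvature of the Fisher metric is determined by

$$\begin{aligned}
R(v_1,v_2,v_3,v_4)&=\frac14\big((v_2^\top\Sigma^{-1}v_3)(v_1^\top\Sigma^{-1}v_4)-(v_1^\top\Sigma^{-1}v_3)(v_2^\top\Sigma^{-1}v_4)\big),\\
R(X_1,X_2,X_3,X_4)&=\frac14\big(\mathrm{tr}(X_2\Sigma^{-1}X_1\Sigma^{-1}X_3\Sigma^{-1}X_4\Sigma^{-1})-\mathrm{tr}(X_1\Sigma^{-1}X_2\Sigma^{-1}X_3\Sigma^{-1}X_4\Sigma^{-1})\big),\\
R(v_1,v_2,X_1,X_2)&=\frac14\big(v_1^\top\Sigma^{-1}X_1\Sigma^{-1}X_2\Sigma^{-1}v_2-v_1^\top\Sigma^{-1}X_2\Sigma^{-1}X_1\Sigma^{-1}v_2\big),\\
R(v_1,X_1,v_2,X_2)&=\frac14v_1^\top\Sigma^{-1}X_1\Sigma^{-1}X_2\Sigma^{-1}v_2.
\end{aligned}\tag{3.3}$$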

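We now consider the two foliations of $\mathcal{N}$ into submanifolds of fixed $\mu$ or fixed $\Sigma$, respectively. For fixed $\mu_0\in\mathbb{R}^n$ and $\Sigma_0\in\mathrm{Pos}(n,\mathbb{R})$, we will write

$$\mathcal{N}(\cdot,\mu_0)=\{(\Sigma,\mu_0)\mid\Sigma\in\mathrm{Pos}(n,\mathbb{R})\},\qquad\mathcal{N}(\Sigma_0,\cdot)=\{(\Sigma_0,\mu)\mid\mu\in\mathbb{R}^n\}.$$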

It follows from (3.1) that the two foliations determined by these submanifolds are orthogonal.

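Recall that for two vector fields $X,Y$ tangent to a submanifold $\mathcal{M}\subset\mathcal{N}$, the second fundamental form $B(X,Y)$ of $\mathcal{M}$ is the normal component of $\nabla_XY$ in $T\mathcal{N}$. We let $\partial_{(ij)}$ denote the coordinate vector field in the direction of the coordinate $\Sigma_{ij}$, $i\le j$, and we let $\partial_i$ denote the coordinate vector field in the direction of the coordinate $\mu_i$. We denote by $J_\Sigma=\{(i,j)\mid1\le i\le j\le n\}$ the set enumerating the coordinates of $\Sigma$ and by $J_\mu=\{1,\dots,n\}$ the set enumerating the coordinates of $\mu$, and set $J=J_\Sigma\cup J_\mu$. An index $p\in J$ may mean either a single index from $J_\mu$ or an index pair from $J_\Sigma$. The Christoffel symbols of the Levi-Civita connection are then denoted by $\Gamma^r_{pq}$ with $p,q,r\in J$.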

###### Proposition 3.1.

For any $\mu_0\in\mathbb{R}^n$, the submanifold $\mathcal{N}(\cdot,\mu_0)$ is totally geodesic with respect to the Fisher metric of $\mathcal{N}$.

###### Proof.

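By (3.2), $\nabla_{\partial_{(ij)}}\partial_{(kl)}$ is tangent to $\mathcal{N}(\cdot,\mu_0)$ for all $(i,j),(k,l)\in J_\Sigma$. An arbitrary tangent vector field $X$ to $\mathcal{N}(\cdot,\mu_0)$ can be written as $X=\sum_{p\in J}w_p\partial_p$, with $w_m=0$ for $m\in J_\mu$. Then

$$\begin{aligned}
\nabla_{\partial_{(ij)}}X&=\sum_{p\in J}\Big(\partial_{(ij)}w_p+\sum_{q\in J}\Gamma^p_{(ij)q}w_q\Big)\partial_p\\
&=\sum_{p\in J}\Big(\partial_{(ij)}w_p+\sum_{(k,l)\in J_\Sigma}\Gamma^p_{(ij)(kl)}w_{kl}\Big)\partial_p&&(w_m=0\text{ for }m\in J_\mu)\\
&=\sum_{(r,s)\in J_\Sigma}\Big(\partial_{(ij)}w_{rs}+\sum_{(k,l)\in J_\Sigma}\Gamma^{(rs)}_{(ij)(kl)}w_{kl}\Big)\partial_{(rs)}&&(\Gamma^m_{(ij)(kl)}=0=w_m\text{ for }m\in J_\mu).
\end{aligned}$$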

This last expression is the induced covariant derivative on the submanifold $\mathcal{N}(\cdot,\mu_0)$, since the $\Sigma$- and $\mu$-directions are orthogonal everywhere. Hence the second fundamental form of $\mathcal{N}(\cdot,\mu_0)$ vanishes, which means that $\mathcal{N}(\cdot,\mu_0)$ is totally geodesic. ∎

###### Proposition 3.2.

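For any $\Sigma_0\in\mathrm{Pos}(n,\mathbb{R})$, the submanifold $\mathcal{N}(\Sigma_0,\cdot)$ is parallel with respect to the Fisher metric of $\mathcal{N}$. Also, the second fundamental form $B$ of $\mathcal{N}(\Sigma_0,\cdot)$ satisfies

$$B(e_i,e_j)=\frac12(E_{ij}+E_{ji})$$

for all $i,j=1,\dots,n$.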

###### Proof.

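The second fundamental form of $\mathcal{N}(\Sigma_0,\cdot)$ is given by

$$B(\partial_i,\partial_j)=\sum_{(k,l)\in J_\Sigma}\Gamma^{(kl)}_{ij}\partial_{(kl)},$$

where $i,j\in J_\mu$.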

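Denote by $\nabla^\perp$ and $\overline{\nabla}$ the normal and induced connection for $\mathcal{N}(\Sigma_0,\cdot)$, respectively. By (3.2), $\overline{\nabla}$ is a flat connection on $\mathcal{N}(\Sigma_0,\cdot)$. Then the covariant derivative of $B$ is given by ($i,j,m\in J_\mu$)

$$(\nabla_{\partial_m}B)(\partial_i,\partial_j)=\nabla^\perp_{\partial_m}\big(B(\partial_i,\partial_j)\big)-B(\overline{\nabla}_{\partial_m}\partial_i,\partial_j)-B(\partial_i,\overline{\nabla}_{\partial_m}\partial_j)=\nabla^\perp_{\partial_m}\big(B(\partial_i,\partial_j)\big),$$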

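where the last identity holds since $\overline{\nabla}$ is flat and the $\partial_i$ come from affine coordinates. Hence, we have for all $i,j,m\in J_\mu$

$$\begin{aligned}
(\nabla_{\partial_m}B)(\partial_i,\partial_j)&=\nabla^\perp_{\partial_m}\big(B(\partial_i,\partial_j)\big)=\nabla^\perp_{\partial_m}\Big(\sum_{(k,l)\in J_\Sigma}\Gamma^{(kl)}_{ij}\partial_{(kl)}\Big)\\
&=\sum_{(k,l)\in J_\Sigma}\big(\partial_m\Gamma^{(kl)}_{ij}\big)\partial_{(kl)}+\sum_{(k,l),(r,s)\in J_\Sigma}\Gamma^{(kl)}_{ij}\Gamma^{(rs)}_{(kl)m}\partial_{(rs)}.
\end{aligned}$$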

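In this expression, $\partial_m\Gamma^{(kl)}_{ij}=0$ and $\Gamma^{(rs)}_{(kl)m}=0$ due to the equations for $\nabla_vw$ and $\nabla_Xv$ in (3.2). These computations imply that $\nabla B=0$, in other words, that $\mathcal{N}(\Sigma_0,\cdot)$ is parallel.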

On the other hand, to compute $B(e_i,e_j)$ we use (3.2),

$$B(e_i,e_j)=\frac12(e_ie_j^\top+e_je_i^\top)=\frac12(E_{ij}+E_{ji}),$$

where we have identified the basis vectors $e_i$ with their corresponding partial differential operators $\partial_i$. ∎

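By the previous result, the submanifold $\mathcal{N}(\Sigma_0,\cdot)$ is not totally geodesic. Hence $\mathcal{N}$ is not the Riemannian product of $\mathcal{N}(\cdot,\mu_0)$ and $\mathcal{N}(\Sigma_0,\cdot)$, even though they are mutually orthogonal.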

### 3.2. $\mathcal{N}$ as a homogeneous space

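It is well known that the affine group $\mathrm{Aff}(n,\mathbb{R})$ acts transitively on $\mathcal{N}$ by

$$(A,b)\cdot(\Sigma,\mu)=(A\Sigma A^\top,A\mu+b),\tag{3.4}$$

where $(A,b)\in\mathrm{Aff}(n,\mathbb{R})$, $\Sigma\in\mathrm{Pos}(n,\mathbb{R})$, $\mu\in\mathbb{R}^n$. Furthermore, the action remains transitive when restricted to $\mathrm{GL}^+(n,\mathbb{R})\ltimes\mathbb{R}^n$. The tangent space $T_{(\Sigma,\mu)}\mathcal{N}$ can be identified with the vector space $\mathrm{Sym}(n,\mathbb{R})\oplus\mathbb{R}^n$. Given $(X,v)\in T_{(\Sigma,\mu)}\mathcal{N}$ and $(A,b)\in\mathrm{Aff}(n,\mathbb{R})$, the tangent action of $(A,b)$ is

$$(A,b)\cdot(X,v)=(AXA^\top,Av).\tag{3.5}$$

Thus we can identify

$$T_{(\Sigma,\mu)}\mathcal{N}\cong\big(A\cdot\mathrm{Sym}(n,\mathbb{R})\cdot A^\top\big)\oplus\mathbb{R}^n,$$

where $(\Sigma,\mu)=(A,b)\cdot(I_n,0)$.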

###### Lemma 3.3.

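The affine group $\mathrm{Aff}(n,\mathbb{R})$ acts transitively and isometrically on $(\mathcal{N},g)$ by (3.4). Moreover, let $T(n,\mathbb{R})$ denote the subgroup of lower triangular matrices with positive diagonal entries. Then the subgroup $T(n,\mathbb{R})\ltimes\mathbb{R}^n$ acts simply transitively on $\mathcal{N}$.

###### Proof.

The transitivity is a well-known fact. It remains to check that (3.4) is isometric. The tangent action of $(A,b)$ is (3.5), hence

$$\begin{aligned}
g_{(A\Sigma A^\top,A\mu+b)}\big((AXA^\top,Av),(AXA^\top,Av)\big)&=(Av)^\top(A\Sigma A^\top)^{-1}(Av)+\frac12\mathrm{tr}\big((A\Sigma A^\top)^{-1}AXA^\top(A\Sigma A^\top)^{-1}AXA^\top\big)\\
&=v^\top\Sigma^{-1}v+\frac12\mathrm{tr}(A^{-\top}\Sigma^{-1}X\Sigma^{-1}XA^\top)=v^\top\Sigma^{-1}v+\frac12\mathrm{tr}(\Sigma^{-1}X\Sigma^{-1}X)\\
&=g_{(\Sigma,\mu)}\big((X,v),(X,v)\big).
\end{aligned}$$

By polarization, this shows that the action is isometric.

Note that $(A,b)\cdot(I_n,0)=(I_n,0)$ is equivalent to $AA^\top=I_n$, $b=0$. So the stabilizer of $(I_n,0)$ in $\mathrm{Aff}(n,\mathbb{R})$ is $\mathrm{O}(n)$. By the Iwasawa decomposition, every $A\in\mathrm{GL}(n,\mathbb{R})$ factors uniquely as $A=LQ$ with $L\in T(n,\mathbb{R})$ and $Q\in\mathrm{O}(n)$. It follows that $T(n,\mathbb{R})\ltimes\mathbb{R}^n$ acts simply transitively. ∎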
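The isometry and simple transitivity statements of Lemma 3.3 can be illustrated numerically. In the sketch below, which assumes NumPy and uses hypothetical helper names, a random affine map is applied via (3.4) and (3.5), and the value of (3.1) is compared before and after. The lower triangular element with positive diagonal that carries $(I_n,0)$ to $(\Sigma,\mu)$ comes from the Cholesky factorization $\Sigma=LL^\top$, which `numpy.linalg.cholesky` returns with $L$ lower triangular.

```python
# Illustrative sketch; helper names and the random test data are hypothetical.
import numpy as np


def fisher_metric(Sigma, X, v, Y, w):
    """Fisher metric (3.1) at (Sigma, mu); it does not depend on mu."""
    Si = np.linalg.inv(Sigma)
    return v @ Si @ w + 0.5 * np.trace(Si @ X @ Si @ Y)


def act(A, b, Sigma, mu):
    """Action (3.4) of (A, b) on the point (Sigma, mu)."""
    return A @ Sigma @ A.T, A @ mu + b


def act_tangent(A, X, v):
    """Tangent action (3.5); the translation part b drops out."""
    return A @ X @ A.T, A @ v


def triangular_element(Sigma, mu):
    """(L, b) with L lower triangular, positive diagonal, and (L, b).(I, 0) = (Sigma, mu)."""
    # numpy.linalg.cholesky returns the lower-triangular factor L with L @ L.T == Sigma.
    return np.linalg.cholesky(Sigma), mu.copy()


rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
Sigma, mu = B @ B.T + np.eye(n), rng.standard_normal(n)
A, b = rng.standard_normal((n, n)), rng.standard_normal(n)  # generic A is invertible
X = rng.standard_normal((n, n))
X = X + X.T
v = rng.standard_normal(n)

Sigma2, mu2 = act(A, b, Sigma, mu)
before = fisher_metric(Sigma, X, v, X, v)
after = fisher_metric(Sigma2, *act_tangent(A, X, v), *act_tangent(A, X, v))
print(before, after)  # equal up to rounding

L, b0 = triangular_element(Sigma, mu)
print(act(L, b0, np.eye(n), np.zeros(n)))  # recovers (Sigma, mu)
```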

### 3.3. Geometry of Pos(n,R)

As a consequence of Proposition 3.1 and Theorem 2.2, the Fisher metric of the family of normal distributions with mean $\mu_0$ coincides with the restriction of the Fisher metric of $\mathcal{N}$ to $\mathcal{N}(\cdot,\mu_0)$. Since all of these submanifolds are isometric, we may take $\mu_0=0$ for convenience. In the following, we make explicit how $\mathrm{Pos}(n,\mathbb{R})$ with its Fisher metric is isometric to a symmetric space with a suitably scaled Killing metric.

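Consider the product of irreducible Riemannian symmetric spaces

$$M=\mathbb{R}\times\mathrm{Pos}_1(n,\mathbb{R}),$$

where its Riemannian metric $g_M$ is the product of two metrics. The first is $g_1$, which is $\frac{1}{2n}$ times the multiplication on $\mathbb{R}$. The second is the metric $g_2$ on $\mathrm{Pos}_1(n,\mathbb{R})$ given by $g_{2,\Sigma}(X,Y)=\frac12\mathrm{tr}(\Sigma^{-1}X\Sigma^{-1}Y)$. Let $\mathrm{GL}(n,\mathbb{R})$ act on $M$ via

$$A\cdot(\alpha,\Sigma)=\big(\alpha+2\log|\det(A)|,\;|\det(A)|^{-2/n}A\Sigma A^\top\big).$$

The factor $|\det(A)|^{-2/n}$ ensures that $\det\big(|\det(A)|^{-2/n}A\Sigma A^\top\big)=\det(\Sigma)=1$, so the second component stays in $\mathrm{Pos}_1(n,\mathbb{R})$.

###### Lemma 3.4.

The $\mathrm{GL}(n,\mathbb{R})$-action on $M$ given above is by isometries.

###### Proof.

Write $c_A=|\det(A)|^{-2/n}$. The tangent action of $A$ at $(\alpha,\Sigma)$ on $T_{(\alpha,\Sigma)}M$ is

$$dA_{(\alpha,\Sigma)}(t,X)=(t,\;c_AAXA^\top).$$

Hence

$$\begin{aligned}
g_{M,A\cdot(\alpha,\Sigma)}\big(dA_{(\alpha,\Sigma)}(t_1,X_1),dA_{(\alpha,\Sigma)}(t_2,X_2)\big)&=g_{M,A\cdot(\alpha,\Sigma)}\big((t_1,c_AAX_1A^\top),(t_2,c_AAX_2A^\top)\big)\\
&=g_{1,\alpha+2\log|\det(A)|}(t_1,t_2)+g_{2,c_AA\Sigma A^\top}(c_AAX_1A^\top,c_AAX_2A^\top)\\
&=\frac{1}{2n}t_1t_2+\frac12\mathrm{tr}\big((c_AA\Sigma A^\top)^{-1}c_AAX_1A^\top(c_AA\Sigma A^\top)^{-1}c_AAX_2A^\top\big)\\
&=\frac{1}{2n}t_1t_2+\frac12\mathrm{tr}(\Sigma^{-1}X_1\Sigma^{-1}X_2)=g_{M,(\alpha,\Sigma)}\big((t_1,X_1),(t_2,X_2)\big).
\end{aligned}$$

Hence the action of $\mathrm{GL}(n,\mathbb{R})$ is isometric. ∎

Now define a map

$$\Psi:\mathrm{Pos}(n,\mathbb{R})\to\mathbb{R}\times\mathrm{Pos}_1(n,\mathbb{R}),\qquad\Sigma\mapsto\big(\log\det(\Sigma),\;\det(\Sigma)^{-1/n}\Sigma\big),\tag{3.6}$$

where $\det(\Sigma)^{-1/n}\Sigma\in\mathrm{Pos}_1(n,\mathbb{R})$. Note that for $A\in\mathrm{GL}(n,\mathbb{R})$,

$$\begin{aligned}
\Psi(A\cdot\Sigma)&=\big(\log\det(A\Sigma A^\top),\;\det(A\Sigma A^\top)^{-1/n}A\Sigma A^\top\big)\\
&=\big(\log\det(\Sigma)+2\log|\det(A)|,\;|\det(A)|^{-2/n}\det(\Sigma)^{-1/n}A\Sigma A^\top\big)\\
&=A\cdot\big(\log\det(\Sigma),\;\det(\Sigma)^{-1/n}\Sigma\big)=A\cdot\Psi(\Sigma).
\end{aligned}$$

So the map $\Psi$ is $\mathrm{GL}(n,\mathbb{R})$-equivariant.

We equip the manifold $\mathrm{Pos}(n,\mathbb{R})$ with the restriction of the Fisher metric (3.1) of $\mathcal{N}$ to $\mathcal{N}(\cdot,0)\cong\mathrm{Pos}(n,\mathbb{R})$. By Proposition 3.1, this is the Fisher metric of the family of normal distributions with mean $0$. Then $\mathrm{GL}(n,\mathbb{R})$ acts isometrically on $\mathrm{Pos}(n,\mathbb{R})$ by Lemma 3.3.

###### Proposition 3.5.

The Riemannian manifold $\mathrm{Pos}(n,\mathbb{R})$ is isometric to the product of the irreducible Riemannian symmetric spaces $(\mathbb{R},g_1)$ and $(\mathrm{Pos}_1(n,\mathbb{R}),g_2)$. In particular, $\mathrm{Pos}(n,\mathbb{R})$ is a Riemannian symmetric space.

###### Proof.

The map $\Psi$ defined in (3.6) is the desired isometry. In fact, $\Psi$ is $\mathrm{GL}(n,\mathbb{R})$-equivariant with respect to the isometric $\mathrm{GL}(n,\mathbb{R})$-actions on $\mathrm{Pos}(n,\mathbb{R})$ and $M$. Moreover, $\mathrm{GL}(n,\mathbb{R})$ acts transitively on $\mathrm{Pos}(n,\mathbb{R})$ (where $A\cdot\Sigma=A\Sigma A^\top$), so it is enough to show that $\Psi$ is an isometry at $I=I_n$. So let $X,Y\in T_I\mathrm{Pos}(n,\mathbb{R})=\mathrm{Sym}(n,\mathbb{R})$. Using $\frac{d}{dt}\big|_{t=0}\det(I+tX)=\mathrm{tr}(X)$, the differential of $\Psi$ at $I$ is

$$\begin{aligned}
d\Psi_IX&=\frac{d}{dt}\Big|_{t=0}\big(\log\det(I+tX),\;\det(I+tX)^{-1/n}(I+tX)\big)\\
&=\Big(\det(I)^{-1}\frac{d}{dt}\Big|_{t=0}\det(I+tX),\;-\frac1n\frac{d}{dt}\Big|_{t=0}\det(I+tX)\,I+X\Big)\\
&=\Big(\mathrm{tr}(X),\;X-\frac1n\mathrm{tr}(X)I\Big).
\end{aligned}$$

Then

$$\begin{aligned}
g_{M,\Psi(I)}(d\Psi_IX,d\Psi_IY)&=\frac{1}{2n}\mathrm{tr}(X)\mathrm{tr}(Y)+\frac12\mathrm{tr}\Big(\big(X-\tfrac1n\mathrm{tr}(X)I\big)\big(Y-\tfrac1n\mathrm{tr}(Y)I\big)\Big)\\
&=\frac{1}{2n}\mathrm{tr}(X)\mathrm{tr}(Y)+\frac12\mathrm{tr}\Big(XY-\tfrac1n\mathrm{tr}(X)Y-\tfrac1n\mathrm{tr}(Y)X\Big)+\frac{1}{2n}\mathrm{tr}(X)\mathrm{tr}(Y)\\
&=\frac1n\mathrm{tr}(X)\mathrm{tr}(Y)+\frac12\mathrm{tr}(XY)-\frac{1}{2n}\mathrm{tr}\big(\mathrm{tr}(X)Y\big)-\frac{1}{2n}\mathrm{tr}\big(\mathrm{tr}(Y)X\big)\\
&=\frac12\mathrm{tr}(XY)=g_I(X,Y).
\end{aligned}$$

This shows that $\Psi$ is an isometry and concludes the proof of the proposition. ∎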
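The sketch below checks the statements above numerically for a random $\Sigma$ with $n=4$. It assumes NumPy, and the helper names are hypothetical. It verifies three things. First, $\Psi(\Sigma)=(\log\det\Sigma,\det(\Sigma)^{-1/n}\Sigma)$ lands in $\mathrm{Pos}_1(n,\mathbb{R})$. Second, $\Psi$ is equivariant for the action $A\cdot(\alpha,S)=(\alpha+2\log|\det A|,\;|\det A|^{-2/n}ASA^\top)$. Third, $\Psi$ is an isometry when the $\mathbb{R}$-factor carries $\frac{1}{2n}$ times the multiplication. The differential of $\Psi$ is approximated by central differences.

```python
# Illustrative sketch; helper names, seed and step size are hypothetical.
import numpy as np


def psi_map(Sigma):
    """Psi from (3.6): Sigma -> (log det Sigma, det(Sigma)**(-1/n) * Sigma)."""
    n = Sigma.shape[0]
    logdet = np.linalg.slogdet(Sigma)[1]
    return logdet, np.exp(-logdet / n) * Sigma


def act_M(A, alpha, S):
    """GL(n, R)-action on M = R x Pos_1(n, R)."""
    n = A.shape[0]
    log_abs_det = np.linalg.slogdet(A)[1]  # log |det A|
    return alpha + 2.0 * log_abs_det, np.exp(-2.0 * log_abs_det / n) * (A @ S @ A.T)


def g_M(S, t1, X1, t2, X2):
    """Product metric at (alpha, S): (1/(2n)) t1 t2 + (1/2) tr(S^{-1} X1 S^{-1} X2)."""
    n = S.shape[0]
    Si = np.linalg.inv(S)
    return t1 * t2 / (2.0 * n) + 0.5 * np.trace(Si @ X1 @ Si @ X2)


def g_pos(Sigma, X, Y):
    """Fisher metric on Pos(n, R), i.e. (3.1) restricted to the Sigma-directions."""
    Si = np.linalg.inv(Sigma)
    return 0.5 * np.trace(Si @ X @ Si @ Y)


def d_psi(Sigma, X, h=1e-6):
    """Central-difference approximation of dPsi_Sigma(X)."""
    a_plus, S_plus = psi_map(Sigma + h * X)
    a_minus, S_minus = psi_map(Sigma - h * X)
    return (a_plus - a_minus) / (2.0 * h), (S_plus - S_minus) / (2.0 * h)


rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n))
Sigma = B @ B.T + np.eye(n)
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
X = X + X.T
Y = rng.standard_normal((n, n))
Y = Y + Y.T

alpha, S = psi_map(Sigma)
lhs = g_M(S, *d_psi(Sigma, X), *d_psi(Sigma, Y))
print(lhs, g_pos(Sigma, X, Y))  # the two values should agree up to rounding
```

Replacing the factor $\frac{1}{2n}$ in `g_M` by $\frac12$ makes the two printed values differ for $n>1$. This is why the normalization by $n$ matters.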

.

###### Proof.

Let and let be a subgroup of such that . Let , denote the respective Lie algebras of , , and the Cartan involution. Since is a symmetric product by Proposition 3.5, and split as products and , , such that and are the symmetric Lie algebras associated to and , respectively (cf. Kobayashi & Nomizu [7, Section XI.5]). Since and is simple, by Helgason [6, Theorem V.4.1]. Hence

$$\dim G=\dim\mathrm{SL}(n,\mathbb{R})+\dim\mathrm{Iso}(\mathbb{R},g_{\mathbb{R}})=(n^2-1)+1=\dim\mathrm{GL}(n,\mathbb{R})$$

and clearly , so that . ∎

### 3.4. Bundle geometry and foliations on $\mathcal{N}$

Let $g$ denote the Fisher metric on $\mathcal{N}$. We can now describe the geometry of $\mathcal{N}$ in terms of Riemannian symmetric spaces.

###### Theorem A.

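Consider the family $\mathcal{N}$ of $n$-variate normal distributions equipped with the Fisher metric $g$, given by (3.1). The following hold:

1. $\mathcal{N}$ is a vector bundle

   $$\mathbb{R}^n\longrightarrow\mathcal{N}\longrightarrow\mathrm{Pos}(n,\mathbb{R}),$$

   where the base $\mathrm{Pos}(n,\mathbb{R})$ is equipped with the Fisher metric and the fiber over $\Sigma$ is $\mathbb{R}^n$ with the scalar product determined by $\Sigma^{-1}$.

2. The base $\mathrm{Pos}(n,\mathbb{R})$ can be identified with the totally geodesic submanifold $\mathcal{N}(\cdot,\mu_0)$ for any $\mu_0\in\mathbb{R}^n$. It is isometric to a product of irreducible Riemannian symmetric spaces

   $$\mathrm{Pos}(n,\mathbb{R})\cong\mathbb{R}\times\mathrm{Pos}_1(n,\mathbb{R})$$

   with the metrics on the factors given in Proposition 3.5.

3. The fiber over $\Sigma_0\in\mathrm{Pos}(n,\mathbb{R})$ can be embedded as the parallel submanifold $\mathcal{N}(\Sigma_0,\cdot)$. For any fixed $\mu_0\in\mathbb{R}^n$, this embedding is orthogonal at $(\Sigma_0,\mu_0)$ to the embedding of the base as $\mathcal{N}(\cdot,\mu_0)$.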

4. The submanifolds