# Bures-Wasserstein Geometry

The Bures-Wasserstein distance is a Riemannian distance on the space of positive definite Hermitian matrices and is given by $d(\Sigma,T) = \left[\operatorname{tr}(\Sigma) + \operatorname{tr}(T) - 2 \operatorname{tr}\left((\Sigma^{1/2}T\Sigma^{1/2})^{1/2}\right)\right]^{1/2}$. This distance function appears in the fields of optimal transport, quantum information, and optimisation theory. In this paper, the geometrical properties of this distance are studied using Riemannian submersions and quotient manifolds. The Riemannian metric and geodesics are derived on both the whole space and the subspace of trace-one matrices. In the first part of the paper a general framework is provided, including different representations of the tangent bundle for the SLD Fisher metric. The last part of the paper unifies previously independent arguments and results from quantum information theory and optimal transport. The Bures-Wasserstein geometry is related to the Fubini-Study metric and the Wigner-Yanase information.

## 1 Introduction

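In this paper we investigate the geometrical properties of the Bures-Wasserstein (BW) distance on the space of positive definite symmetric matrices, $P(n)$. For $\Sigma, T \in P(n)$, this distance function is given by:

$$d^{BW}_{P(n)}(\Sigma,T)=\left[\operatorname{tr}(\Sigma)+\operatorname{tr}(T)-2\operatorname{tr}\left((\Sigma^{1/2}T\Sigma^{1/2})^{1/2}\right)\right]^{1/2} \tag{1}$$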

This function appears in optimal transport as a distance measure on the space of mean-zero Gaussian densities, where it is called the Wasserstein distance. In quantum information theory, this is a distance measure between quantum states or density matrices, called the Bures distance.
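As a minimal numerical sketch of the distance (1), assuming NumPy and Hermitian positive definite inputs, one can evaluate the formula through eigendecompositions. The helper names `psd_sqrt` and `bw_distance` are illustrative and do not come from the paper.

```python
import numpy as np

def psd_sqrt(a):
    """Principal square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)          # guard against tiny negative round-off
    return (v * np.sqrt(w)) @ v.conj().T

def bw_distance(sigma, t):
    """Bures-Wasserstein distance of equation (1)."""
    root = psd_sqrt(sigma)
    cross = psd_sqrt(root @ t @ root)  # (Sigma^{1/2} T Sigma^{1/2})^{1/2}
    sq = np.trace(sigma).real + np.trace(t).real - 2.0 * np.trace(cross).real
    return float(np.sqrt(max(sq, 0.0)))

# For commuting (here diagonal) matrices the formula reduces to the
# Euclidean distance between the square roots.
example = bw_distance(np.diag([1.0, 4.0]), np.diag([9.0, 16.0]))
```

In the diagonal example the square roots are `diag(1, 2)` and `diag(3, 4)`, so the value is the Euclidean distance between those two diagonals.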

The development of this subject started when Rao realised that the Fisher information defines a Riemannian metric on the space of probability measures [rao]. He obtained this metric by mapping the positive orthant of the unit sphere, equipped with the Euclidean metric, to the probability simplex using the square map. Later, the study of the geometrical properties of the probability simplex was extended to the quantum realm with notable contributions of Nagaoka and Petz. An overview of this field can be found in [hayashi] and [bengtsson]. The distance measure defined in (1) was introduced by Helstrom [helstrom] and Bures [bures] as a measure of similarity between quantum states. In [uhlmann], Uhlmann derives the geometrical properties of this distance measure using a generalisation of the argument of Rao. This derivation is described in more detail in section 3.

In the context of optimal transport, this distance measure was derived to be the 2-Wasserstein distance on the space of covariance matrices for mean-zero Gaussian distributions [olkin]. Its geometrical properties were first studied in this context by Takatsu [takatsu]. It is interesting to note that the argument used in that paper is similar to, but independent of, the argument by Uhlmann fifteen years prior. Recently [pis1] and [bhatia1] built upon the work of Takatsu. Bhatia discusses both the quantum and the optimal transport interpretation of the distance and introduces the name Bures-Wasserstein distance. Furthermore, the argument of Takatsu is refined using facts on Riemannian submersions and quotient manifolds. The current paper adapts the construction of Bhatia in order to obtain the geometrical structure for the submanifold of trace-one matrices, which is of particular importance in quantum information. The aim is to both unify work from optimal transport and quantum information and simplify the original argument by Uhlmann.

More recently, the geometrical structure discussed in this paper has become of interest in the field of optimisation. It turns out that for this choice of geometry the exponential and logarithmic map are cheap to evaluate, which makes it particularly suitable for numerical computations [massart].

The first section of the paper discusses preliminary facts needed to put the main results into perspective. A definition of the (m)- and (e)-representations is introduced which is easily compatible with the existing definitions from both classical and quantum information geometry. These representations are worked out explicitly for the SLD Fisher metric and the Bogoliubov metric. The main results of this paper can be found in the second section of the paper, where the geometrical structure of the BW distance is investigated first on $P(n)$ and then restricted to the trace-one subset, $D(n)$. The last section of the paper compares the geometrical structure obtained in the foregoing to similar results in the field. An overview of the notation used in the paper can be found in the Notation section.

### 1.1 Preliminaries

#### Differential geometry

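Let $M, N$ be smooth manifolds and $F: M \to N$ a smooth map. We will denote the differential of $F$ at $p \in M$ by $dF_p: T_pM \to T_{F(p)}N$. For a Riemannian metric $g$ on $M$, the length of a tangent vector $v \in T_pM$ is given by $\|v\|_g = \sqrt{g_p(v,v)}$, and the length of a curve $\gamma: [a,b] \to M$ is given by:

$$L(\gamma)=\int_a^b \|\gamma'(t)\|_g\,dt \tag{2}$$

where $\gamma'(t) \in T_{\gamma(t)}M$ is the velocity of $\gamma$ at $t$. The Riemannian distance between $p, q \in M$ is defined to be:

$$d_M(p,q)=\inf\{L(\gamma) : \gamma:[a,b]\to M,\ \gamma(a)=p,\ \gamma(b)=q\}. \tag{3}$$

See chapter 2 of [lee2018] for details.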

###### Definition 1

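Let $V$ be a real or complex vector space. An affine subspace of $V$ is a subset $A \subseteq V$ together with a vector subspace $\tilde V \subseteq V$ such that:

• for all $a, b \in A$ there exists $\tilde v \in \tilde V$ such that $b = a + \tilde v$

• for all $a \in A$ and $\tilde v \in \tilde V$ we have: $a + \tilde v \in A$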

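Now let $M$ be an open convex subset of an affine subspace of $V$ with associated vector space $\tilde V$ and $p \in M$ fixed. The following map is a vector space isomorphism [lee2012]:

$$\begin{aligned}
\mathrm{id}_1:\tilde V &\to T_pM &&\text{(4)}\\
\tilde v &\mapsto v &&\text{(5)}\\
\mathrm{id}_1(\tilde v)(f) &= \left.\frac{d}{dt} f(p+t\tilde v)\right|_{t=0} &&\text{(6)}
\end{aligned}$$

where $f$ is any smooth function on $M$. We can therefore identify every tangent vector in $T_pM$ with an element of $\tilde V$ through $\mathrm{id}_1$. Given a basis $\{e_i\}$ for $\tilde V$, we define the Euclidean inner product $\langle\cdot,\cdot\rangle$ on $\tilde V$ to be such that $\langle e_i, e_j\rangle = \delta_{ij}$. The Euclidean metric $\bar g$ on $M$ is defined such that for $v, w \in T_pM$, we have $\bar g_p(v,w) = \operatorname{Re}\langle \tilde v, \tilde w\rangle$. The Riemannian distance associated to $\bar g$ is denoted $\bar d$.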

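Let $\mathbb{K}$ be either $\mathbb{R}$ or $\mathbb{C}$ and $\{e_i\}_{i=1}^m$ a fixed basis for $\tilde V$. This basis induces a coordinate map $k: \tilde V \to \mathbb{K}^m$ such that for $\tilde v = \sum_i \tilde v^i e_i$ we have $k(\tilde v) = (\tilde v^1, \dots, \tilde v^m)$. We define the (m)-representation of an element in $T_pM$ to be the coordinate representation of its $\mathrm{id}_1$-associated element in $\tilde V$. Or in symbols,

$$(m): T_pM \xrightarrow{\ \mathrm{id}_1^{-1}\ } \tilde V \xrightarrow{\ k\ } \mathbb{K}^m \tag{7}$$

Using Riesz' representation theorem we let $\mathrm{id}_2$ be the identification between $T_pM$ and its dual $T^*_pM$ such that $\omega(w) = \bar g_p(\mathrm{id}_2(\omega), w)$ for $\omega \in T^*_pM$ and $w \in T_pM$. We define the (m*)-representation of an element of $T^*_pM$ to be the (m)-representation of its $\mathrm{id}_2$-associated element in $T_pM$. In symbols,

$$(m^*): T^*_pM \xrightarrow{\ \mathrm{id}_2\ } T_pM \xrightarrow{\ (m)\ } \mathbb{K}^m. \tag{8}$$

A general Riemannian metric $g$ on $M$ gives a final identification, $\mathrm{id}_3(g)$, between $T_pM$ and $T^*_pM$ in the same way as above: $\mathrm{id}_3(g)(v)(w) = g_p(v,w)$. Given a metric $g$, we define the (e)-representation of an element of $T_pM$ as the (m*)-representation of its $\mathrm{id}_3(g)$-associated element in $T^*_pM$. In symbols,

$$(e): T_pM \xrightarrow{\ \mathrm{id}_3(g)\ } T^*_pM \xrightarrow{\ (m^*)\ } \mathbb{K}^m. \tag{9}$$

Note that the definition of the (m)- and (e)-representation implies:

$$g_p(v,w)=\operatorname{Re}\left(\langle v^{(e)},w^{(m)}\rangle\right) \tag{10}$$

with on the right the standard inner product on $\mathbb{K}^m$.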

###### Remark 1

The definitions above are inspired by Chapter 2 of [ay]. We will see in sections 1.2 and 1.3 that the definitions correspond to the ones given in e.g. [amari2007], [hayashi].

#### Matrix identities

Let $GL(n)$ be the space of invertible complex $n \times n$ matrices and $U(n)$ the unitary matrices. Every $M \in GL(n)$ can be written as $M = PU$ where $P \in P(n)$ and $U \in U(n)$. This is called the polar decomposition of $M$ and $U$ is called the unitary polar factor. In the following theorem, sometimes referred to as Uhlmann's theorem, the unitary polar factor shows up in a maximisation problem.

###### Theorem 1.1

Consider the following maximisation problem:

$$\sup_{V\in U(n)} \operatorname{Re}\operatorname{tr}(\Sigma V T) \tag{11}$$

Then the supremum is attained for $V = U^*$, with $U$ the unitary polar factor of $T\Sigma$.

###### Proof

The solution for $X$ to the Lyapunov equation $\Sigma X + X\Sigma = H$ with $\Sigma \in P(n)$ will be denoted $L_\Sigma(H)$. It turns out that this solution exists and is unique [bhatia2].
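One way to evaluate the Lyapunov solution numerically, sketched here under the assumption that $\Sigma$ is Hermitian positive definite, is to diagonalise $\Sigma$ and divide entrywise by sums of eigenvalues. The function name `lyapunov_solve` is illustrative, and this is one possible approach rather than the method used in the paper.

```python
import numpy as np

def lyapunov_solve(sigma, h):
    """Solve sigma @ x + x @ sigma = h for Hermitian positive definite sigma."""
    lam, q = np.linalg.eigh(sigma)            # sigma = q diag(lam) q^*
    h_eig = q.conj().T @ h @ q                # h expressed in the eigenbasis of sigma
    x_eig = h_eig / (lam[:, None] + lam[None, :])
    return q @ x_eig @ q.conj().T

# Small check on a random instance (seed chosen arbitrarily).
rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
sigma = a @ a.conj().T + np.eye(3)            # Hermitian positive definite
h = a + a.conj().T                            # Hermitian right-hand side
x = lyapunov_solve(sigma, h)
residual = np.linalg.norm(sigma @ x + x @ sigma - h)
```

In the eigenbasis of $\Sigma$ the equation decouples into $(\lambda_i + \lambda_j) X_{ij} = H_{ij}$, which is why positivity of the eigenvalues guarantees a unique solution.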

### 1.2 Classical information geometry

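In this section we will study $\mathcal{M}_+(\Omega)$ and $\mathcal{P}_+(\Omega)$, the space of strictly positive (resp. probability) measures on $\Omega = \{\omega_1, \dots, \omega_n\}$. These spaces are open subsets of affine subspaces of the vector space of signed measures on $\Omega$. The canonical basis of this vector space is given by the set of Dirac delta measures $\delta^i$ such that $\delta^i(\omega_j) = \delta_{ij}$. This basis gives us the (m)-representation as described in the preliminaries. Now we let the metric on $\mathcal{M}_+(\Omega)$ and $\mathcal{P}_+(\Omega)$ be the Fisher information metric, given by:

$$g^F_\mu(a,b)=\sum_{i=1}^n \frac{a^{(m)}_i b^{(m)}_i}{\mu(\omega_i)} \tag{12}$$

with $a, b$ in the tangent space of $\mathcal{M}_+(\Omega)$ or $\mathcal{P}_+(\Omega)$ at $\mu$. The (e)-representation is given as follows, with $\mu_i = \mu(\omega_i)$:

$$a^{(e)}_i=\frac{a^{(m)}_i}{\mu_i} \tag{13}$$

Another way of obtaining the (e)-representation for this case is by applying the (m)-representation to the pushforward of $a$ under the logarithm map. If $\log$ is applied componentwise to $\mu$, then:

$$a^{(e)}=(d\log_\mu(a))^{(m)}. \tag{14}$$

We will see however that this expression of the (e)-representation is not general enough for the quantum case.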
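The relation between the two representations in the classical case can be illustrated with a few lines of plain Python. This is a sketch with made-up numbers; the function names are illustrative. It checks that the Fisher metric (12) equals the pairing of the (e)-representation of one vector with the (m)-representation of the other.

```python
def e_representation(a_m, mu):
    """(e)-representation (13): componentwise division by the measure."""
    return [a / m for a, m in zip(a_m, mu)]

def fisher_metric(a_m, b_m, mu):
    """Fisher metric (12) evaluated on (m)-representations."""
    return sum(a * b / m for a, b, m in zip(a_m, b_m, mu))

mu = [0.2, 0.3, 0.5]            # a strictly positive measure
a_m = [0.1, -0.1, 0.0]          # tangent vectors in (m)-representation
b_m = [0.05, 0.05, -0.1]

metric_value = fisher_metric(a_m, b_m, mu)
pairing = sum(ae * b for ae, b in zip(e_representation(a_m, mu), b_m))
```

Here `metric_value` and `pairing` agree, which is the classical instance of the identity (10).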

#### Hellinger distance, Fisher metric and Fisher distance

In this section we derive the Riemannian metric corresponding to the Hellinger distance on $\mathcal{M}_+(\Omega)$. Then we find the Riemannian distance corresponding to the restriction of this metric to $\mathcal{P}_+(\Omega)$. These will turn out to be the Fisher metric and Fisher distance respectively. This derivation was first due to Rao [rao] and can be seen as a special case of the derivation given in the second part of the paper.

Restricted to the subset of diagonal matrices, it is easy to see that the BW distance on $P(n)$ given in (1) has the following form:

$$d^{BW}_{P(n)}(D_1,D_2)=\left[\operatorname{tr}\left((D_2^{1/2}-D_1^{1/2})^2\right)\right]^{1/2}. \tag{15}$$

Interpreting these diagonal matrices as elements of $\mathcal{M}_+(\Omega)$, we note that this distance corresponds to the Hellinger distance, given by: $d_H(\mu,\nu) = \left[\sum_{i=1}^n \left(\mu(\omega_i)^{1/2} - \nu(\omega_i)^{1/2}\right)^2\right]^{1/2}$. The Hellinger distance can be obtained as the pushforward of the Euclidean distance under the square map:

$$(\mathcal{M}_+(\Omega),\bar d)\ni\mu\mapsto\mu^2\in(\mathcal{M}_+(\Omega),d_H) \tag{16}$$

Extending the structure on the left from a distance function to its corresponding Riemannian metric $\bar g$, we have the following isometry (the isometries in this section are defined up to a constant):

$$(\mathcal{M}_+(\Omega),\bar g)\ni\mu\mapsto\mu^2\in(\mathcal{M}_+(\Omega),g^F) \tag{17}$$

From (16) and (17) it follows that the Hellinger distance is the geodesic distance for the Fisher metric in $\mathcal{M}_+(\Omega)$.

We now aim to find the Riemannian distance for the Fisher metric restricted to $\mathcal{P}_+(\Omega)$. It turns out that for this subset the geodesic distance is no longer the Hellinger distance. In order to find the right geodesic distance, we can use the fact that the following restriction of (17) remains an isometry:

$$(S\mathcal{M}_+(\Omega),\bar g)\ni\mu\mapsto\mu^2\in(\mathcal{P}_+(\Omega),g^F). \tag{18}$$

where $S\mathcal{M}_+(\Omega) = \{\mu \in \mathcal{M}_+(\Omega) : \sum_i \mu(\omega_i)^2 = 1\}$ is the unit sphere in $\mathcal{M}_+(\Omega)$. We know that on this space the geodesics are given by great circles and therefore we can also compute the Riemannian distance. Using the fact that this distance is carried over by the isometry we obtain the Riemannian distance for the space of probability measures with the Fisher metric, called the Fisher distance. This is given by:

$$d_F(p,q)=\arccos\left(\sum_{i=1}^n \left(p(\omega_i)q(\omega_i)\right)^{1/2}\right). \tag{19}$$
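The difference between the two distances on probability vectors can be seen in a small plain-Python sketch (function names and the example vectors are illustrative). The Hellinger distance is the chord length between the square-root vectors on the unit sphere, while the Fisher distance (19) is the corresponding arc length.

```python
import math

def hellinger_distance(mu, nu):
    """Euclidean distance between componentwise square roots."""
    return math.sqrt(sum((math.sqrt(m) - math.sqrt(n)) ** 2 for m, n in zip(mu, nu)))

def fisher_distance(p, q):
    """Fisher distance (19): angle between the square-root vectors."""
    s = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(min(1.0, max(-1.0, s)))   # clamp against round-off

p = [0.25, 0.75]
q = [0.75, 0.25]
chord = hellinger_distance(p, q)
arc = fisher_distance(p, q)
```

For this pair the arc is $\pi/6$ and the chord is $2\sin(\pi/12)$, so the chord is strictly shorter than the arc, as expected for two distinct points on a sphere.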

### 1.3 Quantum information geometry

Let $H(n)$ be the set of Hermitian $n \times n$ matrices and $D(n)$ be the subset of positive definite Hermitian matrices with trace one. Within the context of section 1.1, we take $M = D(n)$ inside the vector space $\mathbb{C}^{n\times n}$. The basis vectors for $\mathbb{C}^{n\times n}$ are simply given by $E_{ij}$, where the $ij$-th entry of $E_{ij}$ is one and the rest zero. From this we get the (m)-representation for $D(n)$. For the submanifold of diagonal matrices (probability measures) Chentsov showed that the Fisher metric is the unique metric satisfying certain (statistically) natural conditions on the metric [chentsov]. Petz proved that for $D(n)$, this uniqueness no longer exists [petz]. One of the suggested generalisations is the symmetrised logarithmic derivative (SLD) Fisher metric. See e.g. [amari2007], [hayashi]. For $\rho \in D(n)$ this Hermitian metric is given explicitly by:

$$g^{SLD}_\rho(H,K)=2\operatorname{tr}\left(L_\rho(H^{(m)})K^{(m)}\right) \tag{20}$$

We will derive in the second part of this paper that the Riemannian metric corresponding to the BW distance on $P(n)$ is given by:

$$g^{BW}_\Sigma(H,K)=\tfrac12\operatorname{Re}\operatorname{tr}\left(L_\Sigma(H^{(m)})K^{(m)}\right). \tag{21}$$

Furthermore we will prove that the Riemannian distance on $D(n)$ for this metric is given by:

$$d^{BW}_{D(n)}(\rho_1,\rho_2)=\arccos\left(\operatorname{Re}\operatorname{tr}\left((\rho_2^{1/2}\rho_1\rho_2^{1/2})^{1/2}\right)\right) \tag{22}$$

Because the real parts of $g^{SLD}$ and $g^{BW}$ are equal on $D(n)$, we can conclude that $d^{BW}_{D(n)}$ is the distance function for $g^{SLD}$ (up to a constant).
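The distance (22) can be evaluated numerically along the following lines, assuming NumPy and density matrices that are Hermitian positive definite with unit trace. The names `psd_sqrt` and `bw_distance_trace_one` are illustrative.

```python
import numpy as np

def psd_sqrt(a):
    """Principal square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def bw_distance_trace_one(rho1, rho2):
    """Distance (22) between two trace-one positive definite matrices."""
    root = psd_sqrt(rho2)
    overlap = np.trace(psd_sqrt(root @ rho1 @ root)).real
    return float(np.arccos(np.clip(overlap, -1.0, 1.0)))

# Diagonal density matrices behave like classical probability vectors.
rho1 = np.diag([0.25, 0.75])
rho2 = np.diag([0.75, 0.25])
angle = bw_distance_trace_one(rho1, rho2)
```

For diagonal inputs the expression reduces to the Fisher distance (19); in this example the overlap is $\sqrt{3}/2$, so the angle is $\pi/6$.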

#### (e)-representations in quantum information geometry

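The SLD Fisher metric for $\rho \in D(n)$ is given up to a constant by:

$$g^{SLD}_\rho(H,K)=\operatorname{tr}\left(L_\rho(H^{(m)})K^{(m)}\right). \tag{23}$$

From the preliminaries it follows that for this choice of metric and $H \in T_\rho D(n)$ we have the following relation between the (m)- and (e)-representation:

$$\begin{aligned}
H^{(e)} &= L_\rho(H^{(m)}) &&\text{(24)}\\
H^{(m)} &= H^{(e)}\rho+\rho H^{(e)} &&\text{(25)}
\end{aligned}$$

Expressing the SLD Fisher metric in terms of the (e)-representation therefore gives the potentially more familiar form:

$$g^{SLD}_\rho(H,K)=\operatorname{tr}\left(H^{(e)}(K^{(e)}\rho+\rho K^{(e)})\right) \tag{26}$$

Another common metric is the Bogoliubov metric. In the (m)-representation this is given by:

$$g^{Bo}_\rho(H,K)=\operatorname{tr}\left((d\log_\rho(H))^{(m)}K^{(m)}\right). \tag{27}$$

The relation of the (m)- and (e)-representation is given by:

$$\begin{aligned}
H^{(e)} &= (d\log_\rho(H))^{(m)} &&\text{(28)}\\
H^{(m)} &= \int_0^1 \rho^{\lambda}H^{(e)}\rho^{1-\lambda}\,d\lambda &&\text{(29)}
\end{aligned}$$

The Bogoliubov metric in (e)-representation is therefore given by:

$$g^{Bo}_\rho(H,K)=\operatorname{tr}\left(H^{(e)}\int_0^1 \rho^{\lambda}K^{(e)}\rho^{1-\lambda}\,d\lambda\right). \tag{30}$$

###### Remark 2

In the rest of the paper we will exclusively and implicitly use the (m)-representation for the elements of the tangent bundle of $P(n)$ and $D(n)$.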

## 2 Bures-Wasserstein Geometry

In this section we explore the geometry induced by the Bures-Wasserstein distance. We start by finding the metric and geodesics corresponding to the BW distance on $P(n)$. Subsequently, we restrict the obtained metric to $D(n)$ and derive the corresponding distance function and geodesics. The flow of the argument is analogous to Section 1.2, where we start from the Hellinger distance on $\mathcal{M}_+(\Omega)$, derive the Fisher metric and subsequently find the Riemannian distance and geodesics for this metric restricted to the submanifold $\mathcal{P}_+(\Omega)$. We start by discussing some general results from Riemannian geometry.

Let $(M,g)$ and $(N,h)$ be Riemannian manifolds and $\pi: M \to N$ a smooth submersion. We can make the following orthogonal decomposition of the tangent space at $p \in M$:

$$T_pM=\mathcal{V}(\pi,p)\oplus\mathcal{H}(\pi,p,g) \tag{31}$$

where $\mathcal{V}(\pi,p)$ is the kernel of $d\pi_p$ and $\mathcal{H}(\pi,p,g)$ is its orthogonal complement with respect to the metric $g$ at $p$. We will refer to these subspaces as vertical and horizontal respectively. A curve $\gamma$ in $M$ is said to be horizontal if $\gamma'(t)$ is horizontal for all $t$. We say that a submersion is Riemannian if for all $p \in M$ and $v, w \in \mathcal{H}(\pi,p,g)$ the following holds:

$$g_p(v,w)=h_{\pi(p)}(d\pi_p v,d\pi_p w). \tag{32}$$

That is, $d\pi_p$ restricted to $\mathcal{H}(\pi,p,g)$ is a vector space isometry.

###### Theorem 2.1

If $(M,g)$ is a Riemannian manifold and $G$ a compact Lie group of isometries of $(M,g)$ acting freely on $M$, then there exists a unique Riemannian metric $h$ on $M/G$ such that the quotient map $\pi: M \to M/G$ is a Riemannian submersion.

###### Proof

Corollary 2.29 in [lee2018]. ∎

###### Theorem 2.2

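Let $\pi: (M,g) \to (N,h)$ be a Riemannian submersion.
For every geodesic $\gamma$ in $M$ such that $\gamma'(0)$ is horizontal we have:

• $\gamma'(t)$ is horizontal for all $t$.

• $\pi\circ\gamma$ is a geodesic in $N$ of the same length as $\gamma$.

For every curve $\tilde\gamma$ in $N$ we have that:

• for each starting point in $\pi^{-1}(\tilde\gamma(0))$ there exists a unique horizontal curve $\gamma$ in $M$, called a horizontal lift of $\tilde\gamma$, such that $\pi\circ\gamma = \tilde\gamma$.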

###### Theorem 2.3

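Let $\pi: (M,g) \to (N,h)$ be a Riemannian submersion and $d_M$ the Riemannian distance function on $(M,g)$. The Riemannian distance function on $(N,h)$, $d_N$, is equal to:

$$d'_N(p,q)=\inf_{\substack{\tilde p\in\pi^{-1}(p)\\ \tilde q\in\pi^{-1}(q)}} d_M(\tilde p,\tilde q). \tag{33}$$

We will call $d'_N$ the pushforward distance.

###### Proof

Recall from equation (3) in the preliminaries that the Riemannian distance function on $(N,h)$ is given by

$$d_N(p,q)=\inf\{L(\tilde\gamma) : \tilde\gamma:[0,1]\to N,\ \tilde\gamma(0)=p,\ \tilde\gamma(1)=q\} \tag{34}$$

For every $\tilde\gamma$ on the RHS we can find a curve in $M$, namely a horizontal lift $\gamma$ of $\tilde\gamma$, such that $\gamma(0) \in \pi^{-1}(p)$ and $\gamma(1) \in \pi^{-1}(q)$ and $L(\gamma) = L(\tilde\gamma)$. Therefore we have $d'_N(p,q) \leq d_N(p,q)$.

For the reverse, we note that for every curve $\gamma$ in $M$ we have

$$\begin{aligned}
L(\gamma) &= \int_0^1 \|\gamma'(t)\|_g\,dt &&\text{(35)}\\
&\geq \int_0^1 \|d\pi_{\gamma(t)}(\gamma'(t))\|_h\,dt &&\text{(36)}\\
&= \int_0^1 \|(\pi\circ\gamma)'(t)\|_h\,dt &&\text{(37)}\\
&= L(\pi\circ\gamma) &&\text{(38)}
\end{aligned}$$

where in the second line we use that $\pi$ is a Riemannian submersion. From this it follows immediately that $d'_N(p,q) \geq d_N(p,q)$. ∎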

### 2.1 Geometry on the space P(n)

#### Riemannian metric and distance function

In this section we will prove the following theorem.

###### Theorem 2.4

The Bures-Wasserstein distance on $P(n)$ given in (1) is a Riemannian distance. The corresponding metric at $\Sigma \in P(n)$ is given by:

$$g^{BW}_\Sigma(H,K)=\operatorname{Re}\operatorname{tr}\left(L_\Sigma(H)\,\Sigma\,L_\Sigma(K)\right)=\tfrac12\operatorname{Re}\operatorname{tr}\left(L_\Sigma(H)K\right) \tag{39}$$

In order to prove theorem 2.4 we need some preliminary results. The final proof can be found after proposition 3.

From theorem 2.1 we know that there exists a metric $h$ such that the quotient map $\pi: GL(n) \to GL(n)/U(n)$ is a Riemannian submersion. We make the following identification: $GL(n)/U(n) \cong P(n)$. This gives us the following map:

$$\begin{aligned}
\pi:(GL(n),\bar g) &\to (P(n),h) &&\text{(40)}\\
M &\mapsto MM^* &&\text{(41)}
\end{aligned}$$

We will derive the preliminary results in the following order. First, we find the horizontal and vertical subspaces for $\pi$ (proposition 1), which we use to show that $h$ is given by (39) (proposition 2). Next, we show that the BW distance on $P(n)$ is equal to the pushforward distance given in (33) for $\pi$ (proposition 3). Then we can use theorem 2.3 to conclude that the BW distance is actually the Riemannian distance for $h$.

###### Proposition 1

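Let $H(n)$ and $H^\perp(n)$ be the set of Hermitian and skew-Hermitian matrices, respectively. The vertical and horizontal space of $\pi$ at $M \in GL(n)$ are given by:

$$\begin{aligned}
\mathcal{V}(\pi,M) &= \{K(M^{-1})^* : K\in H^\perp(n)\} &&\text{(42)}\\
\mathcal{H}(\pi,M,\bar g) &= \{HM : H\in H(n)\} &&\text{(43)}
\end{aligned}$$

###### Proof

We have:

$$d\pi_M(A)=AM^*+MA^*. \tag{44}$$

Therefore $d\pi_M(A) = 0$ if and only if $AM^*$ is skew-Hermitian, that is, $A = K(M^{-1})^*$ with $K \in H^\perp(n)$. Furthermore, we have that $\bar g_M(HM, K(M^{-1})^*) = \operatorname{Re}\operatorname{tr}(HK^*) = 0$ for $H \in H(n)$ and $K \in H^\perp(n)$. ∎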
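The two claims of proposition 1 can be checked numerically on a random instance. The sketch below assumes NumPy; all names and the random seed are illustrative. It builds one element of (42) and one of (43) and verifies that the first lies in the kernel of the differential (44) and that the two are orthogonal for the Euclidean metric $\operatorname{Re}\operatorname{tr}(AB^*)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

def random_complex(size):
    return rng.normal(size=(size, size)) + 1j * rng.normal(size=(size, size))

def d_pi(m, a):
    """Differential (44) of pi(M) = M M^* at M in the direction A."""
    return a @ m.conj().T + m @ a.conj().T

def euclidean_metric(a, b):
    """Euclidean metric Re tr(A B^*) on complex n x n matrices."""
    return np.trace(a @ b.conj().T).real

m = random_complex(n) + n * np.eye(n)        # generically invertible
x, y = random_complex(n), random_complex(n)
h = x + x.conj().T                           # Hermitian
k = y - y.conj().T                           # skew-Hermitian

vertical = k @ np.linalg.inv(m).conj().T     # element of (42)
horizontal = h @ m                           # element of (43)

kernel_residual = np.linalg.norm(d_pi(m, vertical))
inner_product = euclidean_metric(horizontal, vertical)
```

Both `kernel_residual` and `inner_product` vanish up to floating-point round-off.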

###### Proposition 2

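The metric $h$ on $P(n)$ is given by (39).

###### Proof

Because $\pi$ is a Riemannian submersion we know from (32) that for $A, B \in \mathcal{H}(\pi,M,\bar g)$, $h$ needs to satisfy:

$$\bar g_M(A,B)=h_{MM^*}(d\pi_M A,d\pi_M B) \tag{45}$$

Working this out gives:

$$\operatorname{Re}\operatorname{tr}(AB^*)=h_{MM^*}(MA^*+AM^*,\,MB^*+BM^*). \tag{46}$$

Now we plug in $A = \tilde H M$ and $B = \tilde K M$ for $\tilde H, \tilde K \in H(n)$. Then (46) becomes:

$$\operatorname{Re}\operatorname{tr}(\tilde H MM^*\tilde K)=h_{MM^*}(MM^*\tilde H+\tilde H MM^*,\,MM^*\tilde K+\tilde K MM^*). \tag{47}$$

If we set $\Sigma = MM^*$ and $\tilde H = L_\Sigma(H)$, $\tilde K = L_\Sigma(K)$, we get for general $\Sigma \in P(n)$ and $H, K \in T_\Sigma P(n)$:

$$h_\Sigma(H,K)=\operatorname{Re}\operatorname{tr}\left(L_\Sigma(H)\,\Sigma\,L_\Sigma(K)\right). \tag{48}$$

Using the properties of the trace we have:

$$\operatorname{Re}\operatorname{tr}\left(L_\Sigma(H)\Sigma L_\Sigma(K)\right)=\operatorname{Re}\operatorname{tr}\left(L_\Sigma(K)\Sigma L_\Sigma(H)\right)=\operatorname{Re}\operatorname{tr}\left(L_\Sigma(H)L_\Sigma(K)\Sigma\right) \tag{49}$$

Adding the first and last expression gives:

$$2h_\Sigma(H,K)=\operatorname{Re}\operatorname{tr}\left[L_\Sigma(H)\left(\Sigma L_\Sigma(K)+L_\Sigma(K)\Sigma\right)\right]=\operatorname{Re}\operatorname{tr}\left(L_\Sigma(H)K\right). \tag{50}$$

Dividing both sides by two gives the final result. ∎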
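The equality of the two expressions for the metric in (39) can be checked numerically. The following sketch assumes NumPy and solves the Lyapunov equation through an eigendecomposition; the function names are illustrative and the random test data are arbitrary.

```python
import numpy as np

def lyapunov_solve(sigma, h):
    """Solve sigma @ x + x @ sigma = h for Hermitian positive definite sigma."""
    lam, q = np.linalg.eigh(sigma)
    h_eig = q.conj().T @ h @ q
    return q @ (h_eig / (lam[:, None] + lam[None, :])) @ q.conj().T

def bw_metric_long(sigma, h, k):
    """First expression in (39): Re tr(L(H) Sigma L(K))."""
    lh, lk = lyapunov_solve(sigma, h), lyapunov_solve(sigma, k)
    return np.trace(lh @ sigma @ lk).real

def bw_metric_short(sigma, h, k):
    """Second expression in (39): (1/2) Re tr(L(H) K)."""
    return 0.5 * np.trace(lyapunov_solve(sigma, h) @ k).real

rng = np.random.default_rng(2)
a, b, c = (rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)) for _ in range(3))
sigma = a @ a.conj().T + np.eye(3)     # Hermitian positive definite base point
h = b + b.conj().T                     # Hermitian tangent vectors
k = c + c.conj().T

long_value = bw_metric_long(sigma, h, k)
short_value = bw_metric_short(sigma, h, k)
```

In the scalar case $n = 1$ both expressions reduce to $HK/(4\Sigma)$, which gives a quick sanity check by hand.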

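In order to show that the BW distance on $P(n)$ is equal to the pushforward distance of $\pi$, we first have to investigate the distance on $GL(n)$. We know that on $\mathbb{C}^{n\times n}$ the distance is given by: $\bar d_{\mathbb{C}^{n\times n}}(M,N) = \|M-N\|_2$, with $\|A\|_2^2 = \operatorname{tr}(AA^*)$. Because $GL(n) \subset \mathbb{C}^{n\times n}$ we have $\bar d_{GL(n)} \geq \bar d_{\mathbb{C}^{n\times n}}$. However we can show, using the following lemmata, that for some choices of $M$ and $N$ the curve $\gamma(t) = (1-t)M + tN$ stays in $GL(n)$ and thus the two distances are equal.

###### Lemma 1

For $\Sigma, T \in P(n)$ and $U$, the unitary polar factor of $T\Sigma$, we have that $TU\Sigma^{-1} \in P(n)$.

###### Proof

See [bhatia1]. ∎

###### Lemma 2

For $M = \Sigma$ and $N = TU$ with $\Sigma$, $T$ and $U$ as in lemma 1 we have that $\gamma(t) = (1-t)M + tN$ is in $GL(n)$ for $t \in [0,1]$.

###### Proof

We can write:

$$\gamma(t)=\left((1-t)I+tTU\Sigma^{-1}\right)\Sigma. \tag{51}$$

By the previous lemma we know that $TU\Sigma^{-1}$ is positive definite. Therefore we have that $(1-t)I + tTU\Sigma^{-1}$ is positive definite for $t \in [0,1]$ and thus in $GL(n)$. Since $GL(n)$ is closed under multiplication we have that $\gamma(t) \in GL(n)$. ∎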

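Now we are in position to study the pushforward distance (33) for $\pi$. We have that $\pi^{-1}(\Sigma_1) = \{\Sigma_1^{1/2}V : V \in U(n)\}$ and $\pi^{-1}(\Sigma_2) = \{\Sigma_2^{1/2}U : U \in U(n)\}$. Plugging this in gives the following distance function:

$$\begin{aligned}
d'_{P(n)}(\Sigma_1,\Sigma_2) &= \inf\{\bar d_{GL(n)}(M_1,M_2) : M_i\in\pi^{-1}(\Sigma_i)\} &&\text{(52)}\\
&= \inf_{U,V\in U(n)} \bar d_{GL(n)}(\Sigma_1^{1/2}V,\Sigma_2^{1/2}U) &&\text{(53)}
\end{aligned}$$

###### Proposition 3

The BW distance on $P(n)$ is equal to the pushforward distance $d'_{P(n)}$. That is,

$$\left[\operatorname{tr}(\Sigma_1)+\operatorname{tr}(\Sigma_2)-2\operatorname{tr}\left((\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\right)\right]^{1/2}=\inf_{U,V\in U(n)} \bar d_{GL(n)}(\Sigma_1^{1/2}V,\Sigma_2^{1/2}U) \tag{54}$$

Moreover, the infimum on the right is attained when $V = I$ and $U$ the unitary polar factor of $\Sigma_2^{1/2}\Sigma_1^{1/2}$ given by $U = \Sigma_2^{1/2}\Sigma_1^{1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{-1/2}$.

###### Proof

From the discussion above we know:

$$\begin{aligned}
d'^{\,2}_{P(n)}(\Sigma_1,\Sigma_2) &= \inf_{U,V\in U(n)} \bar d^{\,2}_{GL(n)}(\Sigma_1^{1/2}V,\Sigma_2^{1/2}U) &&\text{(55)}\\
&\geq \inf_{U,V\in U(n)} \bar d^{\,2}_{\mathbb{C}^{n\times n}}(\Sigma_1^{1/2}V,\Sigma_2^{1/2}U) &&\text{(56)}\\
&= \inf_{U,V\in U(n)} \|\Sigma_1^{1/2}V-\Sigma_2^{1/2}U\|_2^2 &&\text{(57)}\\
&= \inf_{U,V\in U(n)} \operatorname{tr}\left((\Sigma_1^{1/2}V-\Sigma_2^{1/2}U)(\Sigma_1^{1/2}V-\Sigma_2^{1/2}U)^*\right) &&\text{(58)}\\
&= \operatorname{tr}(\Sigma_1)+\operatorname{tr}(\Sigma_2)-2\sup_{U,V\in U(n)} \operatorname{Re}\operatorname{tr}(\Sigma_1^{1/2}VU^*\Sigma_2^{1/2}) &&\text{(59)}
\end{aligned}$$

We saw in theorem 1.1 of the preliminaries that the supremum on the right is attained for $U$ and $V$ as in the proposition. Moreover, by lemma 2 we have that for this choice $\gamma(t) = (1-t)\Sigma_1^{1/2}V + t\Sigma_2^{1/2}U$ stays in $GL(n)$ and thus we have $\bar d_{GL(n)} = \bar d_{\mathbb{C}^{n\times n}}$. Therefore we get equality in (56) and conclude:

$$\begin{aligned}
d'_{P(n)}(\Sigma_1,\Sigma_2) &= \left[\operatorname{tr}(\Sigma_1)+\operatorname{tr}(\Sigma_2)-2\operatorname{tr}\left((\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\right)\right]^{1/2} &&\text{(60)}\\
&= d^{BW}_{P(n)}(\Sigma_1,\Sigma_2) &&\text{(61)}
\end{aligned}$$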
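Proposition 3 lends itself to a numerical check: align the two square roots with the unitary polar factor and compare the resulting Frobenius distance with the closed form (1). The sketch below assumes NumPy and obtains the polar factor from a singular value decomposition, which is one standard way of computing it; all function names are illustrative.

```python
import numpy as np

def psd_sqrt(a):
    """Principal square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def unitary_polar_factor(m):
    """Unitary factor U of the polar decomposition M = P U, via the SVD."""
    w, _, vh = np.linalg.svd(m)
    return w @ vh

def bw_distance(s1, s2):
    """Closed form (1)."""
    r1 = psd_sqrt(s1)
    cross = np.trace(psd_sqrt(r1 @ s2 @ r1)).real
    return float(np.sqrt(max(np.trace(s1).real + np.trace(s2).real - 2.0 * cross, 0.0)))

def aligned_distance(s1, s2):
    """Frobenius distance between the fibres with V = I and U the polar factor."""
    r1, r2 = psd_sqrt(s1), psd_sqrt(s2)
    u = unitary_polar_factor(r2 @ r1)
    return float(np.linalg.norm(r1 - r2 @ u))

def random_pd(rng, n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return a @ a.conj().T + np.eye(n)

rng = np.random.default_rng(1)
s1, s2 = random_pd(rng, 3), random_pd(rng, 3)
closed_form = bw_distance(s1, s2)
aligned = aligned_distance(s1, s2)
unaligned = float(np.linalg.norm(psd_sqrt(s1) - psd_sqrt(s2)))  # U = V = I
```

The values `closed_form` and `aligned` agree up to round-off, while the unaligned choice $U = V = I$ can only give a distance that is at least as large.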

###### Proof

(of theorem 2.4) By theorem 2.1 we know that there exists a unique metric $h$ such that $\pi$ as defined in equation (40) is a Riemannian submersion. In proposition 2 we saw that this metric is given by $g^{BW}$, as defined in (39) in the statement of the theorem. From theorem 2.3 we know that the Riemannian distance for this metric is given by the pushforward distance for $\pi$, (52). In proposition 3 we saw that this distance is equal to the BW distance on $P(n)$. This is what we set out to prove. ∎

#### Geodesics

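In order to find a geodesic between $\Sigma_1$ and $\Sigma_2$ in $P(n)$, according to theorem 2.2, we need to find a geodesic $\gamma$ in $GL(n)$ between points in $\pi^{-1}(\Sigma_1)$ and $\pi^{-1}(\Sigma_2)$ such that $\gamma'(0)$ is horizontal.

###### Theorem 2.5

A geodesic between $\Sigma_1$ and $\Sigma_2$ in $P(n)$ is given by $\pi\circ\gamma$, that is $t \mapsto \gamma(t)\gamma(t)^*$, where

$$\gamma(t)=(1-t)\Sigma_1^{1/2}+t\Sigma_2^{1/2}U \tag{62}$$

and $U$ is again the unitary polar factor of $\Sigma_2^{1/2}\Sigma_1^{1/2}$.

###### Proof

It is clear that $\gamma$ is a geodesic in $\mathbb{C}^{n\times n}$. We saw in lemma 2 that $\gamma$ stays in $GL(n)$. It remains to show that $\gamma'(0) \in \mathcal{H}(\pi,\gamma(0),\bar g)$. We have:

$$\begin{aligned}
\gamma'(0) &= \Sigma_2^{1/2}U-\Sigma_1^{1/2} &&\text{(63)}\\
&= \left(\Sigma_2^{1/2}U\Sigma_1^{-1/2}-I\right)\Sigma_1^{1/2} &&\text{(64)}
\end{aligned}$$

By lemma 1 we have that $\Sigma_2^{1/2}U\Sigma_1^{-1/2}$ is Hermitian and thus the same holds for $\Sigma_2^{1/2}U\Sigma_1^{-1/2} - I$. Using proposition 1 we conclude that $\gamma'(0)$ is horizontal. The statement of the theorem now follows from theorem 2.2. ∎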
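The geodesic of theorem 2.5 can be evaluated numerically as follows. This is a sketch assuming NumPy, with the polar factor computed from an SVD and illustrative function names. Since the projected curve has the same length as the straight line (62), the BW distance from $\Sigma_1$ to the point at parameter $t$ should be $t$ times the full distance, which the example checks at $t = 1/2$.

```python
import numpy as np

def psd_sqrt(a):
    """Principal square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def unitary_polar_factor(m):
    """Unitary factor of the polar decomposition, via the SVD."""
    w, _, vh = np.linalg.svd(m)
    return w @ vh

def bw_distance(s1, s2):
    """Closed form (1)."""
    r1 = psd_sqrt(s1)
    cross = np.trace(psd_sqrt(r1 @ s2 @ r1)).real
    return float(np.sqrt(max(np.trace(s1).real + np.trace(s2).real - 2.0 * cross, 0.0)))

def bw_geodesic(s1, s2, t):
    """Point pi(gamma(t)) = gamma(t) gamma(t)^* with gamma as in (62)."""
    r1, r2 = psd_sqrt(s1), psd_sqrt(s2)
    u = unitary_polar_factor(r2 @ r1)
    g = (1.0 - t) * r1 + t * (r2 @ u)
    return g @ g.conj().T

rng = np.random.default_rng(3)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
b = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
s1 = a @ a.conj().T + np.eye(3)
s2 = b @ b.conj().T + np.eye(3)

midpoint = bw_geodesic(s1, s2, 0.5)
half = bw_distance(s1, midpoint)
full = bw_distance(s1, s2)
```

The endpoints `bw_geodesic(s1, s2, 0.0)` and `bw_geodesic(s1, s2, 1.0)` reproduce `s1` and `s2`, and `half` equals `0.5 * full` up to round-off.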

### 2.2 Geometry on the space D(n)

#### Inner product and distance function

Let us denote the unit sphere in $\mathbb{C}^{n\times n}$ by:

$$S\mathbb{C}^{n\times n}\equiv\{A\in\mathbb{C}^{n\times n} : \operatorname{tr}(AA^*)=1\} \tag{65}$$

and $SGL(n)$, its restriction to $GL(n)$. Note that:

$$\pi^{-1}(D(n))=SGL(n). \tag{66}$$

We now apply theorem 2.1 to this submanifold of $GL(n)$. We choose the metric to be the restriction of $\bar g$ to $SGL(n)$ and the Lie group again $U(n)$. Since both the metric and the quotient map are just restrictions of the ones in theorem 2.4, the resulting metric on $D(n)$ will also be the restriction of $g^{BW}$. From theorem 2.3 we therefore know that the Riemannian distance function on $D(n)$ corresponding to this restricted metric is given by the pushforward distance defined in (33). Just as in the classical case (section 1.2) where the Fisher distance on $\mathcal{P}_+(\Omega)$ is different from the Hellinger distance, it will turn out that the Riemannian distance on $D(n)$ is different from the BW distance on $P(n)$. In order to compute the distance on $D(n)$, we first investigate the geometry on $S\mathbb{C}^{n\times n}$.

Geodesics on a Euclidean sphere are obtained by intersecting the sphere with (hyper)planes through the origin. If $M$ and $N$ are two non-antipodal points on $S\mathbb{C}^{n\times n}$ we can obtain the unnormalised geodesic by projecting the geodesic in $\mathbb{C}^{n\times n}$ onto $S\mathbb{C}^{n\times n}$. More specifically, if $\gamma(t) = (1-t)M + tN$ is the geodesic in $\mathbb{C}^{n\times n}$, then

$$\tilde\gamma(t)=\frac{\gamma(t)}{\|\gamma(t)\|_2} \tag{67}$$

is the unnormalised geodesic in $S\mathbb{C}^{n\times n}$. Moreover, we have that $\tilde\gamma(t) \in GL(n)$ if and only if $\gamma(t) \in GL(n)$, since they are scalar multiples of each other. The distance on $S\mathbb{C}^{n\times n}$ is given by:

$$d_{S\mathbb{C}^{n\times n}}(M,N)=\arccos\left(\operatorname{Re}\operatorname{tr}(MN^*)\right). \tag{68}$$

Just as before, we have that in general $\bar d_{SGL(n)} \geq d_{S\mathbb{C}^{n\times n}}$, but when $\tilde\gamma$ stays in $SGL(n)$, the two distances are equal. We are now in position to deduce the Riemannian distance function on $D(n)$.
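The projected curve (67) and the spherical distance (68) can be sketched numerically as follows, assuming NumPy; the function names and random test points are illustrative. The example checks that a point of the projected curve lies on the unit sphere and on the great-circle arc between the endpoints, so that the two partial angles add up to the full angle.

```python
import numpy as np

def sphere_distance(m, n):
    """Spherical distance (68) between unit-norm matrices."""
    return float(np.arccos(np.clip(np.trace(m @ n.conj().T).real, -1.0, 1.0)))

def sphere_geodesic(m, n, t):
    """Projected chord (67): normalise (1 - t) M + t N to unit Frobenius norm."""
    g = (1.0 - t) * m + t * n
    return g / np.linalg.norm(g)

rng = np.random.default_rng(4)
m = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
n = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
m /= np.linalg.norm(m)          # put both points on the unit sphere
n /= np.linalg.norm(n)

point = sphere_geodesic(m, n, 0.3)
total = sphere_distance(m, n)
split = sphere_distance(m, point) + sphere_distance(point, n)
```

Because the parameter $t$ of the chord is not proportional to arc length, the point at $t = 0.3$ does not sit at 30% of the angle in general, which is the sense in which the curve is called unnormalised.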

###### Theorem 2.6

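On $D(n)$, the Riemannian distance for the Bures-Wasserstein metric is given by:

$$d^{BW}_{D(n)}(\rho_1,\rho_2)=\arccos\left(\operatorname{Re}\operatorname{tr}\left((\rho_2^{1/2}\rho_1\rho_2^{1/2})^{1/2}\right)\right) \tag{69}$$

###### Proof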

From the definition of the quotient distance we have:

 dBWD(n)(ρ1,ρ2) =infU,V∈U(n)¯dSGL(n)(ρ1/21V,ρ1/22U) (70) ≥infU,V∈U(n)¯dSCn×n(ρ1/21V,ρ1/22U) (71) =infU,V∈U(n)arccos(Retr(ρ1/21VU∗ρ1/22