The differential geometric approach to probability theory and statistics has attracted increasing interest in recent years, from the theoretical point of view as well as in applications. In this approach, probability distributions are seen as elements of a differentiable manifold, on which a metric structure is defined through the choice of a Riemannian metric. Two very important ones are the Wasserstein metric, central in optimal transport, and the Fisher-Rao metric (also called Fisher information metric), essential in information geometry. Unlike optimal transport, information geometry is foremost concerned with parametric families of probability distributions, and defines a Riemannian structure on the parameter space using the Fisher information matrix. In parameter estimation, the Fisher information can be interpreted as the quantity of information on the unknown parameter contained in the model. As the Hessian of the well-known Kullback-Leibler divergence, it measures, through the notion of curvature, the capacity to distinguish between two different values of the parameter. Rao showed that it could be used to locally define a scalar product on the space of parameters, interpretable as a Riemannian metric. An important feature of this metric is that it is invariant under any diffeomorphic change of parameterization. In fact, considering the infinite-dimensional space of probability densities on a given manifold $M$, there is a unique metric, which also goes by the name Fisher-Rao, that is invariant with respect to the action of the diffeomorphism group of $M$ [3, 2]. This metric induces the usual Fisher information metric on the finite-dimensional submanifolds corresponding to the parameterized statistical models of interest in information geometry. Arguably the most famous example of Fisher-Rao geometry of a statistical model is that of the Gaussian model, which is hyperbolic. The multivariate Gaussian case, among other models, has also received a lot of attention [1, 11].
In this work, we are interested in beta distributions, a family of probability measures on $[0, 1]$ used to model random variables defined on a compact interval in a wide variety of applications. To our knowledge, the information geometry of beta distributions has not yet received much attention. In this paper, we give new results and properties for this geometry, and its curvature in particular. Interestingly, this geometric framework yields new by-product tools to study the set of all moments of compactly supported probability measures on the real line. This is achieved through the so-called canonical moments representation, an alternative to the usual moment representation of a probability distribution that satisfies interesting symmetries and invariance properties.
The paper is organized as follows. Section 2 deals with the study of the Fisher-Rao geometry of beta distributions. We derive the geodesic equations, prove that the sectional curvature is negative, give some bounds and observe a geometrical manifestation of the central limit theorem. Section 3 deals with the application to canonical moments. After a brief presentation of these objects, we propose a representation in the product beta manifold, allowing us to use the Fisher-Rao geometry of beta distributions to compare and analyze canonical moments.
2 Geometry of the beta manifold
2.1 The beta manifold
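Information geometry is concerned with parametric families of probability distributions, i.e. sets of distributions with densities with respect to a common dominating measure $\mu$, indexed by a parameter $\theta$ belonging to a given set $\Theta$. That is, a collection of measures of the form
$$\mathcal{P} = \{P_\theta = p(\cdot\,|\,\theta)\,\mu,\ \theta \in \Theta\}.$$
We assume that $\Theta$ is a non-empty open subset of $\mathbb{R}^d$. Associated to any such family is the Fisher information matrix, defined for all $\theta \in \Theta$ as
$$I(\theta) = \mathbb{E}_\theta\!\left[\nabla_\theta \log p(X|\theta)\,\nabla_\theta \log p(X|\theta)^\top\right], \qquad X \sim P_\theta.$$
As an open subset of $\mathbb{R}^d$, $\Theta$ is a differentiable manifold and can be equipped with a Riemannian metric using this quantity. This gives the Fisher information metric on the parameter space
$$g_\theta(u, v) = u^\top I(\theta)\, v, \qquad u, v \in T_\theta\Theta \simeq \mathbb{R}^d,$$
where $u^\top$ denotes the transpose of the vector $u$. By extension, we talk of the Fisher geometry of the parameterized family $\mathcal{P}$, and of the Riemannian manifold $(\mathcal{P}, g)$.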
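In this paper, we are interested in beta distributions, a family of probability distributions on $[0, 1]$ with densities with respect to the Lebesgue measure, parameterized by two positive scalars $\alpha, \beta > 0$:
$$p_{\alpha,\beta}(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \qquad x \in (0, 1).$$
We consider the Riemannian manifold composed of the parameter space $M = \mathbb{R}_+^* \times \mathbb{R}_+^*$ and the Fisher metric $g$, and by extension denote by beta manifold the pair $(\mathcal{B}, g)$, where $\mathcal{B}$ is the family of beta distributions
$$\mathcal{B} = \{\mathrm{Be}(\alpha,\beta) = p_{\alpha,\beta}\,\mathrm{d}\lambda,\ \alpha > 0,\ \beta > 0\}.$$
Here $\lambda$ denotes the Lebesgue measure on $[0, 1]$. The distance between two beta distributions is then defined as the geodesic distance associated to the Fisher metric in the parameter space
$$d\big(\mathrm{Be}(\alpha_1,\beta_1), \mathrm{Be}(\alpha_2,\beta_2)\big) = \inf_{\gamma} \int_0^1 \sqrt{g_{\gamma(t)}\big(\dot\gamma(t), \dot\gamma(t)\big)}\,\mathrm{d}t,$$
where the infimum is taken over all paths $\gamma : [0, 1] \to M$ such that $\gamma(0) = (\alpha_1, \beta_1)$ and $\gamma(1) = (\alpha_2, \beta_2)$.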
2.2 The Fisher-Rao metric
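The beta distributions form an exponential family, and so the general term of the Fisher-Rao metric depends on second-order derivatives of the underlying potential function. Denoting by $G(\alpha,\beta)$ the matrix form of $g$, we have
$$G(\alpha,\beta) = \nabla^2\varphi(\alpha,\beta) = \begin{pmatrix} \partial_\alpha^2\varphi & \partial_\alpha\partial_\beta\varphi \\ \partial_\alpha\partial_\beta\varphi & \partial_\beta^2\varphi \end{pmatrix},$$
where $\varphi$ is the potential function
$$\varphi(\alpha,\beta) = \log\Gamma(\alpha) + \log\Gamma(\beta) - \log\Gamma(\alpha+\beta). \tag{1}$$
Proposition 1 describes the metric tensor and Proposition 2 the geodesic equations.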
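Proposition 1. The matrix representation of the Fisher-Rao metric on the space of beta distributions is given by
$$G(\alpha,\beta) = \begin{pmatrix} \psi'(\alpha) - \psi'(\alpha+\beta) & -\psi'(\alpha+\beta) \\ -\psi'(\alpha+\beta) & \psi'(\beta) - \psi'(\alpha+\beta) \end{pmatrix},$$
where $\psi$ denotes the digamma function, i.e. $\psi(x) = \frac{\mathrm{d}}{\mathrm{d}x}\log\Gamma(x)$, and $\psi'$ its derivative, the trigamma function.
Proof. This follows from straightforward computations, by differentiating the potential function (1) twice. ∎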
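Proposition 1 is easy to check numerically. The following Python sketch is our own illustration, not code from the paper: it evaluates $\psi'$ with a hypothetical helper `trigamma` (the recurrence $\psi'(x) = \psi'(x+1) + 1/x^2$ followed by the standard asymptotic series, an implementation choice assumed here) and compares $G(\alpha,\beta)$ with a central finite-difference Hessian of the potential (1), computed from the standard-library `math.lgamma`. The names `potential`, `fisher_metric` and `fd_hessian` are likewise illustrative.

```python
import math

# Bernoulli numbers B_2, B_4, ..., B_10 used in the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def trigamma(x: float) -> float:
    """psi'(x) for x > 0 (sketch: recurrence shift, then asymptotic series)."""
    acc = 0.0
    while x < 20.0:
        acc += 1.0 / x**2  # psi'(x) = psi'(x + 1) + 1 / x^2
        x += 1.0
    acc += 1.0 / x + 0.5 / x**2
    for k, b in enumerate(_B2K):
        acc += b / x ** (2 * k + 3)  # B_{2(k+1)} / x^{2k+3}
    return acc


def potential(a: float, b: float) -> float:
    """Potential function (1): log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def fisher_metric(a: float, b: float):
    """Matrix G(a, b) of Proposition 1, as nested tuples."""
    t12 = trigamma(a + b)
    return ((trigamma(a) - t12, -t12), (-t12, trigamma(b) - t12))


def fd_hessian(f, a: float, b: float, h: float = 1e-3):
    """Central finite-difference Hessian of f at (a, b)."""
    faa = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h**2
    fbb = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h**2
    fab = (f(a + h, b + h) - f(a + h, b - h)
           - f(a - h, b + h) + f(a - h, b - h)) / (4 * h**2)
    return ((faa, fab), (fab, fbb))


for a, b in [(0.5, 2.0), (1.0, 1.0), (3.0, 7.5)]:
    G, H = fisher_metric(a, b), fd_hessian(potential, a, b)
    err = max(abs(G[i][j] - H[i][j]) for i in range(2) for j in range(2))
    print(f"alpha={a}, beta={b}: max |G - Hessian| = {err:.1e}")
```

The finite-difference error is of order $h^2$, so agreement to roughly $10^{-5}$ is what one should expect with the default step.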
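Proposition 2. The geodesic equations are given by
$$\begin{cases} \ddot\alpha + a_{11}\,\dot\alpha^2 + a_{12}\,\dot\alpha\dot\beta + a_{22}\,\dot\beta^2 = 0, \\ \ddot\beta + b_{11}\,\dot\alpha^2 + b_{12}\,\dot\alpha\dot\beta + b_{22}\,\dot\beta^2 = 0, \end{cases} \tag{2}$$
where, writing $D = \det G(\alpha,\beta) = \psi'(\alpha)\psi'(\beta) - \psi'(\alpha+\beta)\big(\psi'(\alpha) + \psi'(\beta)\big)$,
$$\begin{aligned}
a_{11} &= \frac{\psi''(\alpha)\big(\psi'(\beta) - \psi'(\alpha+\beta)\big) - \psi''(\alpha+\beta)\,\psi'(\beta)}{2D}, & b_{11} &= \frac{\psi''(\alpha)\,\psi'(\alpha+\beta) - \psi''(\alpha+\beta)\,\psi'(\alpha)}{2D},\\
a_{12} &= -\frac{\psi''(\alpha+\beta)\,\psi'(\beta)}{D}, & b_{12} &= -\frac{\psi''(\alpha+\beta)\,\psi'(\alpha)}{D},\\
a_{22} &= \frac{\psi''(\beta)\,\psi'(\alpha+\beta) - \psi''(\alpha+\beta)\,\psi'(\beta)}{2D}, & b_{22} &= \frac{\psi''(\beta)\big(\psi'(\alpha) - \psi'(\alpha+\beta)\big) - \psi''(\alpha+\beta)\,\psi'(\alpha)}{2D}.
\end{aligned}$$
Proof. Writing $(\theta^1, \theta^2) = (\alpha, \beta)$, the geodesic equations are given by
$$\ddot\theta^k + \Gamma^k_{ij}\,\dot\theta^i\dot\theta^j = 0, \qquad k = 1, 2,$$
where the $\Gamma^k_{ij}$'s denote the Christoffel symbols of the second kind. These can be obtained from the Christoffel symbols of the first kind $\Gamma_{ijl}$ and the coefficients $g^{kl}$ of the inverse of the metric matrix:
$$\Gamma^k_{ij} = g^{kl}\,\Gamma_{ijl}.$$
Here we have used the Einstein summation convention. Since the Fisher metric is a Hessian metric, the Christoffel symbols of the first kind can be obtained as
$$\Gamma_{ijl} = \frac12\,\partial_i\partial_j\partial_l\,\varphi,$$
where $\varphi$ is the potential function (1). Straightforward computation yields the desired equations. ∎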
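As a sanity check of the geodesic equations, one can integrate them numerically and monitor the squared speed $g(\dot\gamma, \dot\gamma)$, which is constant along any geodesic. The sketch below is a minimal illustration under our own assumptions: a hypothetical `polygamma(m, x)` helper valid for $m \ge 1$ (recurrence plus asymptotic series), a classical fourth-order Runge-Kutta integrator, and arbitrary initial conditions. None of these names come from the paper; the coefficients in `geodesic_rhs` are those obtained from $\Gamma^k_{ij} = g^{kl}\,\tfrac12\partial_i\partial_j\partial_l\varphi$.

```python
import math

# Bernoulli numbers B_2, ..., B_10 for the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def polygamma(m, x):
    """psi^{(m)}(x) for m >= 1, x > 0 (recurrence shift, then asymptotic series)."""
    acc, shift = 0.0, (-1) ** (m + 1) * math.factorial(m)
    while x < 20.0:
        acc += shift / x ** (m + 1)  # psi^{(m)}(x) = psi^{(m)}(x+1) + shift / x^{m+1}
        x += 1.0
    j = m - 1  # differentiate the series of psi' (j = m - 1) times
    for c, p in [(1.0, 1), (0.5, 2)] + [(b, 2 * k + 3) for k, b in enumerate(_B2K)]:
        acc += c * (-1) ** j * math.prod(range(p, p + j)) / x ** (p + j)
    return acc


def metric(a, b):
    """Fisher-Rao metric matrix G(a, b)."""
    t1, t2, t12 = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
    return ((t1 - t12, -t12), (-t12, t2 - t12))


def geodesic_rhs(state):
    """Right-hand side of the geodesic system; state = (a, b, da, db)."""
    a, b, da, db = state
    A, B, C = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
    pa, pb, pc = polygamma(2, a), polygamma(2, b), polygamma(2, a + b)
    det = A * B - C * (A + B)
    dda = -((pa * (B - C) - pc * B) * da**2 - 2 * pc * B * da * db
            + (pb * C - pc * B) * db**2) / (2 * det)
    ddb = -((pa * C - pc * A) * da**2 - 2 * pc * A * da * db
            + (pb * (A - C) - pc * A) * db**2) / (2 * det)
    return (da, db, dda, ddb)


def rk4_step(state, h):
    """One classical Runge-Kutta step for the first-order system."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, h / 2))
    k3 = geodesic_rhs(shifted(k2, h / 2))
    k4 = geodesic_rhs(shifted(k3, h))
    return tuple(s + h / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))


def squared_speed(state):
    """g(v, v) at the current point; constant along an exact geodesic."""
    a, b, da, db = state
    G = metric(a, b)
    return G[0][0] * da**2 + 2 * G[0][1] * da * db + G[1][1] * db**2


def shoot(state, t_end=1.0, n=1000):
    """Integrate from the initial point/velocity `state` up to time t_end."""
    for _ in range(n):
        state = rk4_step(state, t_end / n)
    return state


start = (2.0, 3.0, 1.0, -0.5)
end = shoot(start)
print("end point:", end[:2])
print("speed^2 at t=0 and t=1:", squared_speed(start), squared_speed(end))
```

Starting on the diagonal with equal velocity components keeps the numerical solution on the diagonal, in agreement with the observation that the line $\alpha = \beta$ is a geodesic.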
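Notice that along a curve satisfying $\alpha(t) = \beta(t)$ for all $t$, both geodesic equations (2) reduce to a single ordinary differential equation
$$\ddot\alpha + \frac{\psi''(\alpha) - 4\,\psi''(2\alpha)}{2\big(\psi'(\alpha) - 2\,\psi'(2\alpha)\big)}\,\dot\alpha^2 = 0.$$
The line of equation $\alpha = \beta$ is therefore a geodesic for the Fisher metric. More precisely, we have the following corollary obtained directly from Proposition 2.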
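Corollary 1. The line of equation $\alpha = \beta$, parameterized as $t \mapsto (\gamma(t), \gamma(t))$ with $\gamma = \eta^{-1}$, where
$$\eta(x) = \int_1^x \sqrt{2\,\psi'(u) - 4\,\psi'(2u)}\,\mathrm{d}u, \qquad x > 0,$$
is a geodesic for the Fisher metric, traversed at unit speed.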
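Along the diagonal, the metric reduces to $2\,(\psi'(x) - 2\psi'(2x))\,\mathrm{d}x^2$, and the coefficient of the diagonal ODE is exactly $\frac12\frac{\mathrm{d}}{\mathrm{d}x}\log\big(\psi'(x) - 2\psi'(2x)\big)$, which is why arc length along this line reduces to a single integral. The Python sketch below is our own illustration (`polygamma`, `h`, `ode_coefficient` and `diagonal_length` are hypothetical helper names): it checks this identity by finite differences and computes the length of a diagonal segment with the composite Simpson rule.

```python
import math

# Bernoulli numbers B_2, ..., B_10 for the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def polygamma(m, x):
    """psi^{(m)}(x) for m >= 1, x > 0 (recurrence shift, then asymptotic series)."""
    acc, shift = 0.0, (-1) ** (m + 1) * math.factorial(m)
    while x < 20.0:
        acc += shift / x ** (m + 1)
        x += 1.0
    j = m - 1
    for c, p in [(1.0, 1), (0.5, 2)] + [(b, 2 * k + 3) for k, b in enumerate(_B2K)]:
        acc += c * (-1) ** j * math.prod(range(p, p + j)) / x ** (p + j)
    return acc


def h(x):
    """h(x) = psi'(x) - 2 psi'(2x) > 0; the metric on the diagonal is 2 h(x) dx^2."""
    return polygamma(1, x) - 2 * polygamma(1, 2 * x)


def ode_coefficient(x):
    """Coefficient (psi''(x) - 4 psi''(2x)) / (2 h(x)) of the diagonal geodesic ODE."""
    return (polygamma(2, x) - 4 * polygamma(2, 2 * x)) / (2 * h(x))


def diagonal_length(x0, x1, n=2000):
    """Fisher-Rao length of the diagonal segment between (x0, x0) and (x1, x1)."""
    lo, hi = min(x0, x1), max(x0, x1)
    n += n % 2  # Simpson's rule needs an even number of panels

    def speed(u):
        return math.sqrt(2 * h(u))

    step = (hi - lo) / n
    total = speed(lo) + speed(hi)
    total += sum((4 if i % 2 else 2) * speed(lo + i * step) for i in range(1, n))
    return total * step / 3


x, eps = 1.3, 1e-5
fd = (math.log(h(x + eps)) - math.log(h(x - eps))) / (4 * eps)  # (1/2) d/dx log h
print(ode_coefficient(x), fd)
print(diagonal_length(1.0, 2.0))
```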
2.3 Some properties of the polygamma functions
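In order to further study the geometry of the beta manifold, we will need a few technical results on the polygamma functions. The polygamma functions are the successive derivatives of the logarithm of the Euler Gamma function $\Gamma$, i.e.
$$\psi^{(m)}(x) = \frac{\mathrm{d}^{m+1}}{\mathrm{d}x^{m+1}}\log\Gamma(x), \qquad m \ge 0,\ x > 0.$$
For $m \ge 1$, their series representation is given by
$$\psi^{(m)}(x) = (-1)^{m+1}\, m! \sum_{k=0}^{\infty} \frac{1}{(x+k)^{m+1}}.$$
In the sequel, we are mostly interested in the polygamma functions of orders 1, 2 and 3, i.e.
$$\psi' = \psi^{(1)}, \qquad \psi'' = \psi^{(2)}, \qquad \psi''' = \psi^{(3)},$$
and we will use the following equivalents in the neighborhood of zero, given by the first term of their series:
$$\psi'(x) \underset{x\to 0}{\sim} \frac{1}{x^2}, \qquad \psi''(x) \underset{x\to 0}{\sim} -\frac{2}{x^3}, \qquad \psi'''(x) \underset{x\to 0}{\sim} \frac{6}{x^4}. \tag{3}$$
In the neighborhood of infinity, we will need the following expansions:
$$\psi'(x) = \frac1x + \frac{1}{2x^2} + O\!\left(\frac{1}{x^3}\right), \qquad \psi''(x) = -\frac{1}{x^2} - \frac{1}{x^3} + O\!\left(\frac{1}{x^4}\right), \qquad \psi'''(x) = \frac{2}{x^3} + O\!\left(\frac{1}{x^4}\right). \tag{4}$$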
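For numerical experiments, the polygamma functions of order $m \ge 1$ can be evaluated with the recurrence $\psi^{(m)}(x) = \psi^{(m)}(x+1) + (-1)^{m+1} m!/x^{m+1}$, which pushes the argument to a region where the classical asymptotic series of $\psi'$ (with Bernoulli numbers) is accurate. The following Python function is a minimal sketch of this approach, assuming double precision is sufficient; it is not taken from the paper, and SciPy users can rely on `scipy.special.polygamma` instead. The checks compare it with known values at 1 and with the equivalents near zero.

```python
import math

# Bernoulli numbers B_2, B_4, ..., B_10 used in the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def polygamma(m: int, x: float) -> float:
    """psi^{(m)}(x) for m >= 1 and x > 0 (sketch: recurrence shift, then asymptotic series)."""
    if m < 1 or x <= 0:
        raise ValueError("this sketch requires m >= 1 and x > 0")
    acc = 0.0
    shift = (-1) ** (m + 1) * math.factorial(m)
    while x < 20.0:
        # psi^{(m)}(x) = psi^{(m)}(x + 1) + (-1)^{m+1} m! / x^{m+1}
        acc += shift / x ** (m + 1)
        x += 1.0
    # psi'(x) ~ 1/x + 1/(2x^2) + sum_k B_{2k} / x^{2k+1}, differentiated (m - 1) times
    # using d^j/dx^j x^{-p} = (-1)^j p (p + 1) ... (p + j - 1) x^{-p-j}.
    j = m - 1
    terms = [(1.0, 1), (0.5, 2)] + [(b, 2 * k + 3) for k, b in enumerate(_B2K)]
    for c, p in terms:
        acc += c * (-1) ** j * math.prod(range(p, p + j)) / x ** (p + j)
    return acc


# Known values at 1: psi'(1) = pi^2/6, psi''(1) = -2 zeta(3), psi'''(1) = pi^4/15.
print(polygamma(1, 1.0), math.pi**2 / 6)
print(polygamma(3, 1.0), math.pi**4 / 15)
# Equivalents near zero: x^2 psi'(x), -x^3 psi''(x)/2 and x^4 psi'''(x)/6 all tend to 1.
for x in (1e-2, 1e-4):
    print(x, x**2 * polygamma(1, x), -x**3 * polygamma(2, x) / 2, x**4 * polygamma(3, x) / 6)
```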
2.4 Curvature of the Fisher-Rao metric
In this section, we prove our main result, namely that the sectional curvature of the beta manifold is negative.
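Proposition 3. The sectional curvature of the Fisher metric is given by
$$K(\alpha,\beta) = \frac{\psi''(\alpha)\,\psi''(\beta)\,\psi''(\alpha+\beta)}{4\,\big(\det G(\alpha,\beta)\big)^2}\left(\frac{\psi'(\alpha)}{\psi''(\alpha)} + \frac{\psi'(\beta)}{\psi''(\beta)} - \frac{\psi'(\alpha+\beta)}{\psi''(\alpha+\beta)}\right).$$
Proof. In dimension 2, the sectional curvature of a Hessian metric $g = \nabla^2\varphi$ is given by
$$K = -\frac{1}{4\,(\det G)^2}\begin{vmatrix} \varphi_{11} & \varphi_{12} & \varphi_{22} \\ \varphi_{111} & \varphi_{112} & \varphi_{122} \\ \varphi_{112} & \varphi_{122} & \varphi_{222} \end{vmatrix},$$
where indices denote partial derivatives with respect to $\theta^1 = \alpha$ and $\theta^2 = \beta$. Computing the partial derivatives of the potential function gives
$$\varphi_{11} = \psi'(\alpha) - \psi'(\alpha+\beta), \qquad \varphi_{12} = -\psi'(\alpha+\beta), \qquad \varphi_{22} = \psi'(\beta) - \psi'(\alpha+\beta),$$
$$\varphi_{111} = \psi''(\alpha) - \psi''(\alpha+\beta), \qquad \varphi_{112} = \varphi_{122} = -\psi''(\alpha+\beta), \qquad \varphi_{222} = \psi''(\beta) - \psi''(\alpha+\beta),$$
and the determinant of the metric is given by
$$\det G(\alpha,\beta) = \psi'(\alpha)\,\psi'(\beta) - \psi'(\alpha+\beta)\big(\psi'(\alpha) + \psi'(\beta)\big).$$
Expanding the $3\times 3$ determinant and factorizing the numerator by $\psi''(\alpha)\,\psi''(\beta)\,\psi''(\alpha+\beta)$ yields the desired result. ∎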
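In dimension 2, the sectional curvature of a Hessian metric with potential $\varphi$ equals $-\frac{1}{4(\det G)^2}$ times the determinant of the $3\times 3$ matrix with rows $(\varphi_{11}, \varphi_{12}, \varphi_{22})$, $(\varphi_{111}, \varphi_{112}, \varphi_{122})$ and $(\varphi_{112}, \varphi_{122}, \varphi_{222})$. For the beta potential, the expansion of this determinant can be written out as follows, using the shorthand $A = \psi'(\alpha)$, $B = \psi'(\beta)$, $C = \psi'(\alpha+\beta)$, $a = \psi''(\alpha)$, $b = \psi''(\beta)$, $c = \psi''(\alpha+\beta)$, introduced only for this computation (cofactor expansion along the first row):

```latex
\begin{aligned}
\begin{vmatrix} A-C & -C & B-C \\ a-c & -c & -c \\ -c & -c & b-c \end{vmatrix}
  &= (A-C)(-bc) + C\,(ab - ac - bc) + (B-C)(-ac) \\
  &= -Abc - Bac + Cab
   = -abc\left(\frac{A}{a} + \frac{B}{b} - \frac{C}{c}\right), \\
K &= \frac{abc}{4(\det G)^2}\left(\frac{A}{a} + \frac{B}{b} - \frac{C}{c}\right).
\end{aligned}
```

The last line is the factorized form: the product $abc$ carries the sign of $\psi''$, and the bracket involves only the ratio $\psi'/\psi''$ evaluated at $\alpha$, $\beta$ and $\alpha+\beta$.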
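Proposition 4. For any fixed $\beta > 0$, the asymptotic behavior of the sectional curvature is given by
$$\lim_{\alpha\to 0} K(\alpha,\beta) = \frac34 - \frac{\psi'(\beta)\,\psi'''(\beta)}{2\,\psi''(\beta)^2}, \qquad \lim_{\alpha\to\infty} K(\alpha,\beta) = \frac{\psi'(\beta) + \beta\,\psi''(\beta)}{4\,\big(\beta\,\psi'(\beta) - 1\big)^2}.$$
Moreover, we have the following limits
$$\lim_{\beta\to 0}\lim_{\alpha\to 0} K = 0, \qquad \lim_{\beta\to\infty}\lim_{\alpha\to 0} K = -\frac14, \qquad \lim_{\beta\to 0}\lim_{\alpha\to\infty} K = -\frac14, \qquad \lim_{\beta\to\infty}\lim_{\alpha\to\infty} K = -\frac12.$$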
Proof. Let us fix $\beta > 0$, and denote by $\alpha$ the varying parameter of the beta distribution. The asymptotic behavior of the sectional curvature can be obtained by separately examining its numerator and the metric determinant appearing in the denominator.
Using a first-order Taylor expansion of $\psi'(\alpha+\beta)$ in $\alpha$ and the equivalent (3), we deduce the following expansion for the determinant around zero:
$$\det G(\alpha,\beta) = -\frac{\psi''(\beta)}{\alpha} + O(1).$$
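Similarly, writing the numerator of the sectional curvature, $N(\alpha,\beta) = 4\,\big(\det G(\alpha,\beta)\big)^2 K(\alpha,\beta)$, as
$$N(\alpha,\beta) = \psi''(\alpha)\big(\psi'(\beta)\,\psi''(\alpha+\beta) - \psi''(\beta)\,\psi'(\alpha+\beta)\big) + \psi'(\alpha)\,\psi''(\beta)\,\psi''(\alpha+\beta),$$
and using first-order Taylor expansions of $\psi'(\alpha+\beta)$ and $\psi''(\alpha+\beta)$ in $\alpha$, we get the following behavior around zero:
$$N(\alpha,\beta) = \frac{3\,\psi''(\beta)^2 - 2\,\psi'(\beta)\,\psi'''(\beta)}{\alpha^2} + O\!\left(\frac{1}{\alpha}\right).$$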
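This yields the desired expression for the limit of $K$ as $\alpha \to 0$. Now, in the neighborhood of infinity, the expansions (4) yield the following behavior for the determinant:
$$\det G(\alpha,\beta) = \frac{\beta\,\psi'(\beta) - 1}{\alpha^2} + O\!\left(\frac{1}{\alpha^3}\right),$$
while an expansion of the numerator gives
$$N(\alpha,\beta) = \frac{\psi'(\beta) + \beta\,\psi''(\beta)}{\alpha^4} + O\!\left(\frac{1}{\alpha^5}\right),$$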
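yielding again the desired limit for $\alpha \to \infty$. Finally, approximating the polygamma functions by their equivalents (3) when $\beta \to 0$ and by their expansions (4) when $\beta \to \infty$, we get
$$\lim_{\beta\to 0}\lim_{\alpha\to 0} K = 0, \qquad \lim_{\beta\to\infty}\lim_{\alpha\to 0} K = -\frac14, \qquad \lim_{\beta\to 0}\lim_{\alpha\to\infty} K = -\frac14, \qquad \lim_{\beta\to\infty}\lim_{\alpha\to\infty} K = -\frac12,$$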
which completes the proof. ∎
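These limits can be checked numerically. The sketch below is our own illustration: it assumes the closed forms $\lim_{\alpha\to0}K = \frac34 - \frac{\psi'(\beta)\psi'''(\beta)}{2\psi''(\beta)^2}$ and $\lim_{\alpha\to\infty}K = \frac{\psi'(\beta) + \beta\psi''(\beta)}{4(\beta\psi'(\beta) - 1)^2}$, obtained by combining the expansions of the numerator and of the determinant, and evaluates $K$ through its factorized form. The helper names (`polygamma`, `sectional_curvature`, `k_zero`, `k_inf`) are hypothetical.

```python
import math

# Bernoulli numbers B_2, ..., B_10 for the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def polygamma(m, x):
    """psi^{(m)}(x) for m >= 1, x > 0 (recurrence shift, then asymptotic series)."""
    acc, shift = 0.0, (-1) ** (m + 1) * math.factorial(m)
    while x < 20.0:
        acc += shift / x ** (m + 1)
        x += 1.0
    j = m - 1
    for c, p in [(1.0, 1), (0.5, 2)] + [(b, 2 * k + 3) for k, b in enumerate(_B2K)]:
        acc += c * (-1) ** j * math.prod(range(p, p + j)) / x ** (p + j)
    return acc


def sectional_curvature(a, b):
    """K = psi''(a) psi''(b) psi''(a+b) / (4 det^2) * (sum of psi'/psi'' ratios)."""
    A, B, C = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
    pa, pb, pc = polygamma(2, a), polygamma(2, b), polygamma(2, a + b)
    det = A * B - C * (A + B)
    return pa * pb * pc * (A / pa + B / pb - C / pc) / (4 * det**2)


def k_zero(b):
    """Assumed closed form of lim_{a -> 0} K(a, b)."""
    return 0.75 - polygamma(1, b) * polygamma(3, b) / (2 * polygamma(2, b) ** 2)


def k_inf(b):
    """Assumed closed form of lim_{a -> infinity} K(a, b)."""
    return (polygamma(1, b) + b * polygamma(2, b)) / (4 * (b * polygamma(1, b) - 1) ** 2)


for b in (0.5, 1.0, 4.0):
    print(f"beta={b}: K(1e-5, beta)={sectional_curvature(1e-5, b):.5f}, "
          f"k_zero={k_zero(b):.5f}, K(1e4, beta)={sectional_curvature(1e4, b):.5f}, "
          f"k_inf={k_inf(b):.5f}")
print("boundary behaviour:", k_zero(1e-3), k_zero(1e3), k_inf(1e-3), k_inf(1e3))
```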
We can now show the following property.
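Proposition 5. The sectional curvature is negative and bounded from below.
Proof. Recall that in its most factorized form, the sectional curvature is given by
$$K(\alpha,\beta) = \frac{\psi''(\alpha)\,\psi''(\beta)\,\psi''(\alpha+\beta)}{4\,\big(\det G(\alpha,\beta)\big)^2}\left(\frac{\psi'(\alpha)}{\psi''(\alpha)} + \frac{\psi'(\beta)}{\psi''(\beta)} - \frac{\psi'(\alpha+\beta)}{\psi''(\alpha+\beta)}\right).$$
Since $\psi''$ is negative, the first factor is negative, and so it remains to prove that the function $\psi'/\psi''$ is strictly sub-additive, i.e. for all $\alpha, \beta > 0$,
$$\frac{\psi'(\alpha+\beta)}{\psi''(\alpha+\beta)} < \frac{\psi'(\alpha)}{\psi''(\alpha)} + \frac{\psi'(\beta)}{\psi''(\beta)}.$$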
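This has been shown recently in  (Corollary 4). Now, to show that it is bounded from below, set
$$k_0(\beta) = \lim_{\alpha\to 0} K(\alpha,\beta), \qquad k_\infty(\beta) = \lim_{\alpha\to\infty} K(\alpha,\beta).$$
The functions $k_0$ and $k_\infty$ are continuous on $\mathbb{R}_+^*$, and according to Proposition 4 they have finite limits at the boundaries $0$ and $+\infty$. Therefore, they are bounded, i.e., there exist negative finite constants $c_0$ and $c_\infty$ such that for all $\beta > 0$,
$$k_0(\beta) \ge c_0, \qquad k_\infty(\beta) \ge c_\infty.$$
Setting $\underline{K}(\beta) = \inf_{\alpha > 0} K(\alpha,\beta)$, notice that $\underline{K}$ is a continuous function on $\mathbb{R}_+^*$ due to the continuity of $K$ in both its variables and the interchangeability of the limit and the infimum. For this last reason, and since $K$ is symmetric in $\alpha$ and $\beta$, we also obtain
$$\lim_{\beta\to 0}\underline{K}(\beta) = \inf_{\alpha>0} k_0(\alpha) \ge c_0, \qquad \lim_{\beta\to\infty}\underline{K}(\beta) = \inf_{\alpha>0} k_\infty(\alpha) \ge c_\infty,$$
i.e., $\underline{K}$ has finite limits at the boundaries and is therefore bounded, in particular from below:
$$K(\alpha,\beta) \ge \inf_{\beta' > 0}\underline{K}(\beta') > -\infty \quad \text{for all } \alpha, \beta > 0.$$
∎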
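Both ingredients of the negativity argument can be probed on a grid of parameters. The Python sketch below is our own illustration with hypothetical helper names, not a proof: it tests the strict sub-additivity of $\psi'/\psi''$ and the sign of $K$ computed from its factorized form. Since the first factor of that form is negative, the two checks are equivalent; the grid values are an arbitrary choice.

```python
import math

# Bernoulli numbers B_2, ..., B_10 for the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def polygamma(m, x):
    """psi^{(m)}(x) for m >= 1, x > 0 (recurrence shift, then asymptotic series)."""
    acc, shift = 0.0, (-1) ** (m + 1) * math.factorial(m)
    while x < 20.0:
        acc += shift / x ** (m + 1)
        x += 1.0
    j = m - 1
    for c, p in [(1.0, 1), (0.5, 2)] + [(b, 2 * k + 3) for k, b in enumerate(_B2K)]:
        acc += c * (-1) ** j * math.prod(range(p, p + j)) / x ** (p + j)
    return acc


def ratio(x):
    """psi'(x) / psi''(x), the function whose sub-additivity gives K < 0."""
    return polygamma(1, x) / polygamma(2, x)


def sectional_curvature(a, b):
    """Factorized form: psi''(a) psi''(b) psi''(a+b) / (4 det^2) * bracket."""
    A, B, C = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
    pa, pb, pc = polygamma(2, a), polygamma(2, b), polygamma(2, a + b)
    det = A * B - C * (A + B)
    return pa * pb * pc * (ratio(a) + ratio(b) - ratio(a + b)) / (4 * det**2)


GRID = (0.1, 0.5, 1.0, 2.0, 5.0, 20.0, 100.0)
subadditive = all(ratio(a + b) < ratio(a) + ratio(b) for a in GRID for b in GRID)
negative = all(sectional_curvature(a, b) < 0 for a in GRID for b in GRID)
print("sub-additive on grid:", subadditive, "| K < 0 on grid:", negative)
```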
The fact that the beta manifold has negative curvature is particularly interesting for the computation of Riemannian centroids such as Fréchet or Karcher means [6, 7]. In general, the Fréchet mean on a Riemannian manifold is not unique. However, when the curvature is negative (on a complete and simply connected manifold), there is no cut locus and uniqueness holds. In this context, the Fréchet mean of any given beta distributions $P_1, \dots, P_n$ is defined as
$$\bar{P} = \arg\min_{P} \sum_{i=1}^{n} d(P, P_i)^2,$$
where the minimum is taken over all beta distributions $P$.
This quantity can be computed using a gradient descent algorithm, such as the Karcher flow algorithm.
2.5 A lower bound on the determinant of the metric
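The determinant of the metric is the key ingredient in volume computations, since the Riemannian volume element is $\sqrt{\det G(\alpha,\beta)}\,\mathrm{d}\alpha\,\mathrm{d}\beta$. In this section, we compute a lower bound on this determinant, which is also its asymptotic value.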
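Proposition 6. The determinant of the information metric matrix admits the following integral representation:
Proof. The polygamma function of order $m \ge 1$ can be expressed as an integral:
$$\psi^{(m)}(x) = (-1)^{m+1}\int_0^{\infty} \frac{t^m\,e^{-xt}}{1 - e^{-t}}\,\mathrm{d}t, \qquad x > 0. \tag{6}$$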
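The determinant expands as:
$$\det G(\alpha,\beta) = \big(\psi'(\alpha) - \psi'(\alpha+\beta)\big)\,\psi'(\beta) - \psi'(\alpha)\,\psi'(\alpha+\beta).$$
Using the integral (6), we obtain:
$$\psi'(\alpha) - \psi'(\alpha+\beta) = \int_0^\infty \frac{t\,\big(e^{-\alpha t} - e^{-(\alpha+\beta)t}\big)}{1 - e^{-t}}\,\mathrm{d}t = \int_0^\infty e^{-\alpha t}\,\frac{t\,(1 - e^{-\beta t})}{1 - e^{-t}}\,\mathrm{d}t.$$
The difference $\psi'(\alpha) - \psi'(\alpha+\beta)$ is thus equal to the Laplace transform at $\alpha$ of the function
$$t \mapsto \frac{t\,(1 - e^{-\beta t})}{1 - e^{-t}}.$$
Using the convolution theorem, we obtain: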
The same procedure can be applied to the integral expression of to obtain:
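The integral representation of the polygamma functions and the Laplace-transform identity $\psi'(\alpha) - \psi'(\alpha+\beta) = \int_0^\infty e^{-\alpha t}\,\frac{t(1 - e^{-\beta t})}{1 - e^{-t}}\,\mathrm{d}t$ can be verified by quadrature. The sketch below is our own illustration: it truncates the integrals at $t = 60/x$ (an assumption that makes the neglected tail of order $e^{-60}$), uses the composite Simpson rule, and compares the results with the exact values $\psi'(1) = \pi^2/6$, $\psi''(1) = -2\zeta(3)$ and $\psi'(1) - \psi'(3) = 1 + 1/4$. The function names are hypothetical.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of panels."""
    n += n % 2
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    total += sum((4 if i % 2 else 2) * f(lo + i * step) for i in range(1, n))
    return total * step / 3


def polygamma_integral(m, x, cutoff=60.0):
    """(-1)^{m+1} * integral of t^m e^{-xt} / (1 - e^{-t}) over [0, cutoff / x]."""
    def integrand(t):
        if t == 0.0:
            return 1.0 if m == 1 else 0.0  # limit of t^m / (1 - e^{-t}) at t = 0
        return t**m * math.exp(-x * t) / -math.expm1(-t)
    return (-1) ** (m + 1) * simpson(integrand, 0.0, cutoff / x)


def laplace_difference(a, b, cutoff=60.0):
    """Laplace transform at a of t -> t (1 - e^{-bt}) / (1 - e^{-t}), truncated at cutoff / a."""
    def integrand(t):
        if t == 0.0:
            return 0.0  # t (1 - e^{-bt}) / (1 - e^{-t}) behaves like b t near 0
        return math.exp(-a * t) * t * math.expm1(-b * t) / math.expm1(-t)
    return simpson(integrand, 0.0, cutoff / a)


print(polygamma_integral(1, 1.0), math.pi**2 / 6)           # psi'(1)
print(polygamma_integral(2, 1.0), -2 * 1.2020569031595942)  # psi''(1) = -2 zeta(3)
print(laplace_difference(1.0, 2.0), 1.25)                   # psi'(1) - psi'(3)
```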
Building on the integral representation of Proposition 6, it is possible to derive a lower bound for the determinant, which is also its asymptotic value.
Proposition 7. The following lower bound holds:
Proof. The hyperbolic cotangent satisfies:
The integral expression (12) is rewritten as:
A lower bound for (17) is thus given by:
The inner term:
admits a closed form expression:
Performing the outer integration finally yields:
thus completing the proof. ∎
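A quick numerical experiment illustrates the asymptotic behavior of the determinant. Expanding $\det G$ with the asymptotic expansions of $\psi'$ at infinity gives, when $\alpha$ and $\beta$ both tend to infinity, $\det G(\alpha,\beta) \sim \frac{1}{2\alpha\beta(\alpha+\beta)}$; this equivalent is our own computation, offered as a sanity check rather than as a restatement of the bound above. The sketch below (hypothetical helper names) evaluates the ratio $2\alpha\beta(\alpha+\beta)\det G(\alpha,\beta)$, which should approach 1.

```python
# Bernoulli numbers B_2, ..., B_10 for the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def trigamma(x):
    """psi'(x) for x > 0 (recurrence shift, then asymptotic series)."""
    acc = 0.0
    while x < 20.0:
        acc += 1.0 / x**2
        x += 1.0
    acc += 1.0 / x + 0.5 / x**2
    for k, b in enumerate(_B2K):
        acc += b / x ** (2 * k + 3)
    return acc


def det_metric(a, b):
    """det G(a, b) = psi'(a) psi'(b) - psi'(a + b) (psi'(a) + psi'(b))."""
    A, B, C = trigamma(a), trigamma(b), trigamma(a + b)
    return A * B - C * (A + B)


for a, b in ((10.0, 10.0), (100.0, 300.0), (1e3, 1e3), (1e3, 3e3)):
    ratio = 2 * a * b * (a + b) * det_metric(a, b)
    print(f"alpha={a}, beta={b}: 2 alpha beta (alpha + beta) det G = {ratio:.5f}")
```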
2.6 A geometric viewpoint on the central limit theorem
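The central limit theorem tells us that, once re-centered, a beta distribution converges at rate $\sqrt{\alpha+\beta}$ to a centered normal distribution: if $X$ follows the beta distribution with parameters $\alpha$ and $\beta$, with fixed mean $\mu = \alpha/(\alpha+\beta)$, then
$$\sqrt{\alpha+\beta}\,(X - \mu) \xrightarrow[\alpha+\beta\to\infty]{\ d\ } \mathcal{N}\big(0, \mu(1-\mu)\big).$$
For a fixed $\mu \in (0, 1)$, the line $\beta = \frac{1-\mu}{\mu}\,\alpha$ corresponds to all the beta distributions of mean $\mu$. Asymptotically, we retrieve a hyperbolic distance between two distributions on this line.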
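Proposition 8. When $\alpha \to \infty$ for a fixed $\mu$, the metric induced on the line $\beta = \frac{1-\mu}{\mu}\,\alpha$ is asymptotically
$$\mathrm{d}s^2 \sim \frac{\mathrm{d}\alpha^2}{2\,\alpha^2}.$$
Proof. Setting $\lambda = (1-\mu)/\mu$, the infinitesimal element of length along the line $\beta = \lambda\alpha$ is given by
$$\mathrm{d}s^2 = \big(g_{11} + 2\lambda\,g_{12} + \lambda^2 g_{22}\big)\,\mathrm{d}\alpha^2 = \Big(\psi'(\alpha) + \lambda^2\,\psi'(\lambda\alpha) - (1+\lambda)^2\,\psi'\big((1+\lambda)\alpha\big)\Big)\,\mathrm{d}\alpha^2.$$
When $\alpha \to \infty$, we have asymptotically using (4)
$$\psi'(\alpha) + \lambda^2\,\psi'(\lambda\alpha) - (1+\lambda)^2\,\psi'\big((1+\lambda)\alpha\big) = \frac{1}{2\alpha^2} + O\!\left(\frac{1}{\alpha^3}\right),$$
and so when $\alpha \to \infty$ for a fixed $\mu$, $\mathrm{d}s \sim \mathrm{d}\alpha/(\sqrt{2}\,\alpha)$: the distance along the line between the distributions with parameters $(\alpha_1, \lambda\alpha_1)$ and $(\alpha_2, \lambda\alpha_2)$ behaves like $|\log(\alpha_2/\alpha_1)|/\sqrt{2}$.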
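The asymptotic length element along a line of fixed mean can be checked numerically. The sketch below is our own illustration with hypothetical helper names: it evaluates $\psi'(\alpha) + \lambda^2\psi'(\lambda\alpha) - (1+\lambda)^2\psi'((1+\lambda)\alpha)$, the coefficient of $\mathrm{d}\alpha^2$ on the line $\beta = \lambda\alpha$ with $\lambda = (1-\mu)/\mu$, and multiplies it by $2\alpha^2$. The product should tend to 1 as $\alpha \to \infty$, whatever the mean $\mu$.

```python
# Bernoulli numbers B_2, ..., B_10 for the asymptotic series of psi'.
_B2K = (1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66)


def trigamma(x):
    """psi'(x) for x > 0 (recurrence shift, then asymptotic series)."""
    acc = 0.0
    while x < 20.0:
        acc += 1.0 / x**2
        x += 1.0
    acc += 1.0 / x + 0.5 / x**2
    for k, b in enumerate(_B2K):
        acc += b / x ** (2 * k + 3)
    return acc


def line_coefficient(a, mu):
    """Coefficient of d alpha^2 in ds^2 along beta = lam * alpha, lam = (1 - mu) / mu."""
    lam = (1.0 - mu) / mu
    return (trigamma(a) + lam**2 * trigamma(lam * a)
            - (1 + lam) ** 2 * trigamma((1 + lam) * a))


for mu in (0.3, 0.5, 0.8):
    print(mu, [round(2 * a**2 * line_coefficient(a, mu), 6) for a in (10.0, 100.0, 1e4)])
```

Since the limit does not depend on $\mu$, the asymptotic distance between two distributions of the same mean depends only on the ratio of their parameters.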