In information geometry, a parameterized family of probability distributions is expressed as a manifold in the Riemannian space, in which the parameters form the coordinate system on the manifold and the distance measure is given by the Fisher information matrix (FIM). This framework reduces certain important information-theoretic problems to investigations of different Riemannian manifolds. This perspective is helpful in analyzing many problems in engineering and the sciences where probability distributions are used, including optimization, signal processing [6], optimal transport, and quantum information.
In particular, when the separation between two points on the manifold is defined by the Kullback-Leibler divergence (KLD) or relative entropy between two probability distributions $p$ and $q$ on a finite state space $\mathbb{X}$, i.e.,

$$I(p, q) := \sum_{x \in \mathbb{X}} p(x) \log \frac{p(x)}{q(x)},$$
then the resulting Riemannian metric is defined by the FIM. This method of defining a Riemannian metric on statistical manifolds from a general divergence function is due to Eguchi. Since the FIM is the inverse of the well-known deterministic Cramér-Rao lower bound (CRLB), the information-geometric results are directly connected with those of estimation theory. Further, the relative entropy is related to the Shannon entropy $H(p) := -\sum_{x} p(x) \log p(x)$ by $I(p, u) = \log |\mathbb{X}| - H(p)$, where $u$ is the uniform distribution on $\mathbb{X}$.
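The relation between relative entropy and Shannon entropy stated above can be checked numerically. The following is a minimal sketch, assuming strictly positive distributions on a three-point state space; the function names and the example distribution are illustrative choices, not taken from the paper.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) log p(x), in nats."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)))

def kl_divergence(p, q):
    """Relative entropy I(p, q) = sum_x p(x) log(p(x) / q(x)), in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Illustrative distribution (an assumption for this sketch).
p = np.array([0.5, 0.3, 0.2])
u = np.full(p.size, 1.0 / p.size)  # uniform distribution on the state space

lhs = kl_divergence(p, u)
rhs = np.log(p.size) - shannon_entropy(p)
print(lhs, rhs)  # the two values agree up to floating-point error
```

The helper functions assume every probability is strictly positive; zero entries would need the usual convention $0 \log 0 = 0$.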
The Bayesian CRLB is the analogous lower bound to the CRLB for random variables. It assumes the parameters to be random with an a priori probability density function. In our earlier work, we derived the Bayesian CRLB using a general definition of KLD when the probability densities are not normalized.
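Recently, the information geometry of Rényi entropy, which is a generalization of Shannon entropy, has been studied. In the source coding problem where normalized cumulants of compressed lengths are considered instead of expected compressed lengths, Rényi entropy is used as a measure of uncertainty. The Rényi entropy of $p$ of order $\alpha$, $\alpha \geq 0$, $\alpha \neq 1$, is defined to be $H_\alpha(p) := \frac{1}{1-\alpha} \log \sum_{x} p(x)^{\alpha}$. In the context of the source distribution version of this problem, the Rényi analog of relative entropy is relative $\alpha$-entropy [17, 18]. The relative $\alpha$-entropy of $p$ with respect to $q$ (or Sundaresan's divergence between $p$ and $q$) is defined as

$$I_\alpha(p, q) := \frac{\alpha}{1-\alpha} \log \sum_{x} p(x)\, q(x)^{\alpha-1} - \frac{1}{1-\alpha} \log \sum_{x} p(x)^{\alpha} + \log \sum_{x} q(x)^{\alpha}.$$

It follows that, as $\alpha \to 1$, we have $I_\alpha(p, q) \to I(p, q)$ and $H_\alpha(p) \to H(p)$. Rényi entropy and relative $\alpha$-entropy are related by the equation $I_\alpha(p, u) = \log |\mathbb{X}| - H_\alpha(p)$. Relative $\alpha$-entropy is closely related to the Csiszár $f$-divergence as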
Relative $\alpha$-entropy arises in statistics as a generalized likelihood function robust to outliers. It also shares many interesting properties with relative entropy; see, e.g., [19, Sec. II] for a summary. For example, relative $\alpha$-entropy behaves like squared Euclidean distance and satisfies a Pythagorean property in the same way that relative entropy does [19, 13]. This property helps in establishing a computation method for a robust estimation procedure.
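The basic properties of relative $\alpha$-entropy quoted in this introduction (non-negativity, its relation to Rényi entropy through the uniform distribution, and the limit to relative entropy) can be verified numerically. The sketch below assumes the standard three-term form of Sundaresan's divergence for strictly positive distributions on a finite set; the function names and test distributions are illustrative.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    p = np.asarray(p, dtype=float)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def relative_alpha_entropy(p, q, alpha):
    """Relative alpha-entropy (Sundaresan's divergence) of p with respect to q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    t1 = alpha / (1.0 - alpha) * np.log(np.sum(p * q ** (alpha - 1.0)))
    t2 = -np.log(np.sum(p ** alpha)) / (1.0 - alpha)
    t3 = np.log(np.sum(q ** alpha))
    return float(t1 + t2 + t3)

# Illustrative distributions (assumptions for this sketch).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
u = np.full(p.size, 1.0 / p.size)

for alpha in (0.5, 2.0):
    # Non-negative, and equal to log|X| - H_alpha(p) when q is uniform.
    print(alpha,
          relative_alpha_entropy(p, q, alpha),
          relative_alpha_entropy(p, u, alpha),
          np.log(p.size) - renyi_entropy(p, alpha))

# Near alpha = 1 the value approaches the relative entropy I(p, q).
kl = float(np.sum(p * np.log(p / q)))
print(relative_alpha_entropy(p, q, 1.0 + 1e-6), kl)
```

Evaluating at $\alpha$ slightly away from 1 avoids the removable singularity of the $1/(1-\alpha)$ factors.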
Motivated by such analogous relationships, our previous works investigated the relative $\alpha$-entropy from a differential geometric perspective. In particular, we applied Eguchi's method with relative $\alpha$-entropy as the divergence function to obtain the resulting statistical manifold with a general Riemannian metric. This metric is specified by the Fisher information matrix that is the inverse of the so-called deterministic $\alpha$-CRLB. In this paper, we study the structure of statistical manifolds with respect to a relative $\alpha$-entropy in a Bayesian setting. This is a non-trivial extension of our earlier work, where we proposed a Riemannian metric arising from the relative entropy for the Bayesian case. In the process, we derive a general Bayesian Cramér-Rao inequality and the resulting Bayesian $\alpha$-CRLB, which embed the compounded effects of both the Rényi order and the Bayesian prior distribution. We show that, in limiting cases, the bound reduces to the deterministic $\alpha$-CRLB (in the absence of a prior), the Bayesian CRLB (when $\alpha \to 1$), or the CRLB (no priors and $\alpha \to 1$).
The rest of the paper is organized as follows. In the next section, we provide the essential background on information geometry. We then introduce the definition of Bayesian relative $\alpha$-entropy in Section III and show that it is a valid divergence function. In Section IV, we establish the connection between this divergence and the Riemannian metric, and then derive the Bayesian $\alpha$-version of the Cramér-Rao inequality in Section V. Finally, we state our main result for the Bayesian $\alpha$-CRLB in Section VI and conclude in Section VII.
II Desiderata for Information Geometry
A $k$-dimensional manifold is a Hausdorff and second countable topological space which is locally homeomorphic to Euclidean space of dimension $k$. A Riemannian manifold is a real differentiable manifold in which the tangent space at each point is a finite dimensional Hilbert space and, therefore, equipped with an inner product. The collection of all these inner products is called the Riemannian metric. In information geometry, statistical models play the role of a manifold, and the Fisher information matrix and its various generalizations play the role of a Riemannian metric. A statistical manifold here means a parametric family of probability distributions $S = \{p_\theta : \theta \in \Theta\}$ with a continuously varying parameter space $\Theta$ (statistical model). The dimension of a statistical manifold is the dimension of the parameter space. For example, the family of normal distributions $\{N(\mu, \sigma^2) : \mu \in \mathbb{R}, \sigma^2 > 0\}$ is a two-dimensional statistical manifold. The tangent space at a point of $S$ is a linear space that corresponds to a “local linearization” at that point. The tangent space at a point $p$ of $S$ is denoted by $T_p(S)$. The elements of $T_p(S)$ are called tangent vectors of $S$ at $p$. A Riemannian metric at a point $p$ of $S$ is an inner product defined for any pair of tangent vectors of $S$ at $p$.
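Let us restrict to statistical manifolds defined on a finite set $\mathbb{X}$. Let $\mathcal{P} := \mathcal{P}(\mathbb{X})$ denote the space of all probability distributions on $\mathbb{X}$. Let $S \subset \mathcal{P}$ be a sub-manifold. Let $\theta = (\theta_1, \dots, \theta_k)$ be a parameterization of $S$. By a divergence, we mean a non-negative function $D$ defined on $S \times S$ such that $D(p, q) = 0$ iff $p = q$. Given a divergence function $D$ on $S$, Eguchi defines a Riemannian metric on $S$ by the matrix

$$G^{(D)}(\theta) = \left[ g^{(D)}_{i,j}(\theta) \right],$$

where

$$g^{(D)}_{i,j}(\theta) := -\left. \frac{\partial}{\partial \theta_i} \frac{\partial}{\partial \theta'_j} D(p_\theta, p_{\theta'}) \right|_{\theta' = \theta}$$

is the element in the $i$th row and $j$th column of the matrix $G^{(D)}(\theta)$, $\theta = (\theta_1, \dots, \theta_k)$, $\theta' = (\theta'_1, \dots, \theta'_k)$, and dual affine connections $\nabla^{(D)}$ and $\nabla^{(D^*)}$, with connection coefficients described by the following Christoffel symbols

$$\Gamma^{(D)}_{ij,k}(\theta) := -\left. \frac{\partial}{\partial \theta_i} \frac{\partial}{\partial \theta_j} \frac{\partial}{\partial \theta'_k} D(p_\theta, p_{\theta'}) \right|_{\theta' = \theta}, \qquad \Gamma^{(D^*)}_{ij,k}(\theta) := -\left. \frac{\partial}{\partial \theta_k} \frac{\partial}{\partial \theta'_i} \frac{\partial}{\partial \theta'_j} D(p_\theta, p_{\theta'}) \right|_{\theta' = \theta},$$

such that $\nabla^{(D)}$ and $\nabla^{(D^*)}$ form a dualistic structure in the sense that

$$\frac{\partial}{\partial \theta_k} g^{(D)}_{i,j} = \Gamma^{(D)}_{ki,j} + \Gamma^{(D^*)}_{kj,i}.$$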
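As a numerical illustration of Eguchi's construction, the sketch below approximates the mixed second derivative of a divergence by central finite differences and compares the result with the closed-form Fisher information matrix. It assumes the KLD as the divergence and a three-point categorical model parameterized by its first two probabilities; the model, step size, and function names are choices made for this example only.

```python
import numpy as np

def kl(p, q):
    """Relative entropy between strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def model(theta):
    """Categorical model on three points with parameters (theta_1, theta_2)."""
    t1, t2 = theta
    return np.array([t1, t2, 1.0 - t1 - t2])

def eguchi_metric(divergence, model, theta, h=1e-4):
    """Approximate g_ij = -d/dtheta_i d/dtheta'_j D(p_theta, p_theta') at theta' = theta."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size

    def d(a, b):
        # Divergence with the first argument shifted by a and the second by b.
        return divergence(model(theta + a), model(theta + b))

    G = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ei[i] = h
            ej = np.zeros(k)
            ej[j] = h
            mixed = (d(ei, ej) - d(ei, -ej) - d(-ei, ej) + d(-ei, -ej)) / (4.0 * h * h)
            G[i, j] = -mixed
    return G

def fisher_categorical(theta):
    """Closed-form Fisher information matrix of the same categorical model."""
    t1, t2 = theta
    t3 = 1.0 - t1 - t2
    return np.array([[1.0 / t1 + 1.0 / t3, 1.0 / t3],
                     [1.0 / t3, 1.0 / t2 + 1.0 / t3]])

theta0 = np.array([0.2, 0.5])
G_numeric = eguchi_metric(kl, model, theta0)
G_exact = fisher_categorical(theta0)
print(G_numeric)
print(G_exact)
```

Passing a different divergence function to `eguchi_metric` gives the metric induced by that divergence, up to finite-difference error.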
III Relative $\alpha$-entropy in the Bayesian Setting
We now introduce relative $\alpha$-entropy in the Bayesian case. Define as a -dimensional sub-manifold of and
We define relative $\alpha$-entropy of with respect to by
We present the following Lemma 1, which shows that our definition of Bayesian relative $\alpha$-entropy is not only a valid divergence function but also coincides with the KLD as $\alpha \to 1$.
with equality if and only if
1) Let $\alpha > 1$. Applying Hölder's inequality with Hölder conjugates $\alpha$ and $\alpha/(\alpha - 1)$, we have
where $\|\cdot\|$ denotes the $\alpha$-norm. When $\alpha < 1$, the inequality is reversed. Hence
where the second inequality follows because, for ,
The conditions for equality follow from those in Hölder's inequality and .
2) This follows by applying L’Hôpital's rule to the first term of :
and since Rényi entropy coincides with Shannon entropy as $\alpha \to 1$. ∎
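The Hölder step used in the proof can be illustrated numerically in the simpler setting without a prior. The sketch below assumes conjugate exponents $\alpha$ and $\alpha/(\alpha-1)$ applied to $\sum_x p(x) q(x)^{\alpha-1}$, which gives the bound $\|p\|_\alpha \|q\|_\alpha^{\alpha-1}$ for $\alpha > 1$ and the reversed inequality for $0 < \alpha < 1$. The function names and distributions are illustrative, and the prior weights of the Bayesian definition are deliberately omitted.

```python
import numpy as np

def alpha_norm(v, alpha):
    """The alpha-norm (sum_x |v(x)|^alpha)^(1/alpha)."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.abs(v) ** alpha) ** (1.0 / alpha))

def holder_sides(p, q, alpha):
    """Return (sum_x p q^(alpha-1), ||p||_alpha * ||q||_alpha^(alpha-1))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    lhs = float(np.sum(p * q ** (alpha - 1.0)))
    rhs = alpha_norm(p, alpha) * alpha_norm(q, alpha) ** (alpha - 1.0)
    return lhs, rhs

# Illustrative distributions (assumptions for this sketch).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

lhs_gt, rhs_gt = holder_sides(p, q, 2.0)   # alpha > 1: lhs <= rhs
lhs_lt, rhs_lt = holder_sides(p, q, 0.5)   # 0 < alpha < 1: lhs >= rhs
lhs_eq, rhs_eq = holder_sides(p, p, 2.0)   # p = q: equality
print(lhs_gt, rhs_gt)
print(lhs_lt, rhs_lt)
print(lhs_eq, rhs_eq)
```

Because the factor $\alpha/(1-\alpha)$ changes sign at $\alpha = 1$, both cases lead to a non-negative divergence after taking logarithms.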
IV Fisher Information Matrix for the Bayesian Case
Let , and . Notice that, when , becomes , the usual Fisher information matrix in the Bayesian case [c.f. ].
V An $\alpha$-Version of Cramér-Rao Inequality in the Bayesian Setting
We now investigate the geometry of with respect to the metric . Later, we formulate an $\alpha$-equivalent version of the Cramér-Rao inequality associated with a submanifold . Observe that is a subset of , where . The tangent space at every point of is . That is, . We denote a tangent vector (that is, an element of ) by . The manifold can be recognized by its homeomorphic image under the mapping . Under this mapping the tangent vector can be represented which is defined by and we define
Motivated by the expression for the Riemannian metric in (IV), define
We shall call the above an $\alpha$-representation of at . With this notation, the is given by
It should be noted that . This follows since
When , the right hand side of (V) reduces to .
Motivated by (V), the $\alpha$-representation of a tangent vector at is
where the last equality follows because . The collection of all such $\alpha$-representations is
Clearly . Also, since any with is
In view of (10), we have
Now the inner product between any two tangent vectors defined by the $\alpha$-information metric in (IV) is
Consider now an $n$-dimensional statistical manifold , a submanifold of , together with the metric as in (15). Let be the dual space (cotangent space) of the tangent space and let us consider, for each , the element which maps to . The correspondence is a linear map between and . An inner product and a norm on are naturally inherited from by
Now, for a (smooth) real function on , the differential of at , , is a member of which maps to . The gradient of at p is the tangent vector corresponding to , hence, satisfies
Since is a tangent vector,
where is the th entry of the inverse of .
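The role of the inverse metric in the gradient can be made concrete with a small numerical sketch. It assumes a generic positive definite metric matrix $G$ (here the ordinary Fisher information matrix of a three-point categorical model, not the $\alpha$-information metric of this paper) and a hypothetical vector of partial derivatives of $f$. The check is that the gradient, with components $\sum_j g^{ij} \partial_j f$, reproduces the directional derivative of $f$ through the inner product induced by $G$.

```python
import numpy as np

def riemannian_gradient(G, partials):
    """Components of grad f: solve G @ grad = partials, i.e. grad^i = sum_j g^{ij} d_j f."""
    return np.linalg.solve(G, partials)

# Fisher information matrix of a categorical model on three points,
# parameterized by its first two probabilities (assumption for this sketch).
theta = np.array([0.2, 0.5])
t3 = 1.0 - theta.sum()
G = np.array([[1.0 / theta[0] + 1.0 / t3, 1.0 / t3],
              [1.0 / t3, 1.0 / theta[1] + 1.0 / t3]])

partials = np.array([1.0, -2.0])  # hypothetical partial derivatives of f at theta
X = np.array([0.3, 0.7])          # components of an arbitrary tangent vector

grad = riemannian_gradient(G, partials)
inner = float(grad @ G @ X)       # <grad f, X> under the metric G
directional = float(partials @ X) # df(X) = X(f)
print(inner, directional)         # the two values agree up to floating-point error
```

Using `np.linalg.solve` avoids forming the inverse matrix explicitly, which is usually the numerically safer choice.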
With these preliminaries, we now state our main results. These are analogous to those in [30, Sec. 2.5].
Let be any mapping (that is, a vector in ). Let be the mapping . We then have
For any tangent vector ,
Since (c.f. (14)), there exists such that , and . Hence we see that