I Introduction
In information geometry, a parameterized family of probability distributions is expressed as a manifold in the Riemannian space [1], in which the parameters form the coordinate system on the manifold and the distance measure is given by the Fisher information matrix (FIM) [2]. This framework reduces certain important information-theoretic problems to investigations of different Riemannian manifolds [3]. This perspective is helpful in analyzing many problems in engineering and sciences where probability distributions are used, including optimization [4], signal processing [5, 6], optimal transport [7], and quantum information [8].
In particular, when the separation between two points on the manifold is defined by the Kullback-Leibler divergence (KLD) or relative entropy between two probability distributions $p$ and $q$ on a finite state space $\mathbb{X}$, i.e.,
$$I(p\|q) := \sum_{x \in \mathbb{X}} p(x) \log \frac{p(x)}{q(x)}, \tag{1}$$
then the resulting Riemannian metric is defined by the FIM [9]. This method of defining a Riemannian metric on statistical manifolds from a general divergence function is due to Eguchi [10]. Since the FIM is the inverse of the well-known deterministic Cramér-Rao lower bound (CRLB), the information-geometric results are directly connected with those of estimation theory. Further, the relative entropy is related to the Shannon entropy $H(p)$ by $I(p\|u) = \log|\mathbb{X}| - H(p)$, where $u$ is the uniform distribution on $\mathbb{X}$.
It is, therefore, instructive to explore information-geometric frameworks for key estimation-theoretic results. For example, the Bayesian CRLB [11, 12] is the analogous lower bound to the CRLB for random variables. It assumes the parameters to be random with an a priori probability density function. In [13], we derived the Bayesian CRLB using a general definition of KLD when the probability densities are not normalized.
Recently, [14] studied the information geometry of Rényi entropy [15], which is a generalization of Shannon entropy. In the source coding problem where normalized cumulants of compressed lengths are considered instead of expected compressed lengths, Rényi entropy is used as a measure of uncertainty [16]. The Rényi entropy of $p$ of order $\alpha$, $\alpha \geq 0$, $\alpha \neq 1$, is defined to be $H_\alpha(p) := \frac{1}{1-\alpha} \log \sum_x p(x)^\alpha$. In the context of the source distribution version of this problem, the Rényi analog of relative entropy is the relative $\alpha$-entropy [17, 18]. The relative $\alpha$-entropy of $p$ with respect to $q$ (or Sundaresan's divergence between $p$ and $q$) is defined as
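$$I_\alpha(p, q) := \frac{\alpha}{1-\alpha} \log \sum_x p(x)\, q(x)^{\alpha-1} - \frac{1}{1-\alpha} \log \sum_x p(x)^\alpha + \log \sum_x q(x)^\alpha. \tag{2}$$
It follows that, as $\alpha \to 1$, we have $I_\alpha(p, q) \to I(p\|q)$ and $H_\alpha(p) \to H(p)$ [19]. Rényi entropy and relative $\alpha$-entropy are related by the equation $I_\alpha(p, u) = \log|\mathbb{X}| - H_\alpha(p)$. Relative $\alpha$-entropy is closely related to the Csiszár $f$-divergence $D_f$ as
$$I_\alpha(p, q) = \frac{\alpha}{1-\alpha} \log \left[ \mathrm{sgn}(1-\alpha) \cdot D_f\big(p^{(\alpha)}, q^{(\alpha)}\big) + 1 \right], \tag{3}$$
where $p^{(\alpha)}(x) := p(x)^\alpha / \sum_y p(y)^\alpha$, $q^{(\alpha)}(x) := q(x)^\alpha / \sum_y q(y)^\alpha$, and $f(u) = \mathrm{sgn}(1-\alpha) \cdot (u^{1/\alpha} - 1)$, $u \geq 0$ [19, Sec. II]. The measures $p^{(\alpha)}$ and $q^{(\alpha)}$ are called $\alpha$-escort or $\alpha$-scaled measures [20, 21]. It is easy to show that, indeed, the right side of (3) is the Rényi divergence between $p^{(\alpha)}$ and $q^{(\alpha)}$ of order $1/\alpha$.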
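The relations around (2) and (3) can be checked numerically. The following is a minimal sketch, assuming the standard three-term form of Sundaresan's divergence, $I_\alpha(p,q) = \frac{\alpha}{1-\alpha}\log\sum_x p(x)q(x)^{\alpha-1} - \frac{1}{1-\alpha}\log\sum_x p(x)^\alpha + \log\sum_x q(x)^\alpha$, and the usual Rényi divergence of order $\beta$, $\frac{1}{\beta-1}\log\sum_x p(x)^\beta q(x)^{1-\beta}$. The function names and the test distributions are illustrative choices, not taken from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence I(p||q) in nats; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def relative_alpha_entropy(p, q, alpha):
    """Assumed three-term form of the relative alpha-entropy; needs q(x) > 0."""
    cross = math.log(sum(pi * qi ** (alpha - 1.0) for pi, qi in zip(p, q)))
    log_p = math.log(sum(pi ** alpha for pi in p))
    log_q = math.log(sum(qi ** alpha for qi in q))
    return alpha / (1.0 - alpha) * cross - log_p / (1.0 - alpha) + log_q

def escort(p, alpha):
    """Alpha-escort (alpha-scaled) measure: p(x)^alpha normalized to sum to one."""
    z = sum(pi ** alpha for pi in p)
    return [pi ** alpha / z for pi in p]

def renyi_divergence(p, q, order):
    """Renyi divergence of the given order (order > 0, order != 1), in nats."""
    s = sum(pi ** order * qi ** (1.0 - order) for pi, qi in zip(p, q))
    return math.log(s) / (order - 1.0)

# Illustrative distributions on a three-point alphabet (hypothetical values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
u = [1.0 / 3.0] * 3
alpha = 0.7

# Relation to Renyi entropy: I_alpha(p, u) = log|X| - H_alpha(p).
uniform_lhs = relative_alpha_entropy(p, u, alpha)
uniform_rhs = math.log(len(p)) - renyi_entropy(p, alpha)

# Limit alpha -> 1: the relative alpha-entropy approaches the KLD.
near_one = relative_alpha_entropy(p, q, 1.0 + 1e-6)

# Escort identity: I_alpha(p, q) equals the Renyi divergence of order 1/alpha
# between the alpha-escort measures.
escort_rhs = renyi_divergence(escort(p, alpha), escort(q, alpha), 1.0 / alpha)

print(uniform_lhs, uniform_rhs)
print(near_one, kl(p, q))
print(relative_alpha_entropy(p, q, alpha), escort_rhs)
```

Each printed pair should agree up to floating-point error (and, for the second pair, up to the small offset of $\alpha$ from 1).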
The Rényi entropy and relative $\alpha$-entropy arise in several important information-theoretic problems such as guessing [22, 18, 23] and task encoding [24]. Relative $\alpha$-entropy arises in statistics as a generalized likelihood function robust to outliers [25], [26]. It also shares many interesting properties with relative entropy; see, e.g., [19, Sec. II] for a summary. For example, relative $\alpha$-entropy behaves like squared Euclidean distance and satisfies a Pythagorean property in a similar way as relative entropy does [19, 13]. This property helps in establishing a computation method [26] for a robust estimation procedure [27].
Motivated by such analogous relationships, our previous work [14] investigated the relative $\alpha$-entropy from a differential geometric perspective. In particular, we applied Eguchi's method with relative $\alpha$-entropy as the divergence function to obtain the resulting statistical manifold with a general Riemannian metric. This metric is specified by the Fisher information matrix that is the inverse of the so-called deterministic $\alpha$-CRLB [19]. In this paper, we study the structure of statistical manifolds with respect to a relative $\alpha$-entropy in a Bayesian setting. This is a non-trivial extension of our work in [13], where we proposed a Riemannian metric arising from the relative entropy for the Bayesian case. In the process, we derive a general Bayesian Cramér-Rao inequality and the resulting Bayesian $\alpha$-CRLB, which embed the compounded effects of both the Rényi order and the Bayesian prior distribution. We show that, in limiting cases, the bound reduces to the deterministic $\alpha$-CRLB (in the absence of a prior), the Bayesian CRLB (when $\alpha \to 1$), or the CRLB (no priors and $\alpha \to 1$).
The rest of the paper is organized as follows. In the next section, we provide the essential background to information geometry. We then introduce the definition of Bayesian relative $\alpha$-entropy in Section III and show that it is a valid divergence function. In Section IV, we establish the connection between this divergence and the Riemannian metric and then derive the Bayesian $\alpha$-version of the Cramér-Rao inequality in Section V. Finally, we state our main result for the Bayesian $\alpha$-CRLB in Section VI and conclude in Section VII.
II Desiderata for Information Geometry
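An $n$-dimensional manifold is a Hausdorff and second countable topological space which is locally homeomorphic to Euclidean space of dimension $n$ [2]. A Riemannian manifold is a real differentiable manifold in which the tangent space at each point is a finite-dimensional Hilbert space and is, therefore, equipped with an inner product. The collection of all these inner products is called the Riemannian metric. In information geometry, statistical models play the role of a manifold, and the Fisher information matrix and its various generalizations play the role of a Riemannian metric. A statistical manifold here means a parametric family of probability distributions $S = \{p_\theta : \theta \in \Theta\}$ with a continuously varying parameter space $\Theta$ (statistical model). The dimension of a statistical manifold is the dimension of the parameter space. For example, the family of normal distributions $N(\mu, \sigma^2)$ is a two-dimensional statistical manifold. The tangent space at a point $p$ of $S$ is a linear space that corresponds to a "local linearization" of $S$ at that point. It is denoted by $T_p(S)$, and its elements are called tangent vectors of $S$ at $p$. A Riemannian metric at a point $p$ of $S$ is an inner product defined for any pair of tangent vectors of $S$ at $p$.
Let us restrict to statistical manifolds defined on a finite set $\mathbb{X}$. Let $\mathcal{P}$ denote the space of all probability distributions on $\mathbb{X}$. Let $S \subset \mathcal{P}$ be a submanifold. Let $\theta = (\theta_1, \dots, \theta_n)$ be a parameterization of $S$. By a divergence, we mean a non-negative function $D$ defined on $S \times S$ such that $D(p, q) = 0$ iff $p = q$. Given a divergence function $D$ on $S$, Eguchi [28] defines a Riemannian metric on $S$ by the matrix
$$G^{(D)}(\theta) = \left[ g^{(D)}_{i,j}(\theta) \right],$$
where
$$g^{(D)}_{i,j}(\theta) := -D[\partial_i, \partial_j'] := -\frac{\partial}{\partial \theta_i} \frac{\partial}{\partial \theta_j'} D(p_\theta, p_{\theta'}) \Big|_{\theta = \theta'},$$
where $g^{(D)}_{i,j}$ is the element in the $i$th row and $j$th column of the matrix $G^{(D)}$, $\theta = (\theta_1, \dots, \theta_n)$, $\theta' = (\theta_1', \dots, \theta_n')$, and dual affine connections $\nabla^{(D)}$ and $\nabla^{(D^*)}$, with connection coefficients described by the following Christoffel symbols
$$\Gamma^{(D)}_{ij,k}(\theta) := -D[\partial_i \partial_j, \partial_k'] := -\frac{\partial}{\partial \theta_i} \frac{\partial}{\partial \theta_j} \frac{\partial}{\partial \theta_k'} D(p_\theta, p_{\theta'}) \Big|_{\theta = \theta'}$$
and
$$\Gamma^{(D^*)}_{ij,k}(\theta) := -D[\partial_k, \partial_i' \partial_j'] := -\frac{\partial}{\partial \theta_k} \frac{\partial}{\partial \theta_i'} \frac{\partial}{\partial \theta_j'} D(p_\theta, p_{\theta'}) \Big|_{\theta = \theta'},$$
such that $\nabla^{(D)}$ and $\nabla^{(D^*)}$ form a dualistic structure in the sense that
$$\frac{\partial g^{(D)}_{i,j}}{\partial \theta_k} = \Gamma^{(D)}_{ki,j} + \Gamma^{(D^*)}_{kj,i}, \tag{4}$$
where $D^*(p, q) = D(q, p)$.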
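Eguchi's construction can be illustrated on the simplest one-parameter family. The sketch below assumes the metric is the negative mixed second derivative of the divergence, $-\partial_\theta \partial_{\theta'} D(p_\theta, p_{\theta'})$ evaluated at $\theta' = \theta$, takes $D$ to be the KLD on the Bernoulli family, and approximates the derivative by central differences. It should recover the familiar Fisher information $1/(\theta(1-\theta))$. The helper names and the step size are illustrative assumptions.

```python
import math

def bernoulli_kl(t, s):
    """KLD between Bernoulli(t) and Bernoulli(s), in nats (0 < t, s < 1)."""
    return t * math.log(t / s) + (1.0 - t) * math.log((1.0 - t) / (1.0 - s))

def eguchi_metric(div, theta, h=1e-4):
    """Approximate -(d/dtheta)(d/dtheta') div(theta, theta') at theta' = theta.

    Uses a four-point central difference for the mixed partial derivative.
    """
    mixed = (div(theta + h, theta + h) - div(theta + h, theta - h)
             - div(theta - h, theta + h) + div(theta - h, theta - h)) / (4.0 * h * h)
    return -mixed

theta = 0.3
g_numeric = eguchi_metric(bernoulli_kl, theta)
fisher_exact = 1.0 / (theta * (1.0 - theta))
print(g_numeric, fisher_exact)
```

The same `eguchi_metric` helper accepts any smooth divergence on a one-parameter family, which makes it a convenient way to compare the metrics induced by different divergence functions.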
III Relative $\alpha$-Entropy in the Bayesian Setting
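We now introduce relative $\alpha$-entropy in the Bayesian case. Define $S = \{p_\theta : \theta = (\theta_1, \dots, \theta_k) \in \Theta\}$ as a $k$-dimensional submanifold of $\mathcal{P}$ and
$$\tilde{S} := \{\tilde{p}_\theta(x) = p_\theta(x) \lambda(\theta) : p_\theta \in S\}, \tag{5}$$
where $\lambda$ is a probability distribution on $\Theta$. Then, $\tilde{S}$ is a submanifold of $\tilde{\mathcal{P}}$, the space of positive measures on $\mathbb{X}$. Let $\tilde{p}_\theta, \tilde{p}_{\theta'} \in \tilde{S}$. The relative entropy of $\tilde{p}_\theta$ with respect to $\tilde{p}_{\theta'}$ is (c.f. [29, Eq. (2.4)] and [13])
$$I(\tilde{p}_\theta \| \tilde{p}_{\theta'}) = \sum_x \tilde{p}_\theta(x) \log \frac{\tilde{p}_\theta(x)}{\tilde{p}_{\theta'}(x)} - \sum_x \tilde{p}_\theta(x) + \sum_x \tilde{p}_{\theta'}(x).$$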
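We define relative $\alpha$-entropy of $\tilde{p}_\theta$ with respect to $\tilde{p}_{\theta'}$ by
We present the following Lemma 1, which shows that our definition of Bayesian relative $\alpha$-entropy is not only a valid divergence function but also coincides with the KLD as $\alpha \to 1$.
Lemma 1.
1) $I_\alpha(\tilde{p}_\theta, \tilde{p}_{\theta'}) \geq 0$ with equality if and only if $\tilde{p}_\theta = \tilde{p}_{\theta'}$.
2) $I_\alpha(\tilde{p}_\theta, \tilde{p}_{\theta'}) \to I(\tilde{p}_\theta \| \tilde{p}_{\theta'})$ as $\alpha \to 1$.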
Proof:
1) Let $\alpha > 1$. Applying Hölder's inequality with Hölder conjugates $\alpha$ and $\alpha/(\alpha - 1)$, we have
where $\|\cdot\|$ denotes the $\alpha$-norm. When $\alpha < 1$, the inequality is reversed. Hence
where the second inequality follows because, for ,
and hence
The conditions of equality follow from the same in Hölder's inequality and .
2) This follows by applying L'Hôpital's rule to the first term of :
and since Rényi entropy coincides with Shannon entropy as $\alpha \to 1$. ∎
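The Hölder step in part 1) can be checked numerically. The sketch below assumes the inequality in question is the usual bound $\sum_x p(x)\, q(x)^{\alpha-1} \leq \|p\|_\alpha \|q\|_\alpha^{\alpha-1}$ for $\alpha > 1$, with the direction reversed for $0 < \alpha < 1$, applied to arbitrary positive vectors (not necessarily normalized, as in the Bayesian setting). The function names and the random test vectors are illustrative assumptions.

```python
import random

def alpha_norm(v, alpha):
    """(sum_x v(x)^alpha)^(1/alpha) for a vector of positive entries."""
    return sum(x ** alpha for x in v) ** (1.0 / alpha)

def holder_gap(p, q, alpha):
    """sum_x p(x) q(x)^(alpha-1) minus ||p||_alpha * ||q||_alpha^(alpha-1).

    Under the assumed Hoelder bound this is <= 0 for alpha > 1
    and >= 0 for 0 < alpha < 1.
    """
    lhs = sum(pi * qi ** (alpha - 1.0) for pi, qi in zip(p, q))
    rhs = alpha_norm(p, alpha) * alpha_norm(q, alpha) ** (alpha - 1.0)
    return lhs - rhs

random.seed(0)
# Positive, unnormalized vectors on a five-point alphabet (hypothetical values).
trials = [([random.uniform(0.1, 2.0) for _ in range(5)],
           [random.uniform(0.1, 2.0) for _ in range(5)]) for _ in range(100)]

gaps_above_one = [holder_gap(p, q, 2.5) for p, q in trials]   # expected <= 0
gaps_below_one = [holder_gap(p, q, 0.4) for p, q in trials]   # expected >= 0

# Equality case: p = q makes both sides equal to sum_x p(x)^alpha.
p0 = trials[0][0]
gap_equal = holder_gap(p0, p0, 2.5)

print(max(gaps_above_one), min(gaps_below_one), gap_equal)
```

The sign change at $\alpha = 1$ is matched by the sign change of the factor $\alpha/(1-\alpha)$ in the relative $\alpha$-entropy, which is why the same argument covers both ranges of $\alpha$.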
IV Fisher Information Matrix for the Bayesian Case
Eguchi's theory presented in Section II can also be extended to the space of all positive measures on $\mathbb{X}$, that is, $\tilde{\mathcal{P}} = \{\tilde{p} : \mathbb{X} \to (0, \infty)\}$. Following Eguchi [28], we define a Riemannian metric on $\tilde{S}$ by
(6)  
(7) 
where
(8) 
and
(9) 
Let , and . Notice that, when , becomes , the usual Fisher information matrix in the Bayesian case [c.f. [13]].
V An $\alpha$-Version of Cramér-Rao Inequality in the Bayesian Setting
We now investigate the geometry of with respect to the metric . Later, we formulate an equivalent version of the CramérRao inequality associated with a submanifold . Observe that is a subset of , where . The tangent space at every point of is . That is, . We denote a tangent vector (that is, elements of ) by . The manifold can be recognized by its homeomorphic image under the mapping . Under this mapping the tangent vector can be represented which is defined by and we define
(10) 
Motivated by the expression for the Riemannian metric in Section IV, define
We shall call the above an $\alpha$-representation of at . With this notation, the is given by
It should be noted that . This follows since
When , the right hand side of (V) reduces to .
Motivated by (V), the representation of a tangent vector at is
(12)  
where the last equality follows because . The collection of all such representations is
(13) 
Clearly . Also, since any with is
with where
In view of (10), we have
(14) 
Now the inner product between any two tangent vectors defined by the information metric in Section IV is
(15) 
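Consider now an $n$-dimensional statistical manifold $S$, a submanifold of $\tilde{\mathcal{P}}$, together with the metric $G = [g_{i,j}]$ as in (15). Let $T_p^*(S)$ be the dual space (cotangent space) of the tangent space $T_p(S)$ and let us consider, for each $Y \in T_p(S)$, the element $\omega_Y \in T_p^*(S)$ which maps $X$ to $\langle X, Y \rangle_p$. The correspondence $Y \mapsto \omega_Y$ is a linear map between $T_p(S)$ and $T_p^*(S)$. An inner product and a norm on $T_p^*(S)$ are naturally inherited from $T_p(S)$ by
$$\langle \omega_X, \omega_Y \rangle_p := \langle X, Y \rangle_p$$
and
$$\|\omega_X\|_p := \|X\|_p = \sqrt{\langle X, X \rangle_p}.$$
Now, for a (smooth) real function $f$ on $S$, the differential of $f$ at $p$, $(df)_p$, is a member of $T_p^*(S)$ which maps $X$ to $X(f)$. The gradient of $f$ at $p$ is the tangent vector corresponding to $(df)_p$, hence, satisfies
$$(df)_p(X) = X(f) = \langle (\mathrm{grad} f)_p, X \rangle_p \tag{16}$$
and
$$\|(df)_p\|_p^2 = \langle (\mathrm{grad} f)_p, (\mathrm{grad} f)_p \rangle_p. \tag{17}$$
Since $(\mathrm{grad} f)_p$ is a tangent vector,
$$(\mathrm{grad} f)_p = \sum_{i=1}^n h_i \partial_i \tag{18}$$
for some scalars $h_i$. Applying (16) with $X = \partial_j$, for each $j = 1, \dots, n$, and using (18), we obtain
$$\partial_j(f) = \Big\langle \sum_{i=1}^n h_i \partial_i, \partial_j \Big\rangle_p = \sum_{i=1}^n h_i\, g_{i,j}, \quad j = 1, \dots, n.$$
This yields
$$[h_1, \dots, h_n]^T = G^{-1} [\partial_1(f), \dots, \partial_n(f)]^T,$$
and so
$$(\mathrm{grad} f)_p = \sum_{i,j} g^{i,j}\, \partial_j(f)\, \partial_i. \tag{19}$$
From (16), (17), and (19), we get
$$\|(df)_p\|_p^2 = \sum_{i,j} g^{i,j}\, \partial_j(f)\, \partial_i(f), \tag{20}$$
where $g^{i,j}$ is the $(i,j)$th entry of the inverse of $G$.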
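The linear algebra behind (16)-(20) can be sketched in coordinates. The example below assumes a two-dimensional manifold whose metric at a fixed point is a symmetric positive definite matrix `G`, represents tangent vectors by their coefficients in the basis $\partial_1, \partial_2$, and represents the differential of $f$ by its partial derivatives. The matrix, the partial derivatives, and the test vector are hypothetical numbers chosen only for illustration.

```python
def inverse_2x2(G):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [[G[1][1] / det, -G[0][1] / det],
            [-G[1][0] / det, G[0][0] / det]]

def inner(G, x, y):
    """Inner product <x, y> = sum_{i,j} x_i g_{i,j} y_j of two tangent vectors."""
    return sum(x[i] * G[i][j] * y[j] for i in range(2) for j in range(2))

# Hypothetical metric at a point and partial derivatives of a function f there.
G = [[2.0, 0.5],
     [0.5, 1.0]]
df = [1.0, -3.0]          # (d_1 f, d_2 f)

# Gradient coefficients h = G^{-1} df, i.e. grad f = sum_i h_i d_i.
G_inv = inverse_2x2(G)
h = [sum(G_inv[i][j] * df[j] for j in range(2)) for i in range(2)]

# Defining property of the gradient: <grad f, X> = X(f) for any tangent vector X.
X = [0.7, -1.2]
directional_derivative = sum(X[i] * df[i] for i in range(2))
inner_with_gradient = inner(G, h, X)

# Squared norm of df computed two ways: <grad f, grad f> and the inverse-metric form.
norm_sq_from_gradient = inner(G, h, h)
norm_sq_from_inverse = sum(G_inv[i][j] * df[i] * df[j]
                           for i in range(2) for j in range(2))

print(directional_derivative, inner_with_gradient)
print(norm_sq_from_gradient, norm_sq_from_inverse)
```

Both printed pairs should agree up to floating-point error. The second pair is the coordinate form of the squared norm of the differential, which is the quantity that appears on the bound side of a Cramér-Rao type inequality.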
With these preliminaries, we now state our main results. These are analogous to those in [30, Sec. 2.5].
Theorem 2.
Let be any mapping (that is, a vector in . Let be the mapping . We then have
(21) 
Proof.
For any tangent vector ,
(22)  
(23) 
Since (c.f. (14)), there exists such that , and . Hence we see that