# Some Information Inequalities for Statistical Inference

In this paper, we first describe the generalized notion of the Cramer-Rao lower bound obtained by Naudts (2004) using two families of probability density functions: the original model and an escort model. We reinterpret the results of Naudts (2004) from a statistical point of view and obtain some interesting examples in which this bound is attained. Further, we obtain information inequalities which generalize the classical Bhattacharyya bounds in both regular and non-regular cases.


## 1 Introduction

For every unbiased estimator T of a parametric function φ(θ), an inequality of the type

 Var_θ(T) ≥ d(θ)  (1)

for every θ in the parameter space Θ, is called an information inequality, and it plays an important role in parameter estimation. The early works of Cramer (1946) and Rao (1945) introduced the Cramer-Rao inequality for regular density functions. For non-regular density functions, Hammersley (1950) and Chapman and Robbins (1951) introduced an inequality which came to be known as the Hammersley-Chapman-Robbins inequality, while Fraser and Guttman (1952) obtained the Bhattacharyya bounds. Later, Vincze (1979) and Khatri (1980) introduced information inequalities by imposing the regularity assumptions on a prior distribution rather than on the model.

Recently, in statistical physics, a generalized notion of Fisher information and a corresponding Cramer-Rao lower bound were introduced by Naudts (2004) using two families of probability density functions: the original model and an escort model. Further, he showed that in the case of a deformed exponential family of probability density functions, there exist an escort family and an estimator whose variance attains the bound. Also, from an information geometric point of view, he obtained a dually flat structure of the deformed exponential family.

In this article, concentrating on the statistical aspects of Naudts's paper, we define several information inequalities which generalize the classical Hammersley-Chapman-Robbins bound and the Bhattacharyya bounds in both regular and non-regular cases. This is done by imposing the regularity conditions on the escort model rather than on the original model.

In Section 2, some preliminary results are stated. Section 3 describes the generalized Cramer-Rao lower bound obtained by Naudts (2004), reinterpreted from a statistical point of view and applied to many examples. We also obtain many interesting examples in which the bound is attained. In Section 4, we obtain a generalized notion of the Bhattacharyya bounds in both regular and non-regular cases. We conclude with a discussion in Section 5.

## 2 Preliminaries

Let X be a random vector with probability density function f(x, θ), where θ ∈ Θ and X takes values in D. To estimate a real-valued function φ(θ) of θ, define a class of estimators as

 C_φ = {S(X) ∣ E_{f_θ}(S(X)) = φ(θ), ∀ θ ∈ Θ}.  (2)

Define

 U_f = {U(X) ∣ E_{f_θ}(U) = 0; E_{f_θ}(U²) < ∞, ∀ θ ∈ Θ}.  (3)

Let T ∈ C_φ. Let S_i = S_i(x, θ), i = 1, ⋯, m, be functions with finite second moments under f_θ such that each S_i is uncorrelated with every element of U_f. Let

 E_{f_θ}(T S_i) = λ_i(θ); i = 1, ⋯, m,  (4)

where λ_i(θ) is a real-valued function of θ.
Define

 ψ(x, θ) = ∑_{i=1}^{m} α_i S_i(x, θ), α_i ∈ ℝ.  (5)

For any estimators T, S ∈ C_φ,

 Cov_{f_θ}(T, ψ) = Cov_{f_θ}(S, ψ) = δ(θ), since T − S ∈ U_f and ψ is uncorrelated with every element of U_f.  (6)

Therefore, the Cauchy-Schwarz inequality

 Var_{f_θ}(T(X)) ≥ (Cov_{f_θ}(T, ψ))² / Var_{f_θ}(ψ) = δ(θ)² / Var_{f_θ}(ψ)  (7)

gives a lower bound for the variance of all unbiased estimators of φ(θ).
Now consider

 Var_{f_θ}(ψ) = Var_{f_θ}(∑_{i=1}^{m} α_i S_i) = α⊺ Σ α,  (8)

 (Cov_{f_θ}(T, ψ))² = (∑_{i=1}^{m} α_i λ_i(θ))² = α⊺ M M⊺ α,  (9)

where M = (λ_1(θ), ⋯, λ_m(θ))⊺, Σ is the covariance matrix of (S_1, ⋯, S_m)⊺ and α = (α_1, ⋯, α_m)⊺.
Note that both M and Σ depend on θ, but for convenience of writing we suppress the index θ.
Equation (7) becomes

 Var_{f_θ}(T(X)) ≥ (α⊺ M M⊺ α)/(α⊺ Σ α), ∀ α ∈ ℝ^m,  (10)

which implies

 Var_{f_θ}(T(X)) ≥ sup_α (α⊺ M M⊺ α)/(α⊺ Σ α) = M⊺ Σ⁻¹ M,  (11)

where Σ⁻¹ is the inverse of the covariance matrix Σ.
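The supremum in Equation (11) is attained at α = Σ⁻¹M (a standard quadratic-form maximization). The identity can be checked numerically; the sketch below uses an arbitrary positive-definite Σ and vector M chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)      # arbitrary positive-definite "covariance" matrix
M = rng.standard_normal(m)

closed_form = M @ np.linalg.solve(Sigma, M)      # M^T Sigma^{-1} M

# the ratio alpha^T M M^T alpha / alpha^T Sigma alpha at the maximizer alpha = Sigma^{-1} M
alpha = np.linalg.solve(Sigma, M)
ratio_at_opt = (alpha @ M) ** 2 / (alpha @ Sigma @ alpha)

# the ratio at 2000 random directions never exceeds the closed form
ratios = [(a @ M) ** 2 / (a @ Sigma @ a) for a in rng.standard_normal((2000, m))]
```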
For later use, we state this well-known result as

###### Proposition 2.1

Information Inequality. Let X be a random vector with probability density function (pdf) f(x, θ), where θ ∈ Θ. Consider an estimator T ∈ C_φ and functions S_i(x, θ), i = 1, ⋯, m, with

 E_{f_θ}(T S_i) = λ_i(θ); i = 1, ⋯, m.  (12)

Then the variance of T satisfies the inequality

 Var_{f_θ}(T(X)) ≥ M⊺ Σ⁻¹ M,  (13)

where M = (λ_1(θ), ⋯, λ_m(θ))⊺ and Σ⁻¹ is the inverse of the covariance matrix Σ of (S_1, ⋯, S_m)⊺. The equality in (13) holds iff

 S⊺ Σ⁻¹ M = a(θ)(T(x) − φ(θ))  (14)

for some function a(θ), where S = (S_1, ⋯, S_m)⊺.

## 3 Generalized Cramer-Rao Type Lower Bound

Naudts (2004) introduced a generalized notion of Fisher information by replacing the original model by an escort model at suitable places. Using this, he obtained a generalized Cramer-Rao lower bound. To study the statistical implications of this generalization, we first reinterpret Naudts's generalized Fisher information as follows.
Let g(x, θ) be any density function parametrized by θ ∈ Θ. Define

 U_g = {U(X) ∣ E_{g_θ}(U) = 0; E_{g_θ}(U²) < ∞, ∀ θ ∈ Θ}.  (15)

Let us make the following assumptions:

1. The probability measure induced by g(x, θ) is absolutely continuous with respect to the probability measure induced by f(x, θ).  (16)

2. U_f ⊆ U_g.  (17)

###### Remark 1

If X is a complete statistic for the family {f(x, θ) : θ ∈ Θ}, then U_f = {0} and clearly U_f ⊆ U_g.

Naudts (2004) defined a generalized Fisher information as

 N_ij(θ) = ∫ ∂_i g(x, θ) ∂_j g(x, θ) (1/f(x, θ)) dx; ∂_i := ∂/∂θ_i and i, j = 1, ⋯, p.  (18)

Note that when g = f, N(θ) reduces to the Fisher information matrix I(θ).
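As a numerical sanity check of this reduction, N(θ) can be evaluated by quadrature with g = f; the sketch below (the grid, step sizes and the N(θ, 1) location model are our illustrative choices) should recover the Fisher information, which equals 1 for this model:

```python
import numpy as np

theta, eps = 0.0, 1e-5
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

def f(x, th):
    # density of N(th, 1)
    return np.exp(-0.5 * (x - th) ** 2) / np.sqrt(2.0 * np.pi)

# central difference for the theta-derivative of g = f
dg = (f(x, theta + eps) - f(x, theta - eps)) / (2.0 * eps)

# N(theta) = integral of (d_theta g)^2 / f, approximated by a Riemann sum
N = np.sum(dg ** 2 / f(x, theta)) * dx
```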

###### Theorem 3.1

Let X be a random vector with pdf f(x, θ), θ ∈ Θ ⊆ ℝ^p. Let g(x, θ) be a pdf satisfying assumptions (1) and (2). Assume that

1. ∂_i g(x, θ) exists for all x and all θ ∈ Θ, where ∂_i := ∂/∂θ_i;  (19)

2. N(θ) = [N_ij(θ)] exists and is non-singular;  (20)

3. partial derivatives of functions of θ expressed as integrals with respect to g can be obtained by differentiating under the integral sign.  (21)

Then for T ∈ C_φ, the variance of T satisfies

 Var_{f_θ}(T(X)) ≥ M⊺ N⁻¹(θ) M,  (22)

where λ(θ) = E_{g_θ}(T(X)) and M = (∂_1 λ(θ), ⋯, ∂_p λ(θ))⊺.

Proof. From Proposition 2.1, choose the functions

 S_i = ∂_i g(x, θ)/f(x, θ), i = 1, ⋯, p.  (23)

It is easy to see that E_{f_θ}(T S_i) = ∂_i λ(θ), where λ(θ) = E_{g_θ}(T). Applying Proposition 2.1, the bound in Equation (22) is obtained. The fact that U_f ⊆ U_g ensures that the bound is the same for all unbiased estimators of φ(θ).

We now give some interesting examples in which Naudts's generalized Cramer-Rao bound is attained.

###### Example 1

Suppose Y_1, ⋯, Y_n are independent uniform random variables on (0, θ), where θ > 0. Then X = max{Y_1, ⋯, Y_n} has the pdf

 f(x, θ) = n x^{n−1}/θ^n, 0 ≤ x ≤ θ.  (24)

Now consider T(X) = ((n+1)/n) X, an unbiased estimator of θ. Then

 Var_{f_θ}(T) = θ²/(n(n+2)).  (25)

Consider a pdf g as

 g(x, θ) = n(n+1)(1 − x/θ) x^{n−1}/θ^n, 0 ≤ x ≤ θ.  (26)

Using Remark 1, clearly U_f ⊆ U_g. Now

 E_{g_θ}[T] = λ(θ) = (n+1)θ/(n+2) and N(θ) = n(n+1)²/((n+2)θ²).  (27)

The lower bound in Equation (22) is obtained as

 (λ′(θ))²/N(θ) = θ²/(n(n+2)) = Var_{f_θ}(T).  (28)

Thus the estimator T is an unbiased estimator of θ which attains the generalized Cramer-Rao bound of Naudts. When n = 1, this example reduces to Example 1 given in Naudts (2004). Note that in this case, T does not attain the Hammersley-Chapman-Robbins lower bound.
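Example 1 is easy to verify by simulation; the following sketch (with arbitrarily chosen n and θ) checks that T is unbiased and that its variance matches the attained bound θ²/(n(n+2)) up to Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 5, 2.0
reps = 400_000

# X = max of n uniforms on (0, theta); T = (n+1)/n * X is unbiased for theta
X = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
T = (n + 1) / n * X

bound = theta**2 / (n * (n + 2))     # the attained lower bound in (28)
```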

###### Example 2

Suppose Y_1, ⋯, Y_n are independent random variables with the shifted exponential density

 f(y, θ) = exp(−(y − θ)), y ≥ θ, θ > 0.  (29)

Then the random variable X = min{Y_1, ⋯, Y_n} has the pdf

 f(x, θ) = n exp(−n(x − θ)), x ≥ θ.  (30)

Now consider T(X) = X − 1/n, an unbiased estimator of θ. Then

 Var_{f_θ}(T) = 1/n².  (31)

Then the pdf g which attains the bound in Equation (22) is

 g(x, θ) = n²(x − θ) exp(−n(x − θ)), x ≥ θ.  (32)

Using Remark 1, clearly U_f ⊆ U_g. Note that λ(θ) = E_{g_θ}(T) = θ + 1/n, so λ′(θ) = 1, and the bound in Equation (22) is obtained as

 (λ′(θ))²/N(θ) = 1/n² = Var_{f_θ}(T).  (33)
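A similar simulation check for Example 2 (n and θ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta = 4, 1.5
reps = 400_000

# Y_i = theta + Exp(1), so X = min{Y_i} has density n exp(-n(x - theta)), x >= theta
Y = theta + rng.exponential(1.0, size=(reps, n))
X = Y.min(axis=1)
T = X - 1.0 / n                      # unbiased for theta

bound = 1.0 / n**2                   # the attained lower bound in (33)
```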
###### Example 3

Location family.
Let f and g be two density functions on ℝ satisfying assumptions (1) and (2). Now let X be a random variable with density function f(x, θ) = f(x − θ) and let g(x, θ) = g(x − θ), θ ∈ ℝ. Let T be an unbiased estimator for φ(θ). Then from Equation (14), the optimality condition for the bound in Equation (22) is given by

 ∂_θ g(x, θ)/f(x, θ) = a(θ)(T(x) − φ(θ))  (34)

for some function a(θ). In this case

 ∂_θ g(x, θ) = −g′(x − θ),  (35)

where g′ denotes the derivative of g with respect to its argument. Then (34) becomes

 g′(x − θ) = a(θ)(φ(θ) − T(x)) f(x, θ).  (36)

Let θ = 0 and let x_0 be a point at the boundary of the support with g(x_0) = 0. Integrating from x_0 to x, then

 g(x) = a(0)(φ(0) ∫_{x_0}^{x} f(t) dt − ∫_{x_0}^{x} T(t) f(t) dt)  (37)
      = a(0) h(x),  (38)

where

 h(x) = φ(0) ∫_{x_0}^{x} f(t) dt − ∫_{x_0}^{x} T(t) f(t) dt < ∞  (39)

can be computed since T, f and φ are given.
Now a(0) can be solved from the normalization condition of g as

 a(0) = 1/∫_{D′} h(x) dx, if ∫_{D′} h(x) dx < ∞,  (40)

where D′ denotes the support of g. Thus the optimizing family g(x, θ) = g(x − θ) is obtained.
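The construction can be carried out concretely. As an illustrative instance (our choice of inputs, not from the text above): take f(x) = e^{−x} on [0, ∞), T(x) = x − 1 (so φ(θ) = θ and φ(0) = 0) and x_0 = 0. The recipe then yields h(x) = x e^{−x}, a(0) = 1 and g(x) = x e^{−x}, which is exactly the optimizing density of Example 2 with n = 1. A numerical sketch:

```python
import numpy as np

x = np.linspace(0.0, 25.0, 250001)
dx = x[1] - x[0]
f = np.exp(-x)            # f(x) = e^{-x} on [0, inf)
T = x - 1.0               # unbiased for theta in the location family
phi0 = 0.0                # phi(0)

# h(x) = phi(0) * int_0^x f - int_0^x T f, by cumulative Riemann sums
h = phi0 * np.cumsum(f) * dx - np.cumsum(T * f) * dx

a0 = 1.0 / (np.sum(h) * dx)          # normalization a(0)
g = a0 * h
```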

###### Example 4

Scale family.
Let f and g be two density functions on ℝ satisfying assumptions (1) and (2). Now let

 X ∼ f(x, θ) = (1/θ) f(x/θ), x ∈ D ⊆ ℝ, θ > 0,  (41)

and

 g(x, θ) = (1/θ) g(x/θ).  (42)

Let T be an unbiased estimator for φ(θ). Then from (14),

 ∂_θ g(x, θ)/f(x, θ) = a(θ)(T(x) − φ(θ))  (43)

for some function a(θ). Here

 −(x/θ³) g′(x/θ) − (1/θ²) g(x/θ) = a(θ)(T(x) − φ(θ)) f(x, θ),  (44)

where g′ denotes the derivative of the function g with respect to its argument.
Let θ = 1. Then we have

 x g′(x) + g(x) = a(1)(φ(1) − T(x)) f(x).  (45)

Let x_0 ∈ D. Integrating the above equation from x_0 to x, we get

 x g(x) − x_0 g(x_0) = a(1) ∫_{x_0}^{x} (φ(1) − T(t)) f(t) dt  (46)
                     = a(1)(h(x) − h(x_0)),  (47)

where

 h(x) − h(x_0) = ∫_{x_0}^{x} (φ(1) − T(t)) f(t) dt.  (48)

Choosing x_0 such that x_0 g(x_0) = 0 and h(x_0) = 0, we thus get

 x g(x) = a(1) h(x) ⟹ g(x) = a(1) k(x)  (49)

for the function k(x) = h(x)/x.
Now a(1) can be solved from the normalization condition of the function g as

 a(1) = 1/∫_{D′} k(x) dx, if ∫_{D′} k(x) dx < ∞.  (50)

Thus the optimizing family g(x, θ) = (1/θ) g(x/θ) is obtained.

###### Example 5

Suppose Y_1, ⋯, Y_n are independent uniform random variables on (0, θ), where θ > 0. Then

 X = max{Y_1, ⋯, Y_n} ∼ f(x, θ) = n x^{n−1}/θ^n, 0 ≤ x ≤ θ.  (51)

Now consider T(X) = ((n+k)/n) X^k, an unbiased estimator for φ(θ) = θ^k, where k is a positive integer. Then

 Var_{f_θ}(T) = k² θ^{2k}/(n(n+2k)).  (52)

Now define a pdf g as

 g(x, θ) = (n(n+k)/k)(1 − x^k/θ^k) x^{n−1}/θ^n, 0 ≤ x ≤ θ.  (53)

Using Remark 1, clearly U_f ⊆ U_g. Then the bound in Equation (22) is obtained as

 (λ′(θ))²/N(θ) = k² θ^{2k}/(n(n+2k)) = Var_{f_θ}(T).  (54)

Thus the estimator T is an unbiased estimator of θ^k which attains the bound in Equation (22).
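Example 5 can likewise be checked by simulation (n, k and θ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, theta = 6, 3, 1.2
reps = 400_000

# X = max of n uniforms on (0, theta); T = (n+k)/n * X^k is unbiased for theta^k
X = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
T = (n + k) / n * X**k

bound = k**2 * theta**(2 * k) / (n * (n + 2 * k))    # the attained bound in (54)
```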

###### Example 6

Let X follow the Gamma distribution with a scale parameter θ > 0 and a known shape parameter α,

 f(x, θ) = (1/Γ(α)) x^{α−1} e^{−x/θ}/θ^α, x > 0.  (55)

Let T(X) = (Γ(α)/Γ(α+k)) X^k, where k is a nonzero integer such that α + 2k > 0. Then T is an unbiased estimator of φ(θ) = θ^k with

 Var_{f_θ}(T) = [Γ(α)Γ(2k+α)/(Γ(α+k))² − 1] θ^{2k}.  (56)

Consider a pdf g such that T attains the bound in Equation (22) as follows.
For k ≥ 1,

 g(x, θ) = (1/c)(e^{−x/θ}/θ)[∑_{i=0}^{k−1} s_i (x/θ)^{α+k−(i+2)}], c = ∑_{i=0}^{k−1} s_i Γ(α+k−(i+1)),  (57)

where the s_i are suitable constants.
For k ≤ −2,

 g(x, θ) = (1/c)(e^{−x/θ}/θ)[∑_{i=1}^{k_1} s_i (x/θ)^{α−(i+1)}], c = ∑_{i=0}^{k_1} s_i Γ(α−i),  (58)

where k_1 and the s_i are suitable constants.
For k = −1,

 g(x, θ) = (1/Γ(α−1)) x^{α−2} e^{−x/θ}/θ^{α−1}.  (59)

This is an interesting special case, as T does not attain the Bhattacharyya bounds of any order while it attains the bound in Equation (22).

###### Example 7

Consider the Normal distribution N(0, θ²) given by

 f(x, θ) = (1/(√(2π) θ)) e^{−x²/(2θ²)}, x ∈ ℝ and θ > 0.  (60)

Consider the unbiased estimator T(X) = X⁴/3 for φ(θ) = θ⁴. Then Var_{f_θ}(T) = 32θ⁸/3. Consider a pdf

 g(x, θ) = (1/(√(2π) θ))(3/4 + x²/(4θ²)) e^{−x²/(2θ²)}.  (61)

Note that

 N(θ) = 6/θ² and λ(θ) = E_{g_θ}(T) = 2θ⁴.  (62)

Thus the bound in Equation (22) is obtained as

 (λ′(θ))²/N(θ) = 32θ⁸/3 = Var_{f_θ}(T).  (63)

Thus T attains Naudts's bound with optimizing family g. Note that f belongs to the exponential family and T is a second degree polynomial in the canonical statistic x². Hence it attains the Bhattacharyya bound of order 2. Thus the 'first order' bound obtained using g is equal to the second order Bhattacharyya bound.
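A simulation check for Example 7 (θ arbitrary; T = X⁴/3 is unbiased since E_{f_θ}(X⁴) = 3θ⁴):

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 1.1
reps = 2_000_000

X = rng.normal(0.0, theta, size=reps)
T = X**4 / 3.0                       # unbiased for theta^4

bound = 32.0 * theta**8 / 3.0        # the attained bound in (63)
```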

###### Example 8

Poisson distribution.
Let X_1, ⋯, X_n be i.i.d. random variables from the Poisson distribution

 f(x, θ) = θ^x e^{−θ}/x!, x = 0, 1, ⋯ and θ > 0.  (64)

Consider the joint pdf

 f(x_1, ⋯, x_n, θ) = θ^{n x̄} e^{−nθ}/(x_1! ⋯ x_n!), where x̄ = (x_1 + ⋯ + x_n)/n.  (65)

Consider T = x̄² − x̄/n, an unbiased estimator for θ². Then T attains the bound in Equation (22) if we choose the pdf

 g(x_1, ⋯, x_n, θ) = (1/2) θ^{n x̄} e^{−nθ}/(x_1! ⋯ x_n!) + (x̄/2) θ^{n x̄ − 1} e^{−nθ}/(x_1! ⋯ x_n!).  (66)

Note that T attains the Bhattacharyya bound of order 2, while it attains the 'first order' Naudts bound.
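Two facts in Example 8 are easy to check by simulation: T = x̄² − x̄/n is unbiased for θ², and the factor relating g to f in (66), namely 1/2 + x̄/(2θ), has mean one under f (so g is a probability mass function). A sketch with arbitrary n and θ:

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta = 8, 3.0
reps = 300_000

x = rng.poisson(theta, size=(reps, n))
xbar = x.mean(axis=1)
T = xbar**2 - xbar / n               # unbiased for theta^2

# g in (66) equals f times (1/2 + xbar/(2*theta)); this weight must average to 1 under f
w = 0.5 + xbar / (2.0 * theta)
```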

###### Example 9

Let X_1, ⋯, X_n be i.i.d. uniform random variables on (0, θ), where θ > 0. Then the joint pdf is

 f(x_1, ⋯, x_n, θ) = (1/θⁿ) ∏_{i=1}^{n} 1_{{0 ≤ x_i ≤ θ}},  (67)

where 1_{{·}} denotes the indicator function.
Note that t = max{x_1, ⋯, x_n} is a sufficient statistic, and the unbiased estimator T = ((n+1)/n) t of θ attains the bound in Equation (22) if we choose the pdf

 g(x_1, ⋯, x_n, θ) = ((n+1)/θⁿ)(1 − t/θ); 0 ≤ t ≤ θ, where t = max{x_1, ⋯, x_n}.  (68)

Note that g can be written as

 g(x_1, ⋯, x_n, θ) = Z((n+1)/θⁿ − (n+1)t/θ^{n+1} − 1),  (69)

where Z is a deformed exponential function, here Z(u) = 1 + u, whose inverse Z⁻¹(u) = u − 1 is the corresponding deformed logarithm, so that the deformed logarithm of g is linear in the statistic t.
Such a family is called a deformed exponential family with a deformed logarithm function and a deformed exponential function Z (refer to Naudts (2004) for more details). From Proposition 5.2 of Naudts (2004), it can easily be seen that g is an escort distribution of f, so that the variance of the sufficient statistic attains Naudts's bound.
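The rewriting in (69) can be verified directly. Assuming the deformed exponential Z(u) = 1 + u (the choice consistent with (68) and (69)), the two expressions for g agree as functions of t:

```python
import numpy as np

n, theta = 5, 2.0
t = np.linspace(0.0, theta, 1001)    # values of the sufficient statistic

g_direct = (n + 1) / theta**n * (1.0 - t / theta)          # equation (68)

Z = lambda u: 1.0 + u                # deformed exponential (assumed form)
g_deformed = Z((n + 1) / theta**n - (n + 1) * t / theta**(n + 1) - 1.0)   # equation (69)
```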

###### Remark 2

The deformed exponential family is a generalization of the exponential family in which the deformed logarithm of the density function is a linear function of the statistic t. In the exponential family, the statistic t is sufficient and, under some conditions, complete. As in the exponential family, t is sufficient in the deformed exponential family also. For statistical applications, the definition of the deformed exponential family should include the requirement that t is a complete statistic.
In the above example, the model is a deformed exponential family, while this is not the case in most of the other examples. However, in all of them T attains the bound given by Naudts (2004).

## 4 Generalized Bhattacharyya Bounds

In this section, we obtain an information inequality which generalizes the Bhattacharyya bound given by Fraser and Guttman (1952). It is defined using divided differences of a density function g satisfying the conditions (1) and (2). We begin by recalling the definition of the divided difference formula.

### 4.1 One parameter case

###### Definition 1

Let h(θ) be a scalar function of θ. Let k be a positive integer. Let us define the divided difference of the function h at the nodes θ_0, θ_1, ⋯, θ_k. We have the data points,

 (θ0,h(θ0)),⋯,(θk