
The Relation Between Bayesian Fisher Information and Shannon Information for Detecting a Change in a Parameter

We derive a connection between the performance of estimators and the performance of the ideal observer on related detection tasks. Specifically, we show how the Shannon Information for the task of detecting a change in a parameter is related to the Fisher Information and the Bayesian Fisher Information. We have previously shown that this Shannon Information is related via an integral transform to the Minimum Probability of Error on the same task. We then outline a circle of relations: the Minimum Probability of Error and the Ensemble Mean Squared Error are related via the Ziv-Zakai inequality, the Ensemble Mean Squared Error for an estimator and the Bayesian Fisher Information are related via the van Trees inequality, and the Bayesian Fisher Information and the Shannon Information for a detection task are related via the work here.

01/31/2019


1 Introduction

Fisher Information (FI) is normally thought of as related to the performance of an estimator of a parameter in a statistical model for the given data. The ordinary FI is related through the Cramer-Rao Bound (CRB) to the variance of an unbiased estimator of the parameter. What we will call the Bayesian FI (BFI) is related by the van Trees inequality to the Ensemble Mean Squared Error (EMSE) for an estimator. On the other hand, we have shown previously that the FI is related to the performance of the ideal observer on the task of detecting a small change in the parameter, as measured by the area under the Receiver Operating Characteristic (ROC) curve. Thus FI is related to performance on a binary classification task. The Ziv-Zakai inequality relates the EMSE of an estimator to the performance of the ideal observer on the task of distinguishing two different values of the parameter, as measured by the minimum probability of error (MPE). Thus EMSE is also related to performance on a binary classification task. In this work we will derive further connections between the performance of estimators and the performance of the ideal observer on related detection tasks. Specifically, we will show how Shannon Information (SI) for a certain binary classification task is related to the BFI. We have previously shown that this SI is related via an integral transform to the MPE on the same task. We then have a circle of relations starting with MPE and EMSE via Ziv-Zakai, then EMSE and BFI via van Trees, BFI and SI via the work here, and finally SI and MPE via the aforementioned integral transform.

In Section 2 we have a brief review of FI and the CRB. Section 3 contains a previously derived result relating the area under the ideal-observer ROC curve to FI when the task is to detect a small change in the parameter. The reason for showing this result here is to emphasize the similarity with the results below relating SI with BFI. In Section 4 we define the BFI and briefly discuss the van Trees inequality. The Ziv-Zakai inequality is presented in Section 5, and we have a short calculation that derives a more compact form that is easily related to other results here. Section 6 presents a derivation of the main result, which relates the conditional entropy for the task of detecting a small change in a parameter to the BFI. In Section 7 we show the vector-parameter version of the main result. The main result is reformulated in terms of SI in Section 8 and our conclusions are stated in Section 9.

2 Fisher information

We will consider a scalar parameter $\theta$ throughout. This parameter takes values on the real line. There are also vector parameter versions of everything. The data is a random vector $\mathbf{g}$ generated by a noisy imaging system, and has a conditional probability density function (PDF) denoted by $\mathrm{pr}(\mathbf{g}|\theta)$. The Fisher Information (FI) is then given by [1,2]

$$F(\theta)=\left\langle\left[\frac{d}{d\theta}\ln \mathrm{pr}(\mathbf{g}|\theta)\right]^2\right\rangle_{\mathbf{g}|\theta}. \tag{1}$$

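Suppose that $\hat\theta(\mathbf{g})$ is an estimator that uses the data vector to produce an estimate of $\theta$. The Cramer-Rao Bound (CRB) states that

$$\mathrm{var}(\hat\theta)\geq[F(\theta)]^{-1} \tag{2}$$

when

$$\left\langle\hat\theta(\mathbf{g})\right\rangle_{\mathbf{g}|\theta}=\theta. \tag{3}$$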

In other words the variance of an unbiased estimator is bounded below by the reciprocal of the FI. For the Bayesian FI there is the van Trees inequality which is similar to the Cramer-Rao lower bound. This will be discussed further below.
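As an illustration of equations (1) and (2), the following minimal sketch evaluates the FI by quadrature for a hypothetical scalar model in which the datum is Gaussian with mean $\theta$ and standard deviation `sigma`. The model, the function names and the quadrature settings are assumptions made for this example and are not taken from the paper. In this model the unbiased estimator $\hat\theta(g)=g$ has variance `sigma**2`, which should coincide with the CRB.

```python
import math

def gaussian_pdf(g, theta, sigma):
    """Conditional PDF pr(g|theta) for a scalar datum with Gaussian noise (assumed model)."""
    z = (g - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fisher_information(theta, sigma, half_width=10.0, n=4000, eps=1e-5):
    """Equation (1) by midpoint quadrature: mean squared score under pr(g|theta)."""
    lo = theta - half_width * sigma
    dg = 2.0 * half_width * sigma / n
    total = 0.0
    for i in range(n):
        g = lo + (i + 0.5) * dg
        # Central-difference approximation of d/dtheta ln pr(g|theta).
        score = (math.log(gaussian_pdf(g, theta + eps, sigma))
                 - math.log(gaussian_pdf(g, theta - eps, sigma))) / (2.0 * eps)
        total += score * score * gaussian_pdf(g, theta, sigma) * dg
    return total

sigma = 2.0
fi = fisher_information(theta=1.0, sigma=sigma)
crb = 1.0 / fi                 # right-hand side of equation (2)
estimator_variance = sigma ** 2  # variance of the unbiased estimator theta_hat(g) = g
print(fi, crb, estimator_variance)
```

For this particular model the bound is attained; for other models the CRB is in general only a lower bound.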

3 FI and AUC

We consider a classification task to classify $\mathbf{g}$ as a sample drawn from $\mathrm{pr}(\mathbf{g}|\theta)$ or $\mathrm{pr}(\mathbf{g}|\tilde\theta)$, corresponding to hypotheses $H_0$ and $H_1$ respectively. The ideal observer for this task computes the likelihood ratio [3]

$$\Lambda(\mathbf{g})=\frac{\mathrm{pr}(\mathbf{g}|\tilde\theta)}{\mathrm{pr}(\mathbf{g}|\theta)} \tag{4}$$

and compares it to a threshold in order to make the decision. If the likelihood ratio is greater than the threshold, then the decision is made that $\mathbf{g}$ was drawn from $\mathrm{pr}(\mathbf{g}|\tilde\theta)$. If the likelihood ratio is less than or equal to the threshold, then the decision is made that $\mathbf{g}$ was drawn from $\mathrm{pr}(\mathbf{g}|\theta)$. The True Positive Fraction (TPF) is the probability that the decision $H_1$ is made when $H_1$ is true. The False Positive Fraction (FPF) is the probability that the decision $H_1$ is made when $H_0$ is true. The Receiver Operating Characteristic (ROC) curve for this task plots TPF versus FPF as the threshold is varied from $0$ to $\infty$. This is a concave curve that starts at the point $(1,1)$ and ends at $(0,0)$. The area under the ideal-observer ROC curve, the AUC, is a commonly used figure of merit (FOM) for an imaging system on a classification task. In this case we will denote this quantity by $\mathrm{AUC}(\theta,\tilde\theta)$. For the ideal observer the AUC is always between $0.5$ and $1$. This area is related to the detectability $d(\theta,\tilde\theta)$ by the equation

$$\mathrm{AUC}(\theta,\tilde\theta)=\frac{1}{2}+\frac{1}{2}\,\mathrm{erf}\left[\frac{d(\theta,\tilde\theta)}{2}\right]. \tag{5}$$

The detectability is an equivalent FOM to the AUC and varies from $0$ to $\infty$. In a previous publication we showed that the Taylor series expansion for the ideal-observer detectability in this situation is given by [4,5,6]

$$d^2(\theta,\theta+\Delta\theta)=(\Delta\theta)^2F(\theta)+\ldots \tag{6}$$

Our goal in this work is to derive a similar relation between the Bayesian FI and the average Shannon Information (SI) for the classification task that we have defined here.
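The relation between AUC, detectability and FI can be illustrated with a small numerical sketch. It assumes a hypothetical scalar Gaussian model with mean $\theta$ and standard deviation `sigma`, for which the likelihood ratio is monotone in the datum, so that thresholding the likelihood ratio is the same as thresholding the datum. The detectability is taken to be related to the AUC by $\mathrm{AUC}=\frac{1}{2}+\frac{1}{2}\mathrm{erf}(d/2)$, which is inverted by bisection. All names and parameter values are illustrative. In this special case the leading term of equation (6) is exact; other models would agree only to lowest order in $\Delta\theta$.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ideal_observer_auc(theta, theta_t, sigma, n=8000, half_width=12.0):
    """Area under the ROC curve for theta_t > theta (assumed Gaussian model).

    The decision H1 is made when the datum exceeds a cut c, so
    AUC = integral over c of TPF(c) times the density of the datum under H0.
    """
    lo = theta - half_width * sigma
    hi = theta_t + half_width * sigma
    dc = (hi - lo) / n
    auc = 0.0
    for i in range(n):
        c = lo + (i + 0.5) * dc
        tpf = 1.0 - norm_cdf((c - theta_t) / sigma)
        h0_density = norm_pdf((c - theta) / sigma) / sigma  # equals -d(FPF)/dc
        auc += tpf * h0_density * dc
    return auc

def detectability(auc):
    """Invert AUC = 1/2 + 1/2 erf(d/2) for d by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 + 0.5 * math.erf(0.5 * mid) < auc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta, delta, sigma = 1.0, 0.3, 1.5
auc = ideal_observer_auc(theta, theta + delta, sigma)
d_squared = detectability(auc) ** 2
fisher = 1.0 / sigma ** 2          # FI of the assumed Gaussian model
print(auc, d_squared, delta ** 2 * fisher)
```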

4 Bayesian FI and EMSE

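For the rest of this paper we will assume that we have a prior probability $\mathrm{pr}(\theta)$ on the parameter of interest. The Bayesian FI is given by

$$F=\left\langle F(\theta)\right\rangle_\theta+\left\langle\left[\frac{d}{d\theta}\ln \mathrm{pr}(\theta)\right]^2\right\rangle_\theta. \tag{7}$$

For an alternate expression of the Bayesian FI we define the posterior PDF on $\theta$ by

$$\mathrm{pr}(\theta|\mathbf{g})=\frac{\mathrm{pr}(\mathbf{g}|\theta)\,\mathrm{pr}(\theta)}{\mathrm{pr}(\mathbf{g})} \tag{8}$$

where

$$\mathrm{pr}(\mathbf{g})=\int_{-\infty}^{\infty}\mathrm{pr}(\mathbf{g}|\theta)\,\mathrm{pr}(\theta)\,d\theta. \tag{9}$$

In terms of the posterior PDF we can show that

$$F=\left\langle\left\langle\left[\frac{d}{d\theta}\ln \mathrm{pr}(\theta|\mathbf{g})\right]^2\right\rangle_{\mathbf{g}|\theta}\right\rangle_\theta. \tag{10}$$

The Ensemble Mean Squared Error (EMSE) for any estimator of $\theta$ is defined by

$$\mathrm{EMSE}=\left\langle\left\langle\left[\hat\theta(\mathbf{g})-\theta\right]^2\right\rangle_{\mathbf{g}|\theta}\right\rangle_\theta. \tag{11}$$

The van Trees inequality states that [7,8]

$$\mathrm{EMSE}\geq F^{-1}. \tag{12}$$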

In other words, the EMSE for any estimator of the parameter of interest is bounded below by the reciprocal of the Bayesian FI. The van Trees inequality is also called the Bayesian CRB. This is the reason for calling $F$ the Bayesian FI. This inequality, much like the CRB, relates the relevant version of the FI to performance on an estimation task. On the other hand, the result described in Section 3 relates the FI to a classification task, namely the detection of a small change in the parameter of interest. We want to connect the Bayesian FI to this same task, but before we do this we discuss a relation between the EMSE and the detection task in Section 3.
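A minimal sketch of the van Trees inequality, under assumptions that are not part of the paper: the datum is scalar and Gaussian with mean $\theta$ and standard deviation `SIGMA`, and the prior on $\theta$ is Gaussian with zero mean and standard deviation `TAU`. In that case $F(\theta)=1/\sigma^2$ for every $\theta$, and the prior term of the Bayesian FI is $1/\tau^2$. The EMSE of two estimators is estimated by Monte Carlo with a fixed seed.

```python
import random

SIGMA = 1.0   # noise standard deviation (assumed)
TAU = 2.0     # prior standard deviation (assumed)

# Bayesian FI for this model: <F(theta)> = 1/SIGMA^2, prior term = 1/TAU^2.
bayesian_fi = 1.0 / SIGMA ** 2 + 1.0 / TAU ** 2
van_trees_bound = 1.0 / bayesian_fi

def emse(estimator, n_trials=200_000, seed=0):
    """Monte Carlo estimate of the EMSE: average over the prior and the data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        theta = rng.gauss(0.0, TAU)      # draw the parameter from the prior
        g = rng.gauss(theta, SIGMA)      # draw the datum given the parameter
        total += (estimator(g) - theta) ** 2
    return total / n_trials

shrink = TAU ** 2 / (TAU ** 2 + SIGMA ** 2)
emse_posterior_mean = emse(lambda g: shrink * g)  # posterior-mean estimator
emse_raw = emse(lambda g: g)                      # estimator that ignores the prior
print(van_trees_bound, emse_posterior_mean, emse_raw)
```

In this jointly Gaussian setting the posterior mean attains the bound up to Monte Carlo error, while the estimator that ignores the prior has a larger EMSE.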

5 Ziv-Zakai inequality (EMSE and MPE)

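When the prior is known, we can define prior probabilities for the two hypotheses $H_0$ and $H_1$ for the detection task in Section 3 via

$$\mathrm{Pr}_0(\theta,\tilde\theta)=\frac{\mathrm{pr}(\theta)}{\mathrm{pr}(\theta)+\mathrm{pr}(\tilde\theta)} \tag{13}$$

and

$$\mathrm{Pr}_1(\theta,\tilde\theta)=\frac{\mathrm{pr}(\tilde\theta)}{\mathrm{pr}(\theta)+\mathrm{pr}(\tilde\theta)}. \tag{14}$$

Then the minimum probability of error on this detection task is the probability of error for the ideal observer when the threshold used for the likelihood ratio is given by

$$y(\theta,\tilde\theta)=\frac{\mathrm{Pr}_0(\theta,\tilde\theta)}{\mathrm{Pr}_1(\theta,\tilde\theta)}=\frac{\mathrm{pr}(\theta)}{\mathrm{pr}(\tilde\theta)}. \tag{15}$$

The minimum probability of error is $P_e=\mathrm{Pr}_0\,\mathrm{FPF}+\mathrm{Pr}_1(1-\mathrm{TPF})$ since $1-\mathrm{TPF}$ is the False Negative Fraction (FNF). Of course, in this expression, $\mathrm{FPF}$ and $\mathrm{TPF}$ also depend on the pair $(\theta,\tilde\theta)$. The Ziv-Zakai inequality, in its standard form, states that [9,10]

$$\mathrm{EMSE}\geq\frac{1}{2}\int_0^\infty\int_{-\infty}^{\infty}\left[\mathrm{pr}(\theta)+\mathrm{pr}(\theta+x)\right]P_e(\theta,\theta+x)\,d\theta\,x\,dx. \tag{16}$$

We will show that this inequality can be written in an alternate form which connects it to the classification task in Section 3. We start with the substitution $\tilde\theta=\theta+x$ so that we have

$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}\int_{\theta}^{\infty}\left[\mathrm{pr}(\theta)+\mathrm{pr}(\tilde\theta)\right]P_e(\theta,\tilde\theta)\,(\tilde\theta-\theta)\,d\tilde\theta\,d\theta. \tag{17}$$

Due to the range of integration for $\tilde\theta$ we may also write this as

$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}\int_{\theta}^{\infty}\left[\mathrm{pr}(\theta)+\mathrm{pr}(\tilde\theta)\right]P_e(\theta,\tilde\theta)\left|\tilde\theta-\theta\right|d\tilde\theta\,d\theta. \tag{18}$$

Interchanging the order of integration we have

$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\tilde\theta}\left[\mathrm{pr}(\theta)+\mathrm{pr}(\tilde\theta)\right]P_e(\theta,\tilde\theta)\left|\tilde\theta-\theta\right|d\theta\,d\tilde\theta. \tag{19}$$

Now we note that $P_e(\theta,\tilde\theta)=P_e(\tilde\theta,\theta)$ since the two tasks described by the pairs $(\theta,\tilde\theta)$ and $(\tilde\theta,\theta)$ are the same task. Therefore we may interchange the variables $\theta$ and $\tilde\theta$ and use the symmetry of all of the functions in the integrand to arrive at

$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\theta}\left[\mathrm{pr}(\theta)+\mathrm{pr}(\tilde\theta)\right]P_e(\theta,\tilde\theta)\left|\tilde\theta-\theta\right|d\tilde\theta\,d\theta. \tag{20}$$

Now combining the second and fourth inequalities in this chain, that is, averaging (18) and (20), we have

$$\mathrm{EMSE}\geq\frac{1}{4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left[\mathrm{pr}(\theta)+\mathrm{pr}(\tilde\theta)\right]P_e(\theta,\tilde\theta)\left|\tilde\theta-\theta\right|d\tilde\theta\,d\theta. \tag{21}$$

We separate the right hand side into two integrals and again use the symmetry in $\theta$ and $\tilde\theta$ to find that the two integrals are the same. Therefore we have the final result

$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\mathrm{pr}(\theta)\,P_e(\theta,\tilde\theta)\left|\tilde\theta-\theta\right|d\tilde\theta\,d\theta. \tag{22}$$

Finally, the integral on the right can be written as an expectation:

$$\mathrm{EMSE}\geq\frac{1}{2}\left\langle\int_{-\infty}^{\infty}P_e(\theta,\tilde\theta)\left|\tilde\theta-\theta\right|d\tilde\theta\right\rangle_\theta. \tag{23}$$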

This formulation of the Ziv-Zakai inequality makes it clear that the worst case scenario, from the EMSE perspective, is when the minimum probability of error on the classification task in Section 3 is significant for relatively large values of $|\tilde\theta-\theta|$. In the next section we will complete the circle by relating the Bayesian FI to performance on the classification task in Section 3 as measured by the average SI.
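The compact form of the Ziv-Zakai inequality, written as an expectation over the prior, can be evaluated numerically. The sketch below assumes a hypothetical scalar Gaussian model (noise standard deviation `SIGMA`) with a zero-mean Gaussian prior (standard deviation `TAU`); neither choice comes from the paper. For this model the minimum-probability-of-error decision thresholds the datum at a cut that follows from comparing the likelihood ratio with $\mathrm{Pr}_0/\mathrm{Pr}_1$, and the minimum achievable EMSE is the posterior variance.

```python
import math

SIGMA = 1.0  # noise standard deviation (assumed)
TAU = 1.0    # prior standard deviation (assumed)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_prior(theta):
    return -0.5 * (theta / TAU) ** 2 - math.log(TAU * math.sqrt(2.0 * math.pi))

def min_prob_error(theta, theta_t):
    """Minimum probability of error P_e for distinguishing theta from theta_t."""
    lo, hi = min(theta, theta_t), max(theta, theta_t)
    log_odds = log_prior(lo) - log_prior(hi)      # ln(Pr_lo / Pr_hi)
    pr_lo = 1.0 / (1.0 + math.exp(-log_odds))
    pr_hi = 1.0 - pr_lo
    # Decide "hi" when the datum exceeds this cut (likelihood ratio > Pr_lo/Pr_hi).
    cut = 0.5 * (lo + hi) + SIGMA ** 2 * log_odds / (hi - lo)
    false_hi = 1.0 - norm_cdf((cut - lo) / SIGMA)  # decide hi when lo is true
    false_lo = norm_cdf((cut - hi) / SIGMA)        # decide lo when hi is true
    return pr_lo * false_hi + pr_hi * false_lo

def ziv_zakai_bound(n_theta=240, n_x=400, theta_max=6.0, x_max=10.0):
    """(1/2) < integral of P_e(theta, theta_t) |theta_t - theta| d theta_t >_theta."""
    d_theta = 2.0 * theta_max / n_theta
    d_x = 2.0 * x_max / n_x
    total = 0.0
    for i in range(n_theta):
        theta = -theta_max + (i + 0.5) * d_theta
        inner = 0.0
        for j in range(n_x):
            x = -x_max + (j + 0.5) * d_x   # midpoints, so x is never exactly zero
            inner += min_prob_error(theta, theta + x) * abs(x) * d_x
        total += math.exp(log_prior(theta)) * inner * d_theta
    return 0.5 * total

bound = ziv_zakai_bound()
min_emse = SIGMA ** 2 * TAU ** 2 / (SIGMA ** 2 + TAU ** 2)  # posterior variance
print(bound, min_emse)
```

For this jointly Gaussian example the bound coincides with the minimum EMSE up to quadrature error; for other models it is in general only a lower bound.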

6 SI and the Bayesian FIM

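For the purposes of this section we will use the notation

$$\Lambda(\mathbf{g}|\theta,\tilde\theta)=\frac{\mathrm{pr}(\mathbf{g}|\tilde\theta)}{\mathrm{pr}(\mathbf{g}|\theta)} \tag{24}$$

for the likelihood ratio for the classification task in Section 3. Keeping in mind that $\mathrm{Pr}_0=y/(1+y)$ and $\mathrm{Pr}_1=1/(1+y)$, where $y=\mathrm{Pr}_0/\mathrm{Pr}_1=\mathrm{pr}(\theta)/\mathrm{pr}(\tilde\theta)$ is the threshold that gives the minimum probability of error, we can write the SI for this classification task as [11]

$$I(\theta,\tilde\theta)=\frac{1}{1+y}\left\langle\ln\left[\frac{(1+y)\Lambda}{\Lambda+y}\right]\right\rangle_{\mathbf{g}|\tilde\theta}+\frac{y}{1+y}\left\langle\ln\left[\frac{1+y}{\Lambda+y}\right]\right\rangle_{\mathbf{g}|\theta}. \tag{25}$$

If we introduce a binary random variable that is equal to $1$ with probability $1/(1+y)$, and equal to $0$ with probability $y/(1+y)$, then the entropy of this variable is given by

$$H(y)=-\frac{1}{1+y}\ln\left(\frac{1}{1+y}\right)-\frac{y}{1+y}\ln\left(\frac{y}{1+y}\right). \tag{26}$$

The conditional entropy for this variable given $\mathbf{g}$ is then defined by $C_e(y)=H(y)-I(\theta,\tilde\theta)$. The non-standard notation for conditional entropy here was introduced in an earlier publication that related conditional entropy to MPE via an integral transform in the variable $y$. After some algebra we have the expression

$$C_e(y)=\frac{1}{1+y}\left\langle\ln\left(\frac{\Lambda+y}{\Lambda}\right)\right\rangle_{\mathbf{g}|\tilde\theta}+\frac{y}{1+y}\left\langle\ln\left(\frac{\Lambda+y}{y}\right)\right\rangle_{\mathbf{g}|\theta}.$$

We are interested in the limiting value of the conditional entropy as $\tilde\theta$ approaches $\theta$. After some more simplification involving the definition of the likelihood ratio, in particular $\langle f(\Lambda)\rangle_{\mathbf{g}|\tilde\theta}=\langle\Lambda f(\Lambda)\rangle_{\mathbf{g}|\theta}$, we can write the conditional entropy for the task in Section 3 in the form

$$C_e(\theta,\tilde\theta)=\frac{1}{1+y}\left\langle(\Lambda+y)\ln(\Lambda+y)-\Lambda\ln\Lambda-y\ln y\right\rangle_{\mathbf{g}|\theta}.$$

Note that only expectations under hypothesis $H_0$ are needed to compute the conditional entropy, and hence the SI.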

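In order to keep the notational complications to a minimum we introduce the function

$$C'_e(\theta)=\left.\frac{d}{d\tilde\theta}C_e(\theta,\tilde\theta)\right|_{\tilde\theta=\theta}.$$

We also will use the following notation

$$\Lambda'=\Lambda'(\mathbf{g}|\theta)=\left.\frac{d}{d\tilde\theta}\Lambda(\mathbf{g}|\theta,\tilde\theta)\right|_{\tilde\theta=\theta}=\frac{\mathrm{pr}'(\mathbf{g}|\theta)}{\mathrm{pr}(\mathbf{g}|\theta)}, \tag{27}$$

where the prime on the far right indicates a derivative with respect to $\theta$. Similarly, since $y(\theta,\tilde\theta)=\mathrm{pr}(\theta)/\mathrm{pr}(\tilde\theta)$, we will write

$$y'=y'(\theta)=\left.\frac{d}{d\tilde\theta}y(\theta,\tilde\theta)\right|_{\tilde\theta=\theta}=-\frac{\mathrm{pr}'(\theta)}{\mathrm{pr}(\theta)}. \tag{28}$$

Now, using the fact that, when $\tilde\theta=\theta$, we have $\Lambda=1$ and $y=1$,

$$C'_e(\theta)=\left\langle-\frac{y'}{4}(2\ln 2)+\frac{1}{2}\left[(1+\ln 2)(\Lambda'+y')-\Lambda'-y'\right]\right\rangle_{\mathbf{g}|\theta}=\frac{\ln 2}{2}\left\langle\Lambda'\right\rangle_{\mathbf{g}|\theta}.$$

Now we use the fact that $\langle\Lambda'\rangle_{\mathbf{g}|\theta}=0$ to give us

$$C'_e(\theta)=0.$$

This result is to be expected since, as a function of $\tilde\theta$, the conditional entropy is maximized when $\tilde\theta=\theta$.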

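Now we will look at second derivative terms. We define

$$C''_e(\theta)=\left.\frac{d^2}{d\tilde\theta^2}C_e(\theta,\tilde\theta)\right|_{\tilde\theta=\theta}.$$

We also will use the following notation

$$\Lambda''=\Lambda''(\mathbf{g}|\theta)=\left.\frac{d^2}{d\tilde\theta^2}\Lambda(\mathbf{g}|\theta,\tilde\theta)\right|_{\tilde\theta=\theta}=\frac{\mathrm{pr}''(\mathbf{g}|\theta)}{\mathrm{pr}(\mathbf{g}|\theta)}, \tag{29}$$

and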

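$$y''=y''(\theta)=\left.\frac{d^2}{d\tilde\theta^2}y(\theta,\tilde\theta)\right|_{\tilde\theta=\theta}=2\left[\frac{\mathrm{pr}'(\theta)}{\mathrm{pr}(\theta)}\right]^2-\frac{\mathrm{pr}''(\theta)}{\mathrm{pr}(\theta)}, \tag{30}$$

which follows from $y(\theta,\tilde\theta)=\mathrm{pr}(\theta)/\mathrm{pr}(\tilde\theta)$. The terms containing $y''$ cancel in the final result.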

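In order to ease the mental computations for the reader we note that, for any variable $z$ that depends on $\theta$, we have

$$\frac{d}{d\theta}\,z\ln z=z'(1+\ln z)$$

and

$$\frac{d^2}{d\theta^2}\,z\ln z=z''(1+\ln z)+\frac{(z')^2}{z}.$$

Also note that we use

$$\frac{d}{d\theta}\,\frac{1}{1+y}=-\frac{y'}{(1+y)^2}$$

and

$$\frac{d^2}{d\theta^2}\,\frac{1}{1+y}=\frac{2(y')^2}{(1+y)^3}-\frac{y''}{(1+y)^2}.$$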

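Now using the Leibniz rule for second derivatives we have the expansion $C''_e(\theta)=A+2B+C$, where the terms on the right are given by

$$A=\left\langle\left[\frac{(y')^2-y''}{4}\right](2\ln 2)\right\rangle_{\mathbf{g}|\theta},$$

$$2B=\left\langle-\frac{y'}{2}\left[(1+\ln 2)(\Lambda'+y')-\Lambda'-y'\right]\right\rangle_{\mathbf{g}|\theta}$$

and

$$C=\left\langle\frac{1}{2}\left\{(\Lambda''+y'')(1+\ln 2)+\frac{(\Lambda'+y')^2}{2}-\Lambda''-(\Lambda')^2-y''-(y')^2\right\}\right\rangle_{\mathbf{g}|\theta}.$$

These three terms simplify algebraically to

$$A=\left\langle\left[(y')^2-y''\right]\frac{\ln 2}{2}\right\rangle_{\mathbf{g}|\theta},$$

$$2B=\left\langle-\frac{\ln 2}{2}\left[\Lambda'y'+(y')^2\right]\right\rangle_{\mathbf{g}|\theta}$$

and

$$C=\left\langle(\Lambda''+y'')\frac{\ln 2}{2}+\frac{1}{2}\Lambda'y'-\frac{1}{4}(\Lambda')^2-\frac{1}{4}(y')^2\right\rangle_{\mathbf{g}|\theta}.$$

Now we use the fact that $\langle\Lambda'\rangle_{\mathbf{g}|\theta}=0$ and $\langle\Lambda''\rangle_{\mathbf{g}|\theta}=0$ to arrive at

$$C''_e(\theta)=-\frac{1}{4}\left\langle(\Lambda')^2\right\rangle_{\mathbf{g}|\theta}-\frac{1}{4}(y')^2=-\frac{1}{4}F(\theta)-\frac{1}{4}\left[\frac{d}{d\theta}\ln\mathrm{pr}(\theta)\right]^2.$$

If we use the prior to average over $\theta$ we then have

$$\left\langle C''_e(\theta)\right\rangle_\theta=-\frac{1}{4}F.$$

Since $C_e(\theta,\theta)=\ln 2$ and $C'_e(\theta)=0$ we have the Taylor expansion of the average conditional entropy for the task in Section 3: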

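$$\left\langle C_e(\theta,\theta+\Delta\theta)\right\rangle_\theta=\ln 2+\frac{1}{2}\left\langle C''_e(\theta)\right\rangle_\theta(\Delta\theta)^2+\ldots=\ln 2-\frac{(\Delta\theta)^2}{8}F+\ldots$$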

Thus the Bayesian FI gives the lowest order approximation for the average conditional entropy when the task is detecting a small change in the parameter of interest.
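The second derivative of the conditional entropy can be checked by finite differences. Summing the simplified terms $A$, $2B$ and $C$ and using $\langle\Lambda'\rangle=\langle\Lambda''\rangle=0$ leaves $C''_e(\theta)=-\frac{1}{4}\langle(\Lambda')^2\rangle_{\mathbf{g}|\theta}-\frac{1}{4}(y')^2$. The sketch below compares this with a central difference of the conditional entropy, computed as the average binary entropy of the posterior probability of the hypotheses. The scalar Gaussian model, the Gaussian prior and all names are assumptions made for the illustration.

```python
import math

SIGMA = 1.0  # noise standard deviation (assumed)
TAU = 1.0    # prior standard deviation (assumed)

def likelihood(g, theta):
    z = (g - theta) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

def prior(theta):
    z = theta / TAU
    return math.exp(-0.5 * z * z) / (TAU * math.sqrt(2.0 * math.pi))

def binary_entropy(p):
    """Entropy in nats of a binary variable with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def conditional_entropy(theta, theta_t, n=4000, half_width=10.0):
    """Entropy of the hypothesis given the datum, averaged over the datum."""
    pr0 = prior(theta) / (prior(theta) + prior(theta_t))
    pr1 = 1.0 - pr0
    lo = theta - half_width * SIGMA
    dg = 2.0 * half_width * SIGMA / n
    total = 0.0
    for i in range(n):
        g = lo + (i + 0.5) * dg
        joint0 = pr0 * likelihood(g, theta)
        joint1 = pr1 * likelihood(g, theta_t)
        mix = joint0 + joint1
        if mix > 0.0:
            total += mix * binary_entropy(joint1 / mix) * dg
    return total

theta, h = 0.7, 0.05
second_derivative = (conditional_entropy(theta, theta + h)
                     - 2.0 * conditional_entropy(theta, theta)
                     + conditional_entropy(theta, theta - h)) / h ** 2
fisher = 1.0 / SIGMA ** 2           # <(Lambda')^2> for the assumed model
prior_score = -theta / TAU ** 2     # d/dtheta ln pr(theta); only its square enters
predicted = -0.25 * (fisher + prior_score ** 2)
print(second_derivative, predicted)
```

Averaging `predicted` over the prior replaces the bracket by the Bayesian FI.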

The Ziv-Zakai inequality provides a relation between the MPE for the task of using the data vector $\mathbf{g}$ to detect a change in a parameter $\theta$ in the conditional PDF $\mathrm{pr}(\mathbf{g}|\theta)$, and the EMSE for the task of using $\mathbf{g}$ to estimate a parameter $\theta$ in the conditional PDF $\mathrm{pr}(\mathbf{g}|\theta)$. The van Trees inequality relates this EMSE to the Bayesian FI. We have now found a relationship between the Bayesian FI and the average conditional entropy for the detection task. This completes the circle since there is an integral relation between the conditional entropy for any classification task and the MPE for the same task [12,13]. This reinforces the idea that there is a strong mathematical relationship between the performance of an imaging system on estimation tasks and the performance of the same system on the detection of a small change in the parameter of interest.

7 Vector version

For the vector version of this last result we assume a vector parameter $\boldsymbol\theta$ and a perturbation in this parameter of the form $t\mathbf{u}$, where $\mathbf{u}$ is a unit vector in parameter space. We then have a Taylor series expansion in the variable $t$ given by

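$$\left\langle C_e(\boldsymbol\theta,\boldsymbol\theta+t\mathbf{u})\right\rangle_{\boldsymbol\theta}=\ln 2-\frac{t^2}{8}\,\mathbf{u}^\dagger\mathbf{F}\mathbf{u}+\ldots$$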

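In this equation the Bayesian Fisher Information Matrix (FIM) is given by

$$\mathbf{F}=\left\langle\mathbf{F}(\boldsymbol\theta)\right\rangle_{\boldsymbol\theta}+\left\langle\left[\nabla_{\boldsymbol\theta}\ln\mathrm{pr}(\boldsymbol\theta)\right]\left[\nabla_{\boldsymbol\theta}\ln\mathrm{pr}(\boldsymbol\theta)\right]^\dagger\right\rangle_{\boldsymbol\theta}, \tag{31}$$

where the ordinary FIM is defined by

$$\mathbf{F}(\boldsymbol\theta)=\left\langle\left[\nabla_{\boldsymbol\theta}\ln\mathrm{pr}(\mathbf{g}|\boldsymbol\theta)\right]\left[\nabla_{\boldsymbol\theta}\ln\mathrm{pr}(\mathbf{g}|\boldsymbol\theta)\right]^\dagger\right\rangle_{\mathbf{g}|\boldsymbol\theta}. \tag{32}$$

There are vector versions of the van Trees inequality and the Ziv-Zakai inequality, so the circle of relations between EMSE, MPE and Bayesian FIM also exists for vector parameters.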

8 Taylor series expansion for SI

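To get a Taylor series expansion for SI similar to the one we have derived for conditional entropy we must find the expansion for $H(y)$. This function can be written in the form

$$H(y)=\frac{1}{1+y}\left[(1+y)\ln(1+y)-y\ln(y)\right].$$

Using notation similar to that in Section 6 we have

$$H'(\theta)=-\frac{y'}{4}(2\ln 2)+\frac{1}{2}\left(y'\ln 2+y'-y'\right)=0.$$

For the second derivative we can write $H''(\theta)=a+2b+c$, where $a$ is equal to the term $A$ of Section 6, with

$$2b=-2\left(\frac{y'}{4}\right)\left(y'\ln 2+y'-y'\right)$$

and

$$c=\left(\frac{1}{2}\right)\left\{y''\left[1+\ln 2\right]+\frac{(y')^2}{2}-y''-(y')^2\right\}.$$

Combining terms we have

$$H''(\theta)=-\frac{(y')^2}{4}.$$

Now we may combine these computations with the expansion in Section 6, using $I=H-C_e$, and find that

$$I''(\theta)=\frac{1}{4}\left\langle(\Lambda')^2\right\rangle_{\mathbf{g}|\theta}=\frac{1}{4}F(\theta).$$

This gives us another relation between the FI and the detection task in Section 3. The Taylor series expansion for $I(\theta,\theta+\Delta\theta)$ now has the form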

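$$I(\theta,\theta+\Delta\theta)=\frac{1}{2}I''(\theta)(\Delta\theta)^2+\ldots=\frac{(\Delta\theta)^2}{8}F(\theta)+\ldots$$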

Thus the SI and the ideal-observer detectability are essentially the same FOM when we are trying to detect a small change in a parameter.

After using the prior on $\theta$ to compute an expectation we have

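$$\left\langle I(\theta,\theta+\Delta\theta)\right\rangle_\theta=\frac{(\Delta\theta)^2}{8}\left\langle F(\theta)\right\rangle_\theta+\ldots$$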

Thus the quantity $\langle F(\theta)\rangle_\theta$ also has an interpretation in terms of the average SI for the detection task in Section 3.
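The relation $I''(\theta)=\frac{1}{4}F(\theta)$ can also be checked numerically for a model that is not Gaussian. The sketch below assumes a hypothetical single Poisson-distributed count with mean $\theta$, for which $F(\theta)=1/\theta$, and a hypothetical prior $\mathrm{pr}(\theta)=e^{-\theta}$. The SI is computed directly as the mutual information between the hypothesis and the datum, and its second derivative at $\tilde\theta=\theta$ is approximated by a central difference. None of these choices come from the paper.

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing k counts for a Poisson datum with the given mean."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def prior(theta):
    """Hypothetical prior pr(theta) = exp(-theta) on theta > 0."""
    return math.exp(-theta)

def shannon_information(theta, theta_t, k_max=100):
    """Mutual information (nats) between the hypothesis and the Poisson datum."""
    pr0 = prior(theta) / (prior(theta) + prior(theta_t))
    pr1 = 1.0 - pr0
    info = 0.0
    for k in range(k_max + 1):
        p0 = poisson_pmf(k, theta)      # pr(g|theta), hypothesis H0
        p1 = poisson_pmf(k, theta_t)    # pr(g|theta_t), hypothesis H1
        mix = pr0 * p0 + pr1 * p1       # pr(g) for this two-hypothesis task
        info += pr0 * p0 * math.log(p0 / mix) + pr1 * p1 * math.log(p1 / mix)
    return info

theta, h = 5.0, 0.05
second_derivative = (shannon_information(theta, theta + h)
                     - 2.0 * shannon_information(theta, theta)
                     + shannon_information(theta, theta - h)) / h ** 2
fisher = 1.0 / theta                    # FI of a Poisson mean
print(second_derivative, 0.25 * fisher)
```

The prior enters only through the hypothesis probabilities, and its effect cancels at this order, which is why only the ordinary FI appears.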

9 Conclusion

Fisher Information (FI) is almost always thought of in terms of the Cramer-Rao Bound. What we call the Bayesian Fisher Information is usually thought of in terms of the van Trees inequality. We have shown previously that Fisher Information is related to the performance of the ideal observer on the task of detecting a small change in the parameter, as measured by the area under the Receiver Operating Characteristic curve. Thus Fisher Information is related to the ideal performance on a binary classification task. The Ziv-Zakai inequality relates the Ensemble Mean Squared Error of an estimator to the performance of the ideal observer on the task of distinguishing two different values of the parameter, as measured by the Minimum Probability of Error. In this work we showed how Shannon Information for a certain binary classification task is related to the Bayesian Fisher Information. Since we have previously shown that this Shannon Information is related via an integral transform to the Minimum Probability of Error on the same task, we then have a circle of relations relating Minimum Probability of Error on a detection task, Ensemble Mean Squared Error of an estimator, Bayesian Fisher Information, and Shannon Information for the detection task.

One of the useful results of this work is that Task-Based Shannon Information (TSI) for the detection of a small change in a parameter and the ideal-observer AUC for the same task are essentially equivalent figures of merit for a given imaging system. Optimizing a system for one of these quantities will also optimize it for the other. Another point of interest is that, when the parameter is a vector, the interpretation of the Bayesian Fisher Information Matrix in terms of Shannon Information does not require inversion of the matrix. This is similar to the connection between the Fisher Information Matrix and the ideal-observer AUC derived previously. Finally, both of these relations are the first terms of a Taylor series expansion, and not inequalities as the Cramer-Rao Bound and the van Trees inequality are. They thus have the potential to provide good approximations to the relevant TSI or AUC values.

References

• [1] J. Shao, Mathematical Statistics, Springer, New York (1999).
• [2] H. H. Barrett, K. J. Myers, Foundations of Image Science, John Wiley & Sons, Hoboken, NJ (2004).
• [3] H. Barrett, C. Abbey, E. Clarkson, "Objective assessment of image quality. III. ROC metrics, ideal observers, and likelihood-generating functions," J. Opt. Soc. Am. A 15, 1520-1535 (1998).
• [4] E. Clarkson, F. Shen, "Fisher information and surrogate figures of merit for the task-based assessment of image quality," J. Opt. Soc. Am. A 27, 2313-2326 (2010).
• [5] F. Shen, E. Clarkson, "Using Fisher information to approximate ideal observer performance on detection tasks for lumpy-background images," J. Opt. Soc. Am. A 23, 2406-2414 (2006).
• [6] E. Clarkson, "Asymptotic ideal observers and surrogate figures of merit for signal detection with list-mode data," J. Opt. Soc. Am. A 29, 2204-2216 (2012).
• [7] H. L. van Trees, Detection, Estimation and Modulation Theory, Part 1, Wiley, New York (1968).
• [8] R. D. Gill, B. Y. Levit, "Applications of the van Trees inequality: a Bayesian Cramér-Rao bound," Bernoulli 1, 59-79 (1995).
• [9] J. Ziv, M. Zakai, "Some lower bounds on signal parameter estimation," IEEE Trans. on Information Theory 15, 386-391 (1969).
• [10] K. Bell, Y. Steinberg, Y. Ephraim, H. Van Trees, "Extended Ziv-Zakai lower bound for vector parameter estimation," IEEE Trans. on Information Theory 43, 624-637 (1997).
• [11] T. M. Cover, J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Hoboken, NJ (2006).
• [12] E. Clarkson, J. Cushing, "Shannon information and ROC analysis in imaging," J. Opt. Soc. Am. A 32, 1288-1301 (2015).
• [13] E. Clarkson, J. Cushing, "Shannon information and receiver operating characteristic analysis for multiclass classification in imaging," J. Opt. Soc. Am. A 33, 930-937 (2016).