1 Introduction
Fisher Information (FI) is normally thought of as related to the performance of an estimator of a parameter in a statistical model for the given data. The ordinary FI is related through the Cramer-Rao Bound (CRB) to the variance of an unbiased estimator of the parameter. What we will call the Bayesian FI (BFI) is related by the van Trees inequality to the Ensemble Mean Squared Error (EMSE) for an estimator. On the other hand, we have shown previously that the FI is related to the performance of the ideal observer on the task of detecting a small change in the parameter, as measured by the area under the Receiver Operating Characteristic (ROC) curve. Thus FI is related to performance on a binary classification task. The Ziv-Zakai inequality relates the EMSE of an estimator to the performance of the ideal observer on the task of distinguishing two different values of the parameter, as measured by the minimum probability of error (MPE). Thus EMSE is also related to performance on a binary classification task. In this work we will derive further connections between the performance of estimators and the performance of the ideal observer on related detection tasks. Specifically, we will show how Shannon Information (SI) for a certain binary classification task is related to the BFI. We have previously shown that this SI is related via an integral transform to the MPE on the same task. We then have a circle of relations: MPE and EMSE via Ziv-Zakai, EMSE and BFI via van Trees, BFI and SI via the work here, and finally SI and MPE via the aforementioned integral transform.
In Section 2 we have a brief review of FI and the CRB. Section 3 contains a previously derived result relating the area under the ideal-observer ROC curve to FI when the task is to detect a small change in the parameter. The reason for showing this result here is to emphasize the similarity with the results below relating SI with BFI. In Section 4 we define the BFI and briefly discuss the van Trees inequality. The Ziv-Zakai inequality is presented in Section 5, and we have a short calculation that derives a more compact form that is easily related to other results here. Section 6 presents a derivation of the main result, which relates the conditional entropy for the task of detecting a small change in a parameter to the BFI. In Section 7 we show the vector-parameter version of the main result. The main result is reformulated in terms of SI in Section 8 and our conclusions are stated in Section 9.
2 Fisher information
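We will consider a scalar parameter $\theta$ throughout. This parameter takes values on the real line. There are also vector parameter versions of everything. The data is a random vector $\mathbf{g}$ generated by a noisy imaging system, and has a conditional probability density function (PDF) denoted by $pr(\mathbf{g}|\theta)$. The Fisher Information (FI) is then given by [1,2]
$$F(\theta)=\left\langle\left[\frac{\partial}{\partial\theta}\ln pr(\mathbf{g}|\theta)\right]^{2}\right\rangle_{\mathbf{g}|\theta}. \tag{1}$$
Suppose that $\hat{\theta}(\mathbf{g})$ is an estimator that uses the data vector $\mathbf{g}$ to produce an estimate of $\theta$. The Cramer-Rao Bound (CRB) states that
$$\left\langle\left[\hat{\theta}(\mathbf{g})-\theta\right]^{2}\right\rangle_{\mathbf{g}|\theta}\geq\frac{1}{F(\theta)} \tag{2}$$
when
$$\left\langle\hat{\theta}(\mathbf{g})\right\rangle_{\mathbf{g}|\theta}=\theta. \tag{3}$$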
In other words the variance of an unbiased estimator is bounded below by the reciprocal of the FI. For the Bayesian FI there is the van Trees inequality which is similar to the Cramer-Rao lower bound. This will be discussed further below.
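As a numerical illustration of the CRB, the following minimal sketch assumes a toy model that is not taken from the paper: the data are `n` independent Gaussian samples with unknown mean `theta` and known standard deviation `sigma`, so that the FI is `n / sigma**2`. The function names and parameter values are hypothetical. The sample mean is unbiased and, for this model, its variance should sit at the bound up to Monte Carlo noise.

```python
import random


def fisher_information(sigma: float, n: int) -> float:
    """FI for the mean of n i.i.d. N(theta, sigma^2) samples (assumed toy model)."""
    return n / sigma ** 2


def sample_mean_variance(theta: float, sigma: float, n: int,
                         trials: int, seed: int = 0) -> float:
    """Monte Carlo variance of the sample mean, an unbiased estimator of theta."""
    rng = random.Random(seed)
    estimates = [sum(rng.gauss(theta, sigma) for _ in range(n)) / n
                 for _ in range(trials)]
    mean_est = sum(estimates) / trials
    return sum((e - mean_est) ** 2 for e in estimates) / (trials - 1)


theta, sigma, n = 1.5, 2.0, 10
crb = 1.0 / fisher_information(sigma, n)  # reciprocal of the FI
var_hat = sample_mean_variance(theta, sigma, n, trials=20000)
print(f"CRB = {crb:.4f}, Monte Carlo variance of sample mean = {var_hat:.4f}")
```

Because the sample mean is an efficient estimator in this Gaussian model, the two printed numbers should agree to within sampling error; for other models an unbiased estimator may lie strictly above the bound.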
3 FI and AUC
We consider the task of classifying the data vector $\mathbf{g}$ as a sample drawn from $pr(\mathbf{g}|\theta)$ or $pr(\mathbf{g}|\tilde{\theta})$, corresponding to hypotheses $H_0$ and $H_1$ respectively. The ideal observer for this task computes the likelihood ratio [3]
$$\Lambda(\mathbf{g})=\frac{pr(\mathbf{g}|\tilde{\theta})}{pr(\mathbf{g}|\theta)} \tag{4}$$
and compares it to a threshold in order to make the decision. If the likelihood ratio is greater than the threshold, then the decision is made that $\mathbf{g}$ was drawn from $pr(\mathbf{g}|\tilde{\theta})$. If the likelihood ratio is less than or equal to the threshold, then the decision is made that $\mathbf{g}$ was drawn from $pr(\mathbf{g}|\theta)$. The True Positive Fraction (TPF) is $\Pr(D_1|H_1)$, the probability that the decision $D_1$ in favor of $H_1$ is made when $H_1$ is true. The False Positive Fraction (FPF) is $\Pr(D_1|H_0)$, the probability that the decision $D_1$ is made when $H_0$ is true. The Receiver Operating Characteristic (ROC) curve for this task plots TPF versus FPF as the threshold is varied from $\infty$ to $0$. This is a concave curve that starts at the point $(0,0)$ and ends at $(1,1)$. The area under the ideal-observer ROC curve, the AUC, is a commonly used figure of merit (FOM) for an imaging system on a classification task. In this case we will denote this quantity by $AUC(\theta,\tilde{\theta})$. For the ideal observer the AUC is always between $1/2$ and $1$. This area is related to the detectability $d(\theta,\tilde{\theta})$ by the equation
$$AUC(\theta,\tilde{\theta})=\frac{1}{2}+\frac{1}{2}\,\mathrm{erf}\left[\frac{d(\theta,\tilde{\theta})}{2}\right]. \tag{5}$$
The detectability is an equivalent FOM to the AUC and varies from $0$ to $\infty$. In a previous publication we showed that the Taylor series expansion for the ideal-observer detectability in this situation is given by [4,5,6]
$$d^{2}(\theta,\theta+\Delta\theta)=F(\theta)\,\Delta\theta^{2}+\ldots, \tag{6}$$
where $F(\theta)$ is the FI at $\theta$.
Our goal in this work is to derive a similar relation between the Bayesian FI and the average Shannon Information (SI) for the classification task that we have defined here.
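The relation between FI, detectability, and AUC described in this section can be checked numerically in a simple case. The sketch below assumes a scalar Gaussian datum with known standard deviation `sigma` and a mean that shifts from `theta` to `theta + dtheta`; this toy model, the function names, and the parameter values are illustrative assumptions and not the paper's imaging model. For this model the FI is `1 / sigma**2`, the squared detectability equals the FI times `dtheta**2` exactly, and the AUC follows from the standard error-function relation between AUC and detectability.

```python
import bisect
import math
import random


def auc_from_detectability(d: float) -> float:
    """AUC = 1/2 + (1/2) erf(d / 2), the standard AUC-detectability relation."""
    return 0.5 + 0.5 * math.erf(d / 2.0)


def log_likelihood_ratio(g: float, theta0: float, theta1: float, sigma: float) -> float:
    """Log of pr(g|theta1) / pr(g|theta0) for Gaussian data with common sigma."""
    return ((g - theta0) ** 2 - (g - theta1) ** 2) / (2.0 * sigma ** 2)


def empirical_auc(stat_h0: list, stat_h1: list) -> float:
    """Fraction of (H0, H1) pairs in which the H1 test statistic is larger."""
    sorted_h0 = sorted(stat_h0)
    wins = sum(bisect.bisect_left(sorted_h0, t) for t in stat_h1)
    return wins / (len(stat_h0) * len(stat_h1))


theta, dtheta, sigma, trials = 0.0, 0.5, 1.0, 20000
rng = random.Random(1)
stat_h0 = [log_likelihood_ratio(rng.gauss(theta, sigma), theta, theta + dtheta, sigma)
           for _ in range(trials)]
stat_h1 = [log_likelihood_ratio(rng.gauss(theta + dtheta, sigma), theta, theta + dtheta, sigma)
           for _ in range(trials)]

fisher = 1.0 / sigma ** 2                # FI of the assumed Gaussian model
d = math.sqrt(fisher) * dtheta           # lowest-order detectability from the FI
auc_theory = auc_from_detectability(d)
auc_mc = empirical_auc(stat_h0, stat_h1)
print(f"AUC from FI = {auc_theory:.4f}, Monte Carlo AUC = {auc_mc:.4f}")
```

In this Gaussian case the lowest-order term is exact, so the two values should agree up to sampling error; for a general model the FI-based value is only the leading term of the expansion.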
4 Bayesian FI and EMSE
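For the rest of this paper we will assume that we have a prior probability density $pr(\theta)$ on the parameter of interest. The Bayesian FI is given by
$$F_{B}=\left\langle F(\theta)\right\rangle_{\theta}+\left\langle\left[\frac{d}{d\theta}\ln pr(\theta)\right]^{2}\right\rangle_{\theta}, \tag{7}$$
where $F(\theta)$ is the ordinary FI and $\langle\cdot\rangle_{\theta}$ denotes an average over the prior. For an alternate expression of the Bayesian FI we define the posterior PDF on $\theta$, given the data vector $\mathbf{g}$ with conditional PDF $pr(\mathbf{g}|\theta)$, by
$$pr(\theta|\mathbf{g})=\frac{pr(\mathbf{g}|\theta)\,pr(\theta)}{pr(\mathbf{g})}, \tag{8}$$
where
$$pr(\mathbf{g})=\int pr(\mathbf{g}|\theta)\,pr(\theta)\,d\theta. \tag{9}$$
In terms of the posterior PDF we can show that
$$F_{B}=\left\langle\left\langle\left[\frac{\partial}{\partial\theta}\ln pr(\theta|\mathbf{g})\right]^{2}\right\rangle_{\theta|\mathbf{g}}\right\rangle_{\mathbf{g}}. \tag{10}$$
The Ensemble Mean Squared Error (EMSE) for any estimator $\hat{\theta}(\mathbf{g})$ of $\theta$ is defined by
$$\mathrm{EMSE}=\left\langle\left\langle\left[\hat{\theta}(\mathbf{g})-\theta\right]^{2}\right\rangle_{\mathbf{g}|\theta}\right\rangle_{\theta}. \tag{11}$$
The van Trees inequality states that [7,8]
$$\mathrm{EMSE}\geq\frac{1}{F_{B}}. \tag{12}$$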
In other words, the EMSE for any estimator of the parameter of interest is bounded below by the reciprocal of the Bayesian FI. The van Trees inequality is also called the Bayesian CRB, which is the reason for the name Bayesian FI. This inequality, much like the CRB, relates the relevant version of the FI to performance on an estimation task. On the other hand, the result described in Section 3 relates the FI to a classification task, namely the detection of a small change in the parameter of interest. We want to connect the Bayesian FI to this same task, but before we do this we discuss a relation between the EMSE and the detection task in Section 3.
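The van Trees inequality can be illustrated with a conjugate Gaussian example. The sketch below assumes `n` independent Gaussian samples with standard deviation `sigma` and a Gaussian prior on the mean with mean `mu0` and standard deviation `tau`; this model and all names are hypothetical choices for illustration. For this model the Bayesian FI is the data term `n / sigma**2` plus the prior term `1 / tau**2`, and the posterior mean attains the bound, while the sample mean, which ignores the prior, has a larger EMSE.

```python
import random


def bayesian_fi(sigma: float, n: int, tau: float) -> float:
    """Prior-averaged FI plus the prior term, for the assumed Gaussian model."""
    return n / sigma ** 2 + 1.0 / tau ** 2


def emse_monte_carlo(sigma: float, n: int, mu0: float, tau: float,
                     trials: int, seed: int = 2) -> tuple:
    """Monte Carlo EMSE of the posterior mean and of the plain sample mean."""
    rng = random.Random(seed)
    w_data, w_prior = n / sigma ** 2, 1.0 / tau ** 2
    se_post = se_mean = 0.0
    for _ in range(trials):
        theta = rng.gauss(mu0, tau)  # draw the parameter from the prior
        g_bar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        post_mean = (w_data * g_bar + w_prior * mu0) / (w_data + w_prior)
        se_post += (post_mean - theta) ** 2
        se_mean += (g_bar - theta) ** 2
    return se_post / trials, se_mean / trials


sigma, n, mu0, tau = 2.0, 5, 0.0, 1.0
bound = 1.0 / bayesian_fi(sigma, n, tau)  # van Trees lower bound on the EMSE
emse_post, emse_mean = emse_monte_carlo(sigma, n, mu0, tau, trials=20000)
print(f"bound = {bound:.4f}, posterior mean = {emse_post:.4f}, sample mean = {emse_mean:.4f}")
```

Since the Monte Carlo estimate for the posterior mean fluctuates around the bound, it can fall marginally below it in a finite run; that is sampling noise and not a violation of the inequality.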
5 Ziv-Zakai inequality (EMSE and MPE)
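When the prior density $pr(\theta)$ is known, we can define prior probabilities for the two hypotheses $H_0$ (parameter value $\theta$) and $H_1$ (parameter value $\tilde{\theta}$) for the detection task in Section 3 via
$$P_{0}=\frac{pr(\theta)}{pr(\theta)+pr(\tilde{\theta})} \tag{13}$$
and
$$P_{1}=\frac{pr(\tilde{\theta})}{pr(\theta)+pr(\tilde{\theta})}. \tag{14}$$
Then the minimum probability of error on this detection task is the probability of error for the ideal observer when the threshold used for the likelihood ratio is given by
$$\frac{P_{0}}{P_{1}}=\frac{pr(\theta)}{pr(\tilde{\theta})}. \tag{15}$$
The minimum probability of error is $P_{e}=P_{0}\,\mathrm{FPF}+P_{1}\,(1-\mathrm{TPF})$, since $1-\mathrm{TPF}$ is the False Negative Fraction (FNF). Of course, in this expression, $\mathrm{FPF}$ and $\mathrm{TPF}$ also depend on the pair $(\theta,\tilde{\theta})$. The Ziv-Zakai inequality, in its standard form, states that [9,10]
$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}d\theta\int_{0}^{\infty}dh\,h\left[pr(\theta)+pr(\theta+h)\right]P_{e}(\theta,\theta+h). \tag{16}$$
We will show that this inequality can be written in an alternate form which connects it to the classification task in Section 3. We start with the substitution $\tilde{\theta}=\theta+h$ so that we have
$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}d\theta\int_{\theta}^{\infty}d\tilde{\theta}\,(\tilde{\theta}-\theta)\left[pr(\theta)+pr(\tilde{\theta})\right]P_{e}(\theta,\tilde{\theta}). \tag{17}$$
Due to the range of integration for $\tilde{\theta}$ we may also write this as
$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}d\theta\int_{\theta}^{\infty}d\tilde{\theta}\,|\tilde{\theta}-\theta|\left[pr(\theta)+pr(\tilde{\theta})\right]P_{e}(\theta,\tilde{\theta}). \tag{18}$$
Interchanging the order of integration we have
$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}d\tilde{\theta}\int_{-\infty}^{\tilde{\theta}}d\theta\,|\tilde{\theta}-\theta|\left[pr(\theta)+pr(\tilde{\theta})\right]P_{e}(\theta,\tilde{\theta}). \tag{19}$$
Now we note that $P_{e}(\theta,\tilde{\theta})=P_{e}(\tilde{\theta},\theta)$ since the two tasks described by the pairs $(\theta,\tilde{\theta})$ and $(\tilde{\theta},\theta)$ are the same task. Therefore we may interchange the variables $\theta$ and $\tilde{\theta}$ and use the symmetry of all of the functions in the integrand to arrive at
$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}d\theta\int_{-\infty}^{\theta}d\tilde{\theta}\,|\tilde{\theta}-\theta|\left[pr(\theta)+pr(\tilde{\theta})\right]P_{e}(\theta,\tilde{\theta}). \tag{20}$$
Now combining the second and fourth inequalities in this chain, (18) and (20), we have
$$\mathrm{EMSE}\geq\frac{1}{4}\int_{-\infty}^{\infty}d\theta\int_{-\infty}^{\infty}d\tilde{\theta}\,|\tilde{\theta}-\theta|\left[pr(\theta)+pr(\tilde{\theta})\right]P_{e}(\theta,\tilde{\theta}). \tag{21}$$
We separate the right hand side into two integrals and again use the symmetry in $\theta$ and $\tilde{\theta}$ to find that the two integrals are the same. Therefore we have the final result
$$\mathrm{EMSE}\geq\frac{1}{2}\int_{-\infty}^{\infty}d\theta\,pr(\theta)\int_{-\infty}^{\infty}d\tilde{\theta}\,|\tilde{\theta}-\theta|\,P_{e}(\theta,\tilde{\theta}). \tag{22}$$
Finally, the integral on the right can be written as an expectation:
$$\mathrm{EMSE}\geq\frac{1}{2}\left\langle\int_{-\infty}^{\infty}|\tilde{\theta}-\theta|\,P_{e}(\theta,\tilde{\theta})\,d\tilde{\theta}\right\rangle_{\theta}. \tag{23}$$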
This formulation of the Ziv-Zakai inequality makes it clear that the worst case scenario, from the EMSE perspective, is when the minimum probability of error on the classification task in Section 3 is significant for relatively large separations between the two parameter values. In the next section we will complete the circle by relating the Bayesian FI to performance on the classification task in Section 3 as measured by the average SI.
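The minimum probability of error that enters the Ziv-Zakai inequality can be computed in closed form for a simple case. The sketch below assumes a scalar Gaussian datum with standard deviation `sigma`, hypotheses with means `theta` and `theta + h` for `h > 0`, and hypothesis probabilities `p0` and `p1` supplied directly; in the text these probabilities come from the prior on the parameter, which is not modeled here. All function names are hypothetical. Thresholding the likelihood ratio at `p0 / p1` is equivalent, for this model, to thresholding the datum itself.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_probability(cut: float, theta: float, h: float, sigma: float,
                      p0: float, p1: float) -> float:
    """p0 * FPF + p1 * FNF when H1 is chosen for g > cut."""
    fpf = 1.0 - norm_cdf((cut - theta) / sigma)
    fnf = norm_cdf((cut - theta - h) / sigma)
    return p0 * fpf + p1 * fnf


def ideal_observer_cut(theta: float, h: float, sigma: float,
                       p0: float, p1: float) -> float:
    """Data value at which the likelihood ratio equals the threshold p0 / p1."""
    return theta + h / 2.0 + (sigma ** 2 / h) * math.log(p0 / p1)


theta, h, sigma, p0, p1 = 0.0, 1.0, 1.0, 0.7, 0.3
cut = ideal_observer_cut(theta, h, sigma, p0, p1)
mpe = error_probability(cut, theta, h, sigma, p0, p1)
# Moving the cut away from the ideal-observer value can only increase the error.
worse = [error_probability(cut + shift, theta, h, sigma, p0, p1) for shift in (-0.1, 0.1)]
print(f"cut = {cut:.4f}, MPE = {mpe:.4f}, perturbed errors = {worse}")
```

The minimum probability of error never exceeds the smaller of `p0` and `p1`, since always guessing the more probable hypothesis already achieves that error.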
6 SI and the Bayesian FIM
For the purposes of this section we will use the notation
(24) |
for the likelihood ratio for the classification task in Section 3. Keeping in mind that and , we can write the SI for this classification task as [11]
(25) |
If we introduce a binary random variable
such that with probability , and with probability , then the entropy of is given by(26) |
The conditional entropy for given is then defined by The non-standard notation for conditional entropy here was introduced in an earlier publication that related conditional entropy to MPE via an integral transform in the variable . After some algebra we have the expression
We are interested in the limiting value of the conditional entropy as approaches . After some more simplification involving the definition of the likelihood ratio we can write the conditional entropy for the task in Section 3 in the form
Note that only expectations under hypothesis are needed to compute the conditional entropy, and hence the SI.
In order to keep the notational complications to a minimum we introduce the function
We also will use the following notation
(27) |
where the prime on the far right indicates a derivative with respect to . Similarly we will write
(28) |
Now, using the fact that, when , we have and ,
Now we use the fact that to give us
This result is to be expected since, as a function of , the conditional entropy is maximized when .
Now we will look at second derivative terms. We define
We also will use the following notation
(29) |
and
(30) |
In order to ease the mental computations for the reader we note that, for any variable that depends on , we have
and
Also note that we use
and
Now using the Leibniz rule for second derivatives we have the expansion where the terms on the right are given by
and
These three terms simplify algebraically to
and
Now we use the fact that and to arrive at
If we use the prior to average over we then have
Since we have the Taylor expansion of the average conditional entropy for the task in Section 3:
Thus the Bayesian FI gives the lowest order approximation for the average conditional entropy when the task is detecting a small change in the parameter of interest.
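The link between the average conditional entropy and the Bayesian FI can be probed numerically. The sketch below is not the paper's derivation. It assumes a scalar Gaussian datum with standard deviation `sigma`, a zero-mean Gaussian prior with standard deviation `tau`, hypothesis probabilities proportional to the prior density at the two parameter values, and natural logarithms. Under those assumptions a direct small-shift calculation (my own, stated here as an assumption) gives a prior-averaged deficit of the conditional entropy below `ln 2` of approximately `delta**2 / 8` times the Bayesian FI `1 / sigma**2 + 1 / tau**2`. All names are hypothetical.

```python
import math


def gauss_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def binary_entropy(q: float) -> float:
    """Entropy in nats of a binary variable with probabilities q and 1 - q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)


def trapezoid(f, a: float, b: float, steps: int) -> float:
    h = (b - a) / steps
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps)))


def entropy_deficit(theta: float, delta: float, sigma: float, tau: float) -> float:
    """ln 2 minus the conditional entropy for the task theta versus theta + delta."""
    w0, w1 = gauss_pdf(theta, 0.0, tau), gauss_pdf(theta + delta, 0.0, tau)
    p0, p1 = w0 / (w0 + w1), w1 / (w0 + w1)  # assumed hypothesis probabilities

    def integrand(g: float) -> float:
        a = p0 * gauss_pdf(g, theta, sigma)
        b = p1 * gauss_pdf(g, theta + delta, sigma)
        return (a + b) * (math.log(2.0) - binary_entropy(b / (a + b)))

    return trapezoid(integrand, theta - 8.0 * sigma, theta + delta + 8.0 * sigma, 800)


sigma, tau, delta = 1.0, 1.0, 0.1
average_deficit = trapezoid(
    lambda th: gauss_pdf(th, 0.0, tau) * entropy_deficit(th, delta, sigma, tau),
    -6.0 * tau, 6.0 * tau, 120)
bfi_prediction = delta ** 2 * (1.0 / sigma ** 2 + 1.0 / tau ** 2) / 8.0
print(f"average deficit = {average_deficit:.6f}, BFI term = {bfi_prediction:.6f}")
```

The two printed values should be close for small `delta` and drift apart as `delta` grows, which is the expected behavior of a lowest-order Taylor term.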
The Ziv-Zakai inequality provides a relation between the MPE for the task of using the data vector to detect a change in a parameter of its conditional PDF, and the EMSE for the task of using the same data vector to estimate that parameter. The van Trees inequality relates this EMSE to the Bayesian FI. We have now found a relationship between the Bayesian FI and the average conditional entropy for the detection task. This completes the circle since there is an integral relation between the conditional entropy for any classification task and the MPE for the same task [12,13]. This reinforces the idea that there is a strong mathematical relationship between the performance of an imaging system on estimation tasks and the performance of the same system on the detection of a small change in the parameter of interest.
7 Vector version
For the vector version of this last result we assume a vector parameter and a perturbation in this parameter of the form , where is a unit vector in parameter space. We then have a Taylor series expansion in the variable given by
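In this equation the Bayesian Fisher Information Matrix (FIM) is given by
$$\mathbf{F}_{B}=\left\langle\mathbf{F}(\boldsymbol{\theta})\right\rangle_{\boldsymbol{\theta}}+\left\langle\left[\nabla_{\boldsymbol{\theta}}\ln pr(\boldsymbol{\theta})\right]\left[\nabla_{\boldsymbol{\theta}}\ln pr(\boldsymbol{\theta})\right]^{\dagger}\right\rangle_{\boldsymbol{\theta}}, \tag{31}$$
where the ordinary FIM is defined by
$$\mathbf{F}(\boldsymbol{\theta})=\left\langle\left[\nabla_{\boldsymbol{\theta}}\ln pr(\mathbf{g}|\boldsymbol{\theta})\right]\left[\nabla_{\boldsymbol{\theta}}\ln pr(\mathbf{g}|\boldsymbol{\theta})\right]^{\dagger}\right\rangle_{\mathbf{g}|\boldsymbol{\theta}}. \tag{32}$$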
There are vector versions of the van Trees inequality and the Ziv-Zakai inequality, so the circle of relations between EMSE, MPE and Bayesian FIM also exists for vector parameters.
8 Taylor series expansion for SI
To get a Taylor series expansion for SI similar to the one we have derived for conditional entropy we must find the expansion for . This function can be written in the form
Using notation similar to that in Section 6 we have
For the second derivative we can write , with
and
Combining terms we have
Now we may combine these computations with the expansion in Section 6 and find that
This gives us another relation between the FI and the detection task in Section 3. The Taylor series expansion for now has the form
Thus the SI and the ideal-observer detectability are essentially the same FOM when we are trying to detect a small change in a parameter.
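The near equivalence of SI and detectability for small parameter changes can be seen in a toy calculation. The sketch below assumes equal hypothesis probabilities, a scalar Gaussian datum with unit-free standard deviation `sigma`, and natural logarithms; these are simplifying assumptions for illustration, and the function names are hypothetical. For equal probabilities the SI is the Jensen-Shannon divergence between the two data densities, whose lowest-order term for a small shift is one eighth of the FI times the squared shift, that is, one eighth of the squared detectability.

```python
import math


def gauss_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def shannon_information(delta: float, sigma: float, steps: int = 4000) -> float:
    """Mutual information in nats between the hypothesis label and the datum.

    Assumes equal hypothesis probabilities and means 0 and delta; the integral
    over the datum is evaluated with the trapezoidal rule.
    """
    lo, hi = -10.0 * sigma, delta + 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        g = lo + i * h
        a = 0.5 * gauss_pdf(g, 0.0, sigma)
        b = 0.5 * gauss_pdf(g, delta, sigma)
        q = b / (a + b)  # posterior probability of the shifted hypothesis
        term = (a + b) * (math.log(2.0) + q * math.log(q) + (1.0 - q) * math.log(1.0 - q))
        total += term if 0 < i < steps else 0.5 * term
    return total * h


sigma = 1.0
detectabilities = [0.4, 0.2, 0.1]  # d = delta / sigma for this model
ratios = [shannon_information(d * sigma, sigma) / (d ** 2 / 8.0) for d in detectabilities]
for d, r in zip(detectabilities, ratios):
    print(f"d = {d:.1f}: SI / (d^2 / 8) = {r:.5f}")
```

The printed ratios should approach one as the detectability shrinks, which is the sense in which the two figures of merit rank systems in the same way for the detection of a small change.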
After using the prior on to compute an expectation we have
Thus the quantity also has an interpretation in terms of the average SI for the detection task in Section 3.
9 Conclusion
Fisher Information (FI) is almost always thought of in terms of the Cramer-Rao Bound. What we call the Bayesian Fisher Information is usually thought of in terms of the van Trees inequality. We have shown previously that Fisher Information is related to the performance of the ideal observer on the task of detecting a small change in the parameter, as measured by the area under the Receiver Operating Characteristic curve. Thus Fisher Information is related to the ideal performance on a binary classification task. The Ziv-Zakai inequality relates the Ensemble Mean Squared Error of an estimator to the performance of the ideal observer on the task of distinguishing two different values of the parameter, as measured by the Minimum Probability of Error. In this work we showed how Shannon Information for a certain binary classification task is related to the Bayesian Fisher Information. Since we have previously shown that this Shannon Information is related via an integral transform to the Minimum Probability of Error on the same task, we then have a circle of relations connecting the Minimum Probability of Error on a detection task, the Ensemble Mean Squared Error of an estimator, the Bayesian Fisher Information, and the Shannon Information for the detection task.
One of the useful results of this work is that Task-Based Shannon Information (TSI) for the detection of a small change in a parameter and the ideal-observer AUC for the same task are essentially equivalent figures of merit for a given imaging system. Optimizing a system for one of these quantities will also optimize it for the other. Another point of interest is that, when the parameter is a vector, the interpretation of the Bayesian Fisher Information Matrix in terms of Shannon Information does not require inversion of the matrix. This is similar to the connection between the Fisher Information Matrix and the ideal-observer AUC derived previously. Finally, both of these relations are the first terms of a Taylor series expansion, and not inequalities as the Cramer-Rao Bound and the van Trees inequality are. They thus have the potential to provide good approximations to the relevant TSI or AUC values.
References
- [1] J. Shao, Mathematical Statistics, Springer, New York (1999).
- [2] H. H. Barrett, K. J. Myers, Foundations of Image Science, John Wiley & Sons, Hoboken, NJ (2004).
- [3] H. Barrett, C. Abbey, E. Clarkson, "Objective assessment of image quality. III. ROC metrics, ideal observers, and likelihood-generating functions," J. Opt. Soc. Am. A 15, 1520-1535 (1998).
- [4] E. Clarkson, F. Shen, "Fisher information and surrogate figures of merit for the task-based assessment of image quality," J. Opt. Soc. Am. A 27, 2313-2326 (2010).
- [5] F. Shen, E. Clarkson, "Using Fisher information to approximate ideal observer performance on detection tasks for lumpy-background images," J. Opt. Soc. Am. A 23, 2406-2414 (2006).
- [6] E. Clarkson, "Asymptotic ideal observers and surrogate figures of merit for signal detection with list-mode data," J. Opt. Soc. Am. A 29, 2204-2216 (2012).
- [7] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, Wiley, New York (1968).
- [8] R. D. Gill, B. Y. Levit, "Applications of the van Trees inequality: a Bayesian Cramér-Rao bound," Bernoulli 1, 59-79 (1995).
- [9] J. Ziv, M. Zakai, "Some lower bounds on signal parameter estimation," IEEE Trans. on Information Theory 15, 386-391 (1969).
- [10] K. Bell, Y. Steinberg, Y. Ephraim, H. Van Trees, "Extended Ziv-Zakai lower bound for vector parameter estimation," IEEE Trans. on Information Theory 43, 624-637 (1997).
- [11] T. M. Cover, J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Hoboken, NJ (2006).
- [12] E. Clarkson, J. Cushing, "Shannon information and ROC analysis in imaging," J. Opt. Soc. Am. A 32, 1288-1301 (2015).
- [13] E. Clarkson, J. Cushing, "Shannon information and receiver operating characteristic analysis for multiclass classification in imaging," J. Opt. Soc. Am. A 33, 930-937 (2016).