The confidence concept arises in statistics as follows: there is a statistical model for the data, a marginal parameter of interest (with the same notation used here for the function and its range), a desired confidence level, and the goal is to state a region that contains the true value of the parameter of interest with probability at least the confidence level for every distribution in the model.
While there can be different motivations for reporting such a region, the one considered here is that there is an estimate of the parameter of interest that lies in the region, and the "size" of the region together with the confidence serve as an assessment of the accuracy of the recorded estimate. It is well known that confidence regions can sometimes give absurd answers as discussed, for example, in Plante (2020). By absurd here is meant that the region could be the null set or the entire range of the parameter with positive probability and so be uninformative. In such situations it is difficult to see how reporting the region can be regarded as a valid assessment of the accuracy of the estimate. Another issue associated with confidence regions is that there isn’t a theory that prescribes how such a region can be constructed for a general problem.
The problem of error assessment via quoting a region can also be approached by adding a prior to the problem and providing a Bayesian credible region having posterior content at least a specified level. The Bayesian approach has the virtues that the error assessment is based on the observed data and that such a region can always be constructed, say via the hpd (highest posterior density) principle. There are criticisms that can be leveled at this approach, however, as there is no assessment of the reliability of the inference, which is implicit in the frequentist approach via repeated sampling. While the use of a prior is also sometimes criticized, the position taken here is that this is no different than the use of a statistical model: while the model can be checked for its agreement with the observed data via model checking, a prior can similarly be submitted to a check for prior-data conflict, see Evans and Moshonov (2006), Evans (2015) and Nott et al. (2020). There is also the issue of bias, which is interpreted here as meaning that the ingredients to the analysis, namely, the data collection procedure together with the model and prior, can be chosen in such a fashion as to produce a foregone conclusion. That such bias is possible is illustrated in Evans (2015) and Evans and Guo (2021), where a solution to this issue is also developed.
Rather than invoke something like the hpd principle to construct a credible region, the approach taken here is somewhat different. This is based on the principle of evidence: there is evidence in favor of a value if its posterior probability is greater than its prior probability, evidence against if the posterior probability has decreased, and no evidence either way if they are equal. This simple principle has broad implications, not the least of which is that it makes little sense to allow any reported region to include a value for which there is evidence against it being true. In fact, a reported confidence or credible region can contain values for which there is evidence against being the true value. As such, it is more appropriate to quote what is called the plausible region, namely, those values of the parameter for which there is evidence in favor of being true, see Evans (2015) and Section 2. The principle of evidence also leads to a direct method for measuring and controlling bias which comes in two forms for this problem. Here the implausible region refers to the set of values for which evidence against is obtained.
(i) Bias against refers to the prior probability that the plausible region does not contain the true value.
(ii) Bias in favor refers to the prior probability that the implausible region does not contain a meaningfully false value as defined in Section 2.
As discussed in Evans and Guo (2021), the control of bias is equivalent to the a priori control of frequentist coverage probabilities for the plausible region. Controlling bias against is equivalent to setting the confidence of the plausible region, namely, the probability of it containing the true value. Controlling bias in favor is typically equivalent to setting the accuracy of the plausible region where accuracy refers to the probability of the plausible region covering false values. The measurement of bias is reviewed in Section 2.
The end result of this approach is the best of both approaches to the problem, namely, a Bayesian region with a particular posterior content that reflects the uncertainty in the observed data, together with a guaranteed frequentist confidence and accuracy, that reflects the reliability of the inference. The reliability of an inference refers to the extent to which an inference is trustworthy and, in general, Bayesian inferences do not address this issue. It is important to note that these results hold for any proper prior and, at least up to computational difficulties, can always be implemented. In particular, there is no need to search for a prior that will provide an appropriate confidence. So an elicited prior can be used, and moreover there is no need for the posterior content and the confidence to agree, as they refer to different aspects of the inference.
Section 2 discusses some necessary background and establishes the new result that a plausible region is never absurd. Section 3 applies this approach to several well-known problems where the construction of frequentist confidence regions has proven to be at the very least difficult and, one could argue, for which there is no current satisfactory solution. The methodology is general and can be applied to any problem with a Bayesian formulation using proper priors and so this provides a degree of unification between Bayes and frequentism.
2 Relative Belief Inferences and Bias
If the prior and posterior densities of the parameter of interest are given, then the relative belief ratio at a value is the ratio of the posterior density to the prior density at that value, and there is evidence in favor of the value when this ratio is greater than 1, evidence against when it is less than 1 and no evidence either way when it equals 1. This follows from the principle of evidence when the prior distribution of the parameter of interest is discrete and follows via a limiting argument in the general case, see Evans (2015). Actually, for much of the discussion here, any valid measure of evidence can be used instead of the relative belief ratio, where valid
means there is a cut-off that determines evidence against versus evidence in favor according to the principle of evidence. For example, a Bayes factor is a valid measure of evidence also using the value 1 as the cut-off. As will be seen, the plausible region and the measures of bias are independent of the valid measure of evidence used so this is not an issue for the discussion here.
The set of values for which there is evidence in favor is the plausible region When the values are ordered by the amount of evidence via the relative belief ratio, the natural estimate of is given by The posterior content of measures how strongly it is believed that the true value is in and the prior probability content of gives a measure of the size or accuracy based on the observed data. So can be considered a highly accurate estimate when the posterior probability is high, as then there is a high degree of belief the true value is in and the prior probability is small, as then this set is small relative to the prior. Other measures of posterior accuracy can also be quoted, such as the Euclidean measure or cardinality measure of when relevant, but the posterior and prior contents work universally for this purpose. Note that any other estimate determined in this way from a valid measure of evidence will also lie in and so produces no gain in accuracy over
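The quantities just described can be illustrated on a finite grid. The following is a minimal sketch, assuming a discrete prior so that the relative belief ratio is simply the posterior probability divided by the prior probability; the function names, the uniform prior and the binomial data (7 successes in 10 trials) are hypothetical choices made only for illustration and do not come from the examples in this paper.

```python
def relative_belief(prior, posterior):
    """Relative belief ratio at each grid point: posterior mass / prior mass."""
    return [post / pri for pri, post in zip(prior, posterior)]


def plausible_summary(grid, prior, posterior):
    """Plausible region, relative belief estimate, and the two contents."""
    rb = relative_belief(prior, posterior)
    in_favor = [i for i, r in enumerate(rb) if r > 1.0]  # evidence in favor
    best = max(range(len(grid)), key=lambda i: rb[i])    # maximizes the evidence
    return {
        "region": [grid[i] for i in in_favor],
        "estimate": grid[best],
        "posterior_content": sum(posterior[i] for i in in_favor),
        "prior_content": sum(prior[i] for i in in_favor),
    }


# Hypothetical illustration: uniform prior on a grid of success probabilities,
# binomial likelihood for 7 successes in 10 trials.
grid = [k / 10 for k in range(1, 10)]
prior = [1 / len(grid)] * len(grid)
likelihood = [t**7 * (1 - t) ** 3 for t in grid]
norm = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / norm for p, l in zip(prior, likelihood)]

summary = plausible_summary(grid, prior, posterior)
print(summary)
```

A large posterior content together with a small prior content of the region corresponds to what the text calls a highly accurate estimate.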
It is possible, however, that there is bias in Bayesian inferences. For example, suppose that the goal is to assess the hypothesis The relative belief ratio indicates whether there is evidence in favor of or against and there are several approaches to measuring the strength of this evidence but this is not considered further here, see Evans (2015). Suppose that evidence against is obtained but that there is a large prior probability of not getting evidence in favor even when is true, namely, the probability
is large where denotes the conditional prior distribution of the data given that is true. It seems reasonable then to treat the finding of evidence against as unreliable and it can be said that there is an a priori bias against Similarly, using a metric on if evidence in favor of is obtained but
is large, namely, there is a large prior probability of not obtaining evidence against when it is meaningfully false, as indicated by the choice of the deviation then it is said that there is bias in favor of Note that generally decreases as moves away from so it is often only necessary to consider values of satisfying to determine the bias against. The value of is not arbitrary but is determined by the application, as it represents the accuracy to which it is desired to know the true value of which also determines the precision of the measurement process that produces the data. Clearly there is some similarity between the frequentist size and power of a test and the bias against and bias in favor here but there is no suggestion that we are to accept or reject The purpose of the biases is to measure the reliability of what the evidence in the observed data tells us about
The probability measures depend on the prior only through the conditional prior and do not depend on the marginal prior for the parameter of interest. As such the probabilities determined by are essentially frequentist in nature and similar to the use of distributions on parameters in mixed models, namely, is used to integrate out nuisance parameters. In fact, the bias probabilities (1) and (2) are exactly frequentist but for the model given by where is the density of and this corresponds to the original model when
The average bias against a value of can be written as
So (3) is determined by the prior coverage probability of the plausible region which will be referred to hereafter as a (Bayesian) confidence as it is the prior probability that contains the true value. Note that if an upper bound can be obtained for as a function of then 1 minus this bound serves as a lower bound on the confidence and, as will be seen, such a bound is commonly available. This lower bound is then a confidence with respect to the model Also the average bias in favor can be written as
which is the prior probability that a meaningfully false value is not in the implausible region the set of values for which there is evidence against. In cases where the prior distribution of is continuous, then typically (4) is an upper bound on the prior probability of covering a meaningfully false value.
While it might be appealing to consider choosing the prior to make both these biases small, this is the wrong approach as indeed experience indicates that choosing a prior to minimize bias against simply increases bias in favor and conversely. As discussed in Evans and Guo (2021), as the diffuseness of the prior increases, typically bias in favor increases and bias against decreases. The way to control these biases is, as established in Evans (2015), through the amount of data collected as both biases converge to 0 as this increases. As such, it is possible to control both the prior probability of covering the true value and the prior probability of it covering a meaningfully false value and so obtain a Bayesian inference with good frequentist properties. Of course, this is similar to the use of coverage probabilities in frequentist inference but the reported inferences are indeed Bayesian while the biases are concerned with ensuring that the inferences are reliable from a frequentist perspective.
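The effect of sample size on the two biases can be sketched by simulation. The block below is only an illustration under simplifying assumptions that are not the Fieller setting treated later: a normal location model with known standard deviation, a normal prior on the mean, and the mean itself as the parameter of interest, so that the conditional prior predictive of the sample mean given the mean is a normal distribution. All function names and numerical values are hypothetical.

```python
import math
import random


def log_rb(mu, xbar, n, sigma, prior_mean, tau):
    """Log relative belief ratio at mu for a N(prior_mean, tau^2) prior (assumed model)."""
    post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
    post_mean = post_var * (prior_mean / tau**2 + n * xbar / sigma**2)
    log_post = -0.5 * math.log(2 * math.pi * post_var) - (mu - post_mean) ** 2 / (2 * post_var)
    log_prior = -0.5 * math.log(2 * math.pi * tau**2) - (mu - prior_mean) ** 2 / (2 * tau**2)
    return log_post - log_prior


def bias_against(mu0, n, sigma, prior_mean, tau, sims, rng):
    """Probability of NOT getting evidence in favor of mu0 when mu0 is true."""
    sd = sigma / math.sqrt(n)
    misses = sum(
        log_rb(mu0, rng.gauss(mu0, sd), n, sigma, prior_mean, tau) <= 0.0
        for _ in range(sims)
    )
    return misses / sims


def bias_in_favor(mu0, delta, n, sigma, prior_mean, tau, sims, rng):
    """Largest probability of NOT getting evidence against mu0 at mu0 -/+ delta."""
    sd = sigma / math.sqrt(n)
    worst = 0.0
    for mu_true in (mu0 - delta, mu0 + delta):
        hits = sum(
            log_rb(mu0, rng.gauss(mu_true, sd), n, sigma, prior_mean, tau) >= 0.0
            for _ in range(sims)
        )
        worst = max(worst, hits / sims)
    return worst


rng = random.Random(0)
results = {
    n: (
        bias_against(0.0, n, 1.0, 0.0, 1.0, 5000, rng),
        bias_in_favor(0.0, 0.5, n, 1.0, 0.0, 1.0, 5000, rng),
    )
    for n in (5, 50, 200)
}
for n, (against, favor) in results.items():
    print(n, against, favor)
```

In this simplified setting both estimated biases shrink as the sample size grows, which is the behavior the text attributes to the amount of data collected rather than to the choice of prior.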
A region for is called absurd if it is possible that or with positive probability The following result establishes that plausible regions can never be absurd in realistic statistical contexts. The result can be viewed as a logical consistency result for this approach to assessing the error in an estimate. For this let denote the prior predictive density associated with the corresponding measure be the conditional prior predictive density of the data given and put
where the last equality follows from the Savage-Dickey ratio result, namely, Note that, the conditional prior distribution of the data given has no dependence on the parameter of interest and, except in extraordinary circumstances, this set will have prior probability 0, namely, . For, if then nothing can be learned as there is no evidence in either direction for any value of
Theorem 1. The plausible region for (i) never satisfies and (ii) satisfies with prior probability 0 when
Proof: (i) Suppose that This is true iff for every and so
which is a contradiction. (ii) Now suppose which is true iff for every Since this implies that for any the set has which implies Then
which is a contradiction.
It is also possible to construct credible regions based on the relative belief ratio as in where as then As with all relative belief inferences, the relative belief credible regions are invariant under smooth reparameterizations while hpd regions are not. This means that the computation of a -relative belief region can be carried out in any parameterization while each parameterization leads to a potentially different hpd credible region. With both approaches, however, it is impossible to say a priori that all the elements of the region will have evidence in their favor. For relative belief regions, however, it is guaranteed that for any then and there is evidence in favor of each element of so such a region can also be reported. There are also a variety of optimality properties satisfied by relative belief credible regions, see Evans (2015). The property of importance for the discussion here, however, is that for the plausible region it can be determined a priori how much data to collect to ensure appropriate coverage probabilities and that doesn’t seem to be available for a credible region in general.
It is also the case, as established in Evans and Guo (2021), that plausible regions possess additional good, and even optimal, properties beyond those already cited like parameterization invariance and no dependence on the valid measure of evidence used. For example, the prior probability of covering the true value is always greater than or equal to the prior probability of covering a false value which in frequentist theory is known as the unbiasedness property for confidence regions. As an example of an optimal property, when the prior is continuous, then among all regions satisfying for every namely, the conditional prior probability that covers the true value is as large as this probability for then maximizes the prior probability of not covering a false value and there is a similar optimality property for the discrete case. The implication of this is that, if one considers another way of expressing evidence that leads to the region then provided its coverage probabilities are as large as those of as otherwise it presumably wouldn’t be considered, then cannot do better than with respect to accuracy. This is really an optimality property for the principle of evidence and there are other such results.
There are a variety of problems discussed in the literature where either absurd confidence regions are obtained or it is unclear how to construct a confidence region with a prescribed confidence for a general parameter. The following examples show that the approach via the principle of evidence can deal successfully with such problems.
3.1 Fieller’s Problem
This is a well-known problem, as discussed in Geary (1930), Fieller (1954), Hinkley (1969) and more recently in Pham-Gia et al. (2006) and Ghosh et al. (2006) where a wide range of applications are noted. Ghosh et al. (2006) is concerned with confidence intervals for ratios of regression coefficients in a normal linear model and it is shown that certain integrated likelihoods do not produce absurd intervals and this is now a consequence of the general Theorem 1. This problem is also discussed in Fraser et al. (2018) where it appears as problems A and B of a set of problems for inference proposed by D. R. Cox.
For this there are two samples independent of where
is unknown. So it is supposed that the means are unknown but the variances are known and common. The discussion can be generalized to allow for unknown variances as well, with no changes to the basic results, but the essential problem arises in the simpler context. The problem then is to make inference about the ratio of means and, in particular, construct a confidence interval for this quantity. It is assumed here that model checking has not led to any suspicions concerning the validity of the models. As such the data can be reduced to the minimal sufficient statistic where independent of
Confidence regions for can be obtained via a pivotal statistic given by but this can produce absurd regions. For example, if a -confidence interval is required for then, with denoting the
-th quantile of athe region equals whenever and Sometimes the region can be a so-called exclusive region of the form with While an interval might be preferred, there is nothing illogical about an exclusive region as can be seen by considering the -confidence interval for If this interval includes then necessarily the -confidence region for has the exclusive form The same reasoning applies in Fieller’s problem and one can always reparameterize by making inference instead about to obtain an interval. The problem of exclusive regions is a consequence of the parameterization but that is not the case with absurd regions as this represents a defect in the inference.
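The three possible shapes of the pivotal region (interval, exclusive region, whole real line) can be sketched numerically. The block below is a hedged illustration, not a transcription of the formulas above: it assumes the ratio is the first mean divided by the second, that the sample means are based on m and n observations with common known standard deviation sigma0, and that the region collects every ratio value psi with (xbar - psi*ybar)^2 <= z^2 * sigma0^2 * (1/m + psi^2/n), where z is the standard normal quantile of order (1 + gamma)/2. The function name and the test values are hypothetical.

```python
import math
from statistics import NormalDist


def fieller_region(xbar, ybar, m, n, sigma0, gamma):
    """Classify and bound the pivotal region for a ratio of normal means (assumed form)."""
    z = NormalDist().inv_cdf((1.0 + gamma) / 2.0)
    c = (z * sigma0) ** 2
    # The region is {psi : a*psi^2 + b*psi + c0 <= 0}.
    a = ybar**2 - c / n
    b = -2.0 * xbar * ybar
    c0 = xbar**2 - c / m
    disc = b * b - 4.0 * a * c0
    if a == 0.0:
        raise ValueError("boundary case a == 0 is not handled in this sketch")
    if a < 0.0 and disc <= 0.0:
        # The quadratic is never positive: every ratio value is retained.
        return ("all", -math.inf, math.inf)
    root = math.sqrt(max(disc, 0.0))
    lo, hi = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    if a > 0.0:
        return ("interval", lo, hi)   # lo <= psi <= hi
    return ("exclusive", lo, hi)      # psi <= lo or psi >= hi


print(fieller_region(10.0, 5.0, 10, 10, 1.0, 0.95))
print(fieller_region(3.0, 0.5, 1, 1, 1.0, 0.95))
print(fieller_region(0.1, 0.1, 1, 1, 1.0, 0.95))
```

Under these assumptions the absurd case arises when the denominator mean is not clearly separated from zero and the discriminant is not positive, so the quadratic inequality holds for every ratio value.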
The relative belief approach requires the specification of a prior and for this conjugate priors independent of will be used. This requires an elicitation for the quantities which can proceed as follows. First specify such that the true value of with virtual certainty, say with prior probability Then put and solve for so the prior on is now determined. This step could also be applied to obtain the prior for but it is supposed instead that there is information about the true value of expressed as with virtual certainty for fixed constants . A value is then selected, which could be a hypothesized value for this quantity or just the central value, and then take Finally, requiring with virtual certainty determines via and this gives the prior for . This is just one method for eliciting the prior and an alternative could be more suitable in a given application. Once a prior has been determined and the data obtained, the prior is subjected to a check for prior-data conflict and it is assumed here that the prior has passed such a check.
Some numerical examples are carried along for illustration purposes.
Example 1. Simulation example (the data, model and prior).
Suppose so the true value is Data was generated leading to the mss For the prior elicitation suppose so and with then The value is chosen as the hypothesis will be subsequently assessed to see how the approach performs with a true hypothesis. Inverting the pivotal leads to the -confidence region for which just includes the true value.
Example 2. Cox’s examples (the data, model and prior).
For the Cox A problem, which produces the exclusive -confidence region via the pivotal. For Cox B the only change is that now and the -confidence region is and so is absurd. No priors were prescribed for either problem, so here we take fairly noninformative priors that avoid prior-data conflict. For problem A suppose independent of and for problem B suppose both priors are
then the exact prior density of is
Note that when then is a (rescaled) Cauchy density so in general this distribution has quite long tails. The same formula works for the posterior with substitutions as in (5) since
So the relative belief ratio is available in closed form.
For a general problem, a closed form is typically not available for the prior and posterior densities of marginal parameters of interest. In an application, however, there is a difference that represents the accuracy with which it is desired to know the true value. This quantity is a major input into sample size considerations. The approach then is to partition the effective prior range of as determined via a simulation from the prior of into subintervals of length with the midpoint of each interval taken as representative of the values in that subinterval. The prior and posterior contents of these subintervals are determined via a simulation and then density histograms are used to approximate and which in turn gives an approximation to that can be used to determine the inferences.
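The discretized approximation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the parameter of interest is a ratio of two normal means with independent normal priors, subintervals of common length delta are centered at integer multiples of delta, and bins with very few prior draws are dropped to limit Monte Carlo noise. That last guard, the function names, and every numerical value are hypothetical choices, not taken from the examples in this paper.

```python
import math
import random
from collections import Counter


def bin_index(psi, delta):
    """Index k of the subinterval of length delta whose midpoint is k * delta."""
    return math.floor(psi / delta + 0.5)


def discretized_rb(prior_draws, post_draws, delta, min_prior_count=5):
    """Ratio of posterior to prior bin contents; equal bin widths cancel."""
    prior_counts = Counter(bin_index(p, delta) for p in prior_draws)
    post_counts = Counter(bin_index(p, delta) for p in post_draws)
    rb = {}
    for k, count in prior_counts.items():
        if count >= min_prior_count:  # assumed guard against sparse tail bins
            prior_mass = count / len(prior_draws)
            post_mass = post_counts.get(k, 0) / len(post_draws)
            rb[k] = post_mass / prior_mass
    return rb


def normal_update(prior_mean, prior_sd, obs_mean, n, sigma0):
    """Conjugate update for a normal mean with known sigma0: (mean, sd)."""
    var = 1.0 / (1.0 / prior_sd**2 + n / sigma0**2)
    return var * (prior_mean / prior_sd**2 + n * obs_mean / sigma0**2), math.sqrt(var)


rng = random.Random(1)
delta, sims = 0.1, 50000
mu_post = normal_update(0.0, 3.0, 2.1, 20, 1.0)  # hypothetical data summaries
nu_post = normal_update(0.0, 3.0, 1.0, 20, 1.0)
prior_draws = [rng.gauss(0.0, 3.0) / rng.gauss(0.0, 3.0) for _ in range(sims)]
post_draws = [rng.gauss(*mu_post) / rng.gauss(*nu_post) for _ in range(sims)]

rb = discretized_rb(prior_draws, post_draws, delta)
plausible = sorted(k * delta for k, r in rb.items() if r > 1.0)
estimate = max(rb, key=rb.get) * delta
print(estimate, plausible[0], plausible[-1])
```

Because the subintervals share a common length, the ratio of bin contents equals the ratio of the two density histograms, so no explicit division by the bin width is needed.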
Example 1. Simulation example (the inferences).
The above approximation procedure was carried out, using the values recorded when was chosen for the accuracy. Figure 1 provides plots of and and Due to the long-tailed feature of the prior some extreme values of are obtained and this is reflected in the range over which these distributions have been plotted. Relatively smooth estimates are obtained based on Monte Carlo sample sizes of and these can be seen to closely approximate the true functions. One approach for coping with the long tail is to calculate the ecdf of based on a large simulation sample and take so this ignores of the probability in the tails which is what was done here. Another possibility, which avoids the truncation, is to transform to where is a long-tailed cdf like a Cauchy (or even sub-Cauchy) and transform the initial partition to All inferences for can then be obtained from those for via the transformation due to the invariance of relative belief inferences under reparameterizations.
The relative belief estimate is given by with plausible region having posterior content and prior content So the plausible region contains the true value, and note that the estimate is reasonably accurate for a relatively small amount of data.
Example 2. Cox’s examples (the inferences).
For the Cox A problem, the plausible region is having posterior content and prior content So the inferences are not very precise. For the Cox B problem, the plausible region is having posterior content and prior content and the absurd interval is avoided. Both cases can be considered extreme as there is little data relative to the variance
Now consider the bias calculations. To compute the biases for hypothesis assessment it is necessary to compute
for various values of where is generated from the conditional prior predictive given and to compute the biases for estimation we need to be able to compute (6) for values of and then average. So it is necessary to: (i) generate from its conditional prior predictive and (ii) compute and compare it to 1.
For (i) the following sequential algorithm will work:
1. generate 2. generate 3. generate
Steps 2 and 3 are straightforward while step 1 requires the development of a suitable algorithm. The joint prior density of is proportional to
which implies that where and are as specified in (5). Therefore, is close to a normal density but for the factor Transforming we need to be able to generate from a density of the form where Using then and which implies
So with probability generate from and otherwise generate from The cdf of for equals
and to generate from via inversion generate and solve for by bisection. To start the bisection set and for some iteratively evaluate for until setting as this guarantees so bisection will work. The cdf of is for
and for this start bisection with and iteratively evaluate until setting so bisection will work. Finally, when is obtained put to get the appropriately generated value of An interesting consequence of this algorithm is that it must be true that for every and this implies the well-known Mills ratio inequality when and when
which gives useful bounds on tail probabilities for the normal distribution when the argument is large.
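The inversion step used above (expand a bracket until the cdf exceeds the uniform value, then bisect) can be sketched generically. The block below is an illustration only: the exponential cdf is a hypothetical stand-in for the two cdfs in the text, and doubling the step when expanding the bracket is an assumed design choice rather than the paper's exact rule.

```python
import math


def invert_cdf(cdf, u, lower, step=1.0, tol=1e-10):
    """Solve cdf(x) = u for x >= lower, where cdf(lower) <= u < 1 is assumed."""
    # Expand the upper end until the bracket [lower, upper] contains the solution.
    upper = lower + step
    while cdf(upper) < u:
        step *= 2.0
        upper = lower + step
    # Standard bisection on the monotone cdf.
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exp_cdf(x):
    """Hypothetical stand-in cdf (unit exponential) with a known inverse."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0


x_half = invert_cdf(exp_cdf, 0.5, 0.0)
x_tail = invert_cdf(exp_cdf, 0.99, 0.0)
print(x_half, x_tail)
```

To generate a random value, the argument u would be drawn from a uniform distribution on (0, 1), for example with random.random().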
To determine (6) the value needs to be computed for each generated value of This can be carried out as previously using the discretized version but using the closed form version is much more efficient. It might seem more appropriate to use the exact form also for inferences but, because we wish to incorporate the meaningful difference into the inferences, the discretized version is much more efficient for those computations. Note too that a high degree of accuracy is not required for the bias computations.
Now consider the biases in the numerical problems being considered.
Example 1. Simulation example (the biases).
The hypothesis assessment problem is then, using the elicited values of leads to and Figure 2 is a density histogram of a sample of from
To get the bias against use the sequential algorithm to generate , compute and compare to for a large number of repetitions recording the proportion of times In this problem the value was obtained based on a Monte Carlo sample of and so there is no real bias against Figure 3 is a plot of versus which is maximized at and takes the value there. This implies that the conditional prior probability that the plausible region contains the true value is at least for all and so can be considered as a -confidence interval for If instead we had then the maximum bias against is and the plausible region would then be a -confidence interval for and of course larger sample sizes will just increase the confidence.
To get the bias in favor of use the sequential algorithm to generate , compute and compare to 1, for a large number of repetitions record the proportion of times and also do this for and the maximum of the two is an upper bound on the bias in favor. In this case the value is obtained which is very high indicating that there is substantial bias in favor of the hypothesis. In other words, there is a substantial prior probability that evidence in favor of the hypothesis will be obtained even when it is meaningfully false as determined by Of course, sample size is playing a role here as well as For the upper bound equals , for the upper bound equals , while for the upper bound equals So is not enough data to ensure that evidence in favor of will not be obtained when it is meaningfully false with and more data needs to be collected to avoid this. For the bias in favor for estimation a sample of values is generated and the bias in favor of at is determined and then averaged. Figure 4 is a plot of the bias in favor as a function of and the average value is which is an upper bound on the prior probability that the plausible region contains a meaningfully false value. When the upper bound equals when the upper bound equals and when the upper bound equals The value of is determined by the application and taking it too small clearly results in the requirement of overly large sample sizes to get the bias in favor small. For example, with and then the bias in favor for estimation is while for it is and with these values are and , respectively.
Example 2. Cox’s examples (the biases).
For the first problem an upper bound on the bias against is given by so the coverage probability for the plausible region is at least For the second problem an upper bound on the bias against is given by so the coverage probability for the plausible region is at least These coverages are quite reasonable given the small sample sizes relative to
3.2 Mandelkern’s Examples
Mandelkern (2002) discusses several problems in physics where confidence intervals are required but for which no acceptable solution exists. These are problems where standard statistical models are used and, in the unconstrained case, well-known confidence intervals are available for but physical theory demands that the true value lie in for a proper subset If is a -confidence region for unconstrained then it is certainly the case that is a -confidence region under the constraint. While this has the correct coverage probability, however, in general can equal with positive probability and so this solution is absurd. As is now demonstrated the approach discussed here provides an effective solution to this problem.
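The way a truncated interval can become empty is easy to see numerically. The sketch below assumes a normal sample mean with known standard deviation, the usual two-sided z-interval, and a constraint that the mean lies in a given interval; the function name and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def constrained_z_interval(xbar, n, sigma0, gamma, lower, upper):
    """Intersect the unconstrained z-interval with [lower, upper]; None if empty."""
    z = NormalDist().inv_cdf((1.0 + gamma) / 2.0)
    half = z * sigma0 / math.sqrt(n)
    lo = max(xbar - half, lower)
    hi = min(xbar + half, upper)
    return (lo, hi) if lo <= hi else None


# Mean constrained to be nonnegative, one observation with unit variance.
ok = constrained_z_interval(0.5, 1, 1.0, 0.95, 0.0, math.inf)
empty = constrained_z_interval(-3.0, 1, 1.0, 0.95, 0.0, math.inf)
print(ok, empty)
```

An observation far enough below the lower bound has positive probability under every mean in the constrained range, so the empty intersection occurs with positive probability.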
Mandelkern’s examples are now described together with the solutions.
Example 3. Location-normal with constrained mean.
The model here is that a sample has been obtained from a distribution in where is known and is known to lie in the interval where with Mandelkern discusses inferences concerning the mass of a neutrino so in that case The measurements are taken to a certain accuracy and this is reflected in the specification of the quantity which is the accuracy to which it is desired to know which may indeed be larger than the accuracy of the measurements. This leads to a grid of possible values for say and such that So when and are both finite the possible values of are given by for and it is supposed that these values are such that is an integer. It is certainly possible for one or both of to be infinite but typically there are lower and upper bounds on what a measurement can equal. So in practice a finite number of such intervals with possibly two tail intervals, which contain very little prior probability, suffices. For example, consider measuring a length to the nearest centimeter so it would make sense to take cm and the are consecutive integer values in centimeters and all values in are considered effectively equivalent. For the neutrino problem there is undoubtedly a guaranteed upper bound on the mass. As discussed for Fieller’s problem, priors are chosen via elicitation and continuous priors are considered here with the previously described discretization applied for computations when necessary. Results for two priors are presented for comparison purposes.
The first prior is taken to be a beta distribution on the interval with the elicitation procedure as described in Evans, Guttman and Li (2017) although others are possible. For this where beta The values of are specified as follows. First it is required that to ensure unimodality and no singularities. Next a proper subinterval is specified such that with prior probability Typically will be a large probability (like or higher) reflecting the fact that is known to be true with virtual certainty. Then the mode is taken to be equal to a value in such as which implies where This leads to values for and as
that are fully specified once is chosen. The value of controls the dispersion of the beta and, with the cdf denoted beta we want betabeta, and this is easily solved for by an iterative procedure based on bisection. For example, with