On Resolving Problems with Conditionality and Its Implications for Characterizing Statistical Evidence

The conditionality principle C plays a key role in attempts to characterize the concept of statistical evidence. The standard version of C considers a model and a derived conditional model, formed by conditioning on an ancillary statistic for the model, together with the data, to be equivalent with respect to their statistical evidence content. This equivalence is considered to hold for any ancillary statistic for the model but creates two problems. First, there can be more than one maximal ancillary in a given context and this leads to C not being an equivalence relation and, as such, calls into question whether C is a proper characterization of statistical evidence. Second, a statistic A can change from ancillary to informative (in its marginal distribution) when another ancillary B changes, from having one known distribution P_B, to having another known distribution Q_B. This means that the stability of ancillarity differs across ancillary statistics and raises the issue of when a statistic can be said to be truly ancillary. It is therefore natural, and practically important, to limit conditioning to the set of ancillaries whose distribution is irrelevant to the ancillary status of any other ancillary statistic. This results in a family of ancillaries for which there is a unique maximal member. This also gives a new principle for inference, the stable conditionality principle, that satisfies the criteria required for any principle whose aim is to characterize statistical evidence.

1 Introduction

The conditionality principle $C$ has played a puzzling role in attempts to develop a frequentist theory of statistical inference. On the one hand it seems intuitively obvious, and even a necessary component of such a theory. But it also produces a significant ambiguity due to nonequivalent applications, for which there seems to be no easy way of determining which is correct, or even whether any are correct. Attempts to ignore this problem, typically by considering certain applications as equivalent, produce the somewhat strange phenomenon that $C$, a frequentist principle, can lead to the likelihood principle $L$, which precludes any frequentist inferences; see Evans, Fraser and Monette (1986) and Evans (2013) for discussion of this.

The fact that $C$ is not an equivalence relation, which any valid characterization of statistical evidence must be, calls into question the justification for $C$. This can be considered as a logical inconsistency in the definition of $C$. Moreover, as will be shown, a statistic can change from ancillary to informative when the distribution of another ancillary statistic changes. This raises the issue of whether the distribution of such a statistic is truly irrelevant for inference, which can be considered as a statistical inconsistency in the definition of $C$.

The purpose of this paper is to propose a resolution to these problems. It is argued that a correct characterization of the ancillary concept requires the restriction of the set of possible ancillaries for use to a subset and this is based upon very natural statistical criteria. Once the restriction is made, there is a unique maximal member of this subset and this becomes the ancillary to use as it makes the maximal reduction in the set of possible data values to compare the observed data to in the conditional model. We show that natural statistical criteria lead to the set being the minimal ancillaries, whose maximum is the laminal ancillary as labelled by the taxonomy of Basu (1959).

One could argue that this isn’t much of an advance, particularly because the laminal ancillary is often trivial, but we would counterargue that it is significant because it shows that the other ancillaries, besides the laminal, are ineligible to be used in the conditioning step. This establishes the validity of some form of $C$ for inference and this has broad implications. In particular, the idea that $C$ together with the sufficiency principle $S$ can lead to $L$, as discussed, for example, in Birnbaum (1962), Evans, Fraser and Monette (1986), Evans (2013) and many others, is completely avoided, and this applies similarly to the argument that $C$ alone can produce $L$. Additionally, it leads to a new and uncontroversial principle that combines $S$ with a modified $C$ and that still permits frequentist considerations for inferences.

In Section 2 the conditionality principle is discussed. In Section 3 we introduce the statistical criterion that assesses whether an ancillary statistic is unstable (can become informative) if one merely changes the distribution of another ancillary statistic. We show how this connects to the minimal and laminal ancillaries. In Section 4 a principle is introduced which satisfies both $S$ and the new conditionality principle. This principle forms an equivalence relation in the class of all inference bases and so is indeed a valid partial characterization of statistical evidence, which was Birnbaum’s intention. The proofs of all propositions are placed in the Appendix.

The conditionality principle has attracted the attention of many authors, some of whom have attempted resolutions. The papers Basu (1959), Basu (1962), Cox (1958), Cox (1971), Kalbfleisch (1975), Buehler (1982), Stigler (2001) and Ghosh, Reid and Fraser (2010) all represent interesting contributions and there are many more which can be found in the references of these papers. To the best of our knowledge nobody has presented a forceful argument for the laminal ancillary as being the natural resolution, and that is the outcome of the discussion in Section 3.

2 Principles and Ancillaries

All of the principles $S$, $C$ and $L$ are applied to inference bases. An inference base $I=(M,x)$ is comprised of the observed data $x$ together with a statistical model

 $$M=(\mathcal{X},\mathcal{B},\{P_{\theta,X}:\theta\in\Theta\}),$$

where $\mathcal{X}$ is a sample space containing all possible values for the observed data $x$ of random object $X$, $\mathcal{B}$ is a $\sigma$-algebra on $\mathcal{X}$ and $\{P_{\theta,X}:\theta\in\Theta\}$ is a collection of probability measures defined on $\mathcal{B}$, indexed by model parameter $\theta\in\Theta$. For inference, the assumption is made that there is a true value of $\theta$ such that, before it is observed, the data is distributed according to the corresponding $P_{\theta,X}$. The goal, once $x$ is observed, is to make inference about which of the possible values of $\theta$ corresponds to the true value, and these inferences are based somehow on the ingredients $(M,x)$. More generally, our interest is in some marginal parameter that has a real-world interpretation and it is desired to know its true value, and this requires dealing with so-called nuisance parameters. This more general problem is ignored here except to say that the concept of conditioning on an ancillary for the model is still relevant for that context. Birnbaum (1962) considered the set of all inference bases and, for inference bases $I_1$ and $I_2$ with essentially the same model parameter (or bijective relabellings thereof), indicated that these inference bases contain the same statistical evidence about the true value of the model parameter by writing $Ev(I_1)=Ev(I_2)$.

An ancillary statistic for the model $M$ is a map $A$ defined on $\mathcal{X}$ such that the marginal probability measure $P_{\theta,A}$ induced by $A$ is the same for every $\theta\in\Theta$. In other words $A$ is ancillary when its marginal distribution is independent of the model parameter, and it is then claimed that the observed value $A(x)$ contains no information about the true value of $\theta$. More than this, simple examples, like the two measuring instruments example in Cox (1958), suggest that for frequentist inferences the initial model $M$ in $I$ be replaced by the conditional model $M\mid A(x)$, in which each $P_{\theta,X}$ is replaced by the conditional probability measure for $X$ given the value $A(x)$. The principle $C$ then states that $Ev(M,x)=Ev(M\mid A(x),x)$. An ancillary $A$ is a maximal ancillary if, whenever $B$ is another ancillary and there exists a function $h$ such that $A=h(B)$, then $h$ is effectively a 1-1 function. So, the set of possible data values that is conditioned on via $C$, when the value of a maximal ancillary is observed, cannot be made smaller without losing ancillarity.
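To make the definition concrete, the following is a minimal Python sketch for a finite sample space. The five-point, two-parameter model `MODEL`, the statistic `A` and all helper names are hypothetical illustrations chosen here, not taken from the paper's tables. The sketch checks ancillarity by comparing marginal distributions across parameter values and forms the conditional model that the conditionality principle substitutes for the original one.

```python
from fractions import Fraction as F

# Hypothetical toy model (an assumption, not from the paper):
# sample space {1,...,5}, two parameter values, exact probabilities.
MODEL = {
    "theta1": {1: F(2, 24), 2: F(2, 24), 3: F(4, 24), 4: F(4, 24), 5: F(12, 24)},
    "theta2": {1: F(1, 24), 2: F(3, 24), 3: F(3, 24), 4: F(5, 24), 5: F(12, 24)},
}

def marginal(pmf, stat):
    """Distribution of stat(X) when X has the given pmf."""
    out = {}
    for x, p in pmf.items():
        out[stat(x)] = out.get(stat(x), 0) + p
    return out

def is_ancillary(model, stat):
    """True when the marginal distribution of stat is the same for every theta."""
    margs = [marginal(pmf, stat) for pmf in model.values()]
    return all(m == margs[0] for m in margs)

def conditional_model(model, stat, value):
    """Conditional pmfs of X given stat(X) == value, one per theta."""
    cond = {}
    for theta, pmf in model.items():
        total = sum(p for x, p in pmf.items() if stat(x) == value)
        cond[theta] = {x: p / total for x, p in pmf.items() if stat(x) == value}
    return cond

def A(x):
    """Hypothetical statistic with preimage partition {1,2}, {3,4}, {5}."""
    return "a" if x in (1, 2) else ("b" if x in (3, 4) else "c")

print(is_ancillary(MODEL, A))            # True
print(is_ancillary(MODEL, lambda x: x))  # False: the full data is informative
print(conditional_model(MODEL, A, A(1)))
```

In this toy model, observing x = 1 and conditioning on `A` leaves only the points 1 and 2, with conditional probabilities (1/2, 1/2) under `theta1` and (1/4, 3/4) under `theta2`.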

It is natural to make the greatest possible reduction in the set of possible sample values we use for inference and so a possible full statement of $C$ would be to condition on a maximal ancillary. When there is a unique maximal ancillary this is uncontroversial. As Example 1 shows, however, there can be several maximal ancillaries. In such a case there is an ambiguity concerning which maximal ancillary to use when applying $C$ as, for two maximal ancillaries, the corresponding conditional inference bases can lead to quite different inferences, see Example 2. It is shown in Evans (2013) that the lack of a unique maximal ancillary implies that $C$ is not an equivalence relation on the set of all inference bases and therefore, as currently stated, it is not a correct characterization of statistical evidence. Also, it is shown there that the smallest equivalence relation containing $C$ is $L$. Similarly, the smallest equivalence relation containing $S\cup C$, which is also not an equivalence relation, is $L$, and this is what the proof of Birnbaum’s theorem proves. So the lack of a unique maximal ancillary leaves open the question of whether or not $C$, or some modification, is indeed a valid statistical principle that should be employed in statistical work.

Basu (1959) defined a minimal ancillary as any ancillary which is a function of every maximal ancillary and showed that there is a unique ancillary in the class of minimal ancillaries, called the laminal ancillary, which is maximal in this class. The following example illustrates these concepts.

Example 1.

Suppose consists of two distributions as provided in the Table 1 together with the likelihood ratio (LR). Actually it is a range of examples as is any value satisfying For each such case the minimal sufficient statistic (mss) is the identity which is not the case if . This implies that all the ancillaries are functions of the mss and this will prove important for our later discussion.

Since any 1-1 function of an ancillary is ancillary, it is equivalent to present all the preimage partitions induced by such statistics when considering the ancillary structure of this model and some of these are provided in the following table. It is clear from this table that the maximal ancillaries are given by and as these give the finest ancillary partitions, and so the laminal ancillary must be as it is the finest partition containing both maximal ancillaries. The minimal ancillaries are given by where is the trivial ancillary, as these are all coarsenings of both and and are presented in Table 2.

There are ancillaries that are coarsenings of single maximal ancillaries such as

 C1 :{1,3},{2,4},{5,6,7} C2 :{1,3,5,6},{2,4},{7}

which are coarsenings of but not of and there are many others.

If the sample space were shrunk to with the probability for redistributed equally among the 4 sample points, then the laminal ancillary becomes the trivial ancillary and this is not uncommon, as noted in Basu (1959) where conditions for this to occur are discussed.
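The ancillary structure described in this example can be explored by brute force on a small finite model. The following Python sketch does this for a hypothetical five-point model with two parameter values, which is our own illustration and not the model of Table 1. Every statistic is identified with its preimage partition, all set partitions are enumerated, and the maximal, minimal and laminal ancillaries are extracted using Basu's definitions as stated above.

```python
from fractions import Fraction as F

# Hypothetical toy model (an assumption, not the model of Table 1).
MODEL = {
    "theta1": {1: F(2, 24), 2: F(2, 24), 3: F(4, 24), 4: F(4, 24), 5: F(12, 24)},
    "theta2": {1: F(1, 24), 2: F(3, 24), 3: F(3, 24), 4: F(5, 24), 5: F(12, 24)},
}
POINTS = sorted(MODEL["theta1"])

def partitions(items):
    """Yield every set partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a block of its own
        yield [[first]] + part

def canon(part):
    """Order-free representation of a partition."""
    return frozenset(frozenset(b) for b in part)

def is_ancillary(part):
    """Each block must have the same probability under every parameter value."""
    return all(
        len({sum(pmf[x] for x in block) for pmf in MODEL.values()}) == 1
        for block in part
    )

def coarsens(coarse, fine):
    """True when `coarse` is a function of `fine` (each fine block sits in a coarse block)."""
    return all(any(f <= c for c in coarse) for f in fine)

ancillaries = {canon(p) for p in partitions(POINTS) if is_ancillary(p)}
# Maximal: no other ancillary strictly refines it.
maximal = {a for a in ancillaries
           if not any(b != a and coarsens(a, b) for b in ancillaries)}
# Minimal (Basu): a function of every maximal ancillary.
minimal = {a for a in ancillaries if all(coarsens(a, m) for m in maximal)}
# Laminal: the finest minimal ancillary.
laminal = next(a for a in minimal if all(coarsens(b, a) for b in minimal))

def show(part):
    return sorted(sorted(b) for b in part)

print(len(ancillaries), len(maximal), len(minimal))
print(show(laminal))  # [[1, 2, 3, 4], [5]]
```

For this toy model there are two maximal ancillaries, with partitions {1,2},{3,4},{5} and {1,4},{2,3},{5}, and the laminal ancillary is nontrivial. Removing the point 5 and renormalizing would make the laminal ancillary trivial, mirroring the remark above.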

The following example demonstrates the ambiguity that a nonunique maximal ancillary can produce and is adapted from Evans (2015).

Example 2.

Consider the model given by Table 3 and suppose is observed. The MLE of is

There are two maximal ancillaries as given by their partitions, namely and The sampling distributions of the MLE obtained by conditioning on the maximal ancillaries are as displayed in Table 4.

As can be seen, these sampling distributions are quite different and it is not clear which to use as part of quantifying the uncertainty in the estimate.

3 Stable and Strong Ancillaries

Despite the rich structure of the ancillary statistics, standard evidence theory assumes (through the standard conditionality principle $C$) that conditioning on different ancillary statistics is equally valid. We challenge this assumption from two main perspectives, which give rise to a resolution.

Reproducing the structure with a single maximal ancillary. As noted in Evans (2013), the fact that more than one maximal ancillary can exist results in $C$ not forming an equivalence relation on the set of all inference bases. If we want to claim that a given principle does properly characterize when two inference bases contain the same amount of statistical evidence concerning an unknown $\theta$, then it seems clear that the principle must induce an equivalence relation. Therefore, $C$ needs to be modified if it is desirable for conditioning on ancillaries to play a role in inference.

Basu (1959) introduced the concept that two ancillary subsets of the sample space for model $M$ conform when their intersection is also ancillary. The set of all ancillary subsets that conform to every other ancillary subset is proved there to be a $\sigma$-algebra and, moreover, this is the laminal ancillary $\sigma$-algebra in the sense that it is the largest $\sigma$-algebra contained in all the $\sigma$-algebras induced by the individual maximal ancillaries. This is effectively saying that (allowing for 1-1 equivalences) the laminal ancillary statistic is a function of every maximal ancillary. A further implication of this is that the laminal ancillary $\sigma$-algebra is the largest minimal ancillary $\sigma$-algebra and so the laminal ancillary statistic is the maximal minimal ancillary statistic. Also, if there is a unique maximal ancillary then this is also the laminal ancillary. This points to a special role for the laminal ancillary, especially since the laminal ancillary always exists and a conditionality principle that prescribes conditioning on the laminal forms an equivalence relation on the set of inference bases, see Section 4.

Although logical, this role has not been explored. Perhaps this is because the laminal doesn’t often produce a meaningful reduction. But also Basu’s development, while logical, doesn’t provide a good statistical reason to adopt the laminal as the logical ancillary to condition on. It is argued here, however, that there is a key element that can be added to the story and with this addition the laminal is not only a logical resolution, but is a statistical necessity.

Addressing the transition of ancillaries to informative statistics. The key idea in this development is the supposed irrelevance of the distribution of an ancillary that is to be conditioned on. For after all, as far as inference goes, this distribution plays absolutely no role whatsoever. The statistical intuition behind this is that the distribution of the ancillary is free of the parameter $\theta$ and so an observation from it contains no information about $\theta$. As such, it must be the case that, no matter what distribution is assumed for an ancillary, this cannot change the basic information structure of the problem. Note that this is a more severe requirement for what it means for a statistic to be ancillary. Two definitions that capture this idea are now provided and their equivalence proved. It is then proved that the set of ancillaries which satisfy this criterion has a maximal member and it is the laminal ancillary. To avoid a measure-theoretic presentation via $\sigma$-algebras, as in Basu (1959), it will be assumed here that all ancillaries are discretely distributed and that there are at most a countable number of ancillaries, as this is sufficient for conveying the key ideas.

For ancillary $U$ for model $M$ the following notation is adopted

 $$M=\sum_i P_U(\{i\})\,M\mid U=i.$$

This expresses the idea that the model $M$ is a mixture of the component models obtained by conditioning on $U$, where the mixture probabilities are given by the marginal distribution of $U$. The following definitions capture the idea that the distribution of $U$ should be irrelevant for the inference problem.

Definition. An ancillary $U$ for model $M$ is called a stable ancillary for $M$ if, whenever $V$ is ancillary for $M$, then $U$ is ancillary for the mixture

 $$\sum_i \alpha_i\,M\mid V=i$$

for every probability distribution $\alpha$ on the set of possible values for $V$. An ancillary $U$ for model $M$ is called a strong ancillary for $M$ if any ancillary $V$ for $M$ is also ancillary for the mixture $\sum_i \alpha_i\,M\mid U=i$ for every probability distribution $\alpha$ on the set of possible values for $U$.

So $U$ is a stable ancillary when changing the distribution of any other ancillary has no effect on the ancillarity of $U$, and $U$ is a strong ancillary if changing the distribution of $U$ has no effect on the ancillarity of any other ancillary. For any ancillary that is not stable, conditioning on the value of some other ancillary renders its value informative, which contradicts the underlying motivation that the value of an ancillary statistic contains no evidence concerning $\theta$. Similarly, if $U$ is not strong, then conditioning on the value of $U$ renders the value of some other ancillary informative. Accordingly, it is difficult to accept the claim that the value of an ancillary that is not stable/strong is noninformative with respect to $\theta$.
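The effect of changing the distribution of an ancillary can be checked numerically on a small finite model. The Python sketch below uses a hypothetical five-point model of our own (not one from the paper) with two maximal ancillaries `A` and `B` and their common coarsening `LAM`. The helper `reweight` keeps every conditional model given a statistic and replaces only that statistic's marginal distribution, which is one reading of the mixture construction described above; all names are illustrative assumptions.

```python
from fractions import Fraction as F

# Hypothetical toy model (an assumption, not from the paper).
MODEL = {
    "theta1": {1: F(2, 24), 2: F(2, 24), 3: F(4, 24), 4: F(4, 24), 5: F(12, 24)},
    "theta2": {1: F(1, 24), 2: F(3, 24), 3: F(3, 24), 4: F(5, 24), 5: F(12, 24)},
}

def label(blocks):
    """Turn a partition (list of blocks) into a statistic x -> block index."""
    return lambda x: next(i for i, b in enumerate(blocks) if x in b)

A = label([{1, 2}, {3, 4}, {5}])       # a maximal ancillary
B = label([{1, 4}, {2, 3}, {5}])       # another maximal ancillary
LAM = label([{1, 2, 3, 4}, {5}])       # coarsening of both A and B

def marginal(pmf, stat):
    out = {}
    for x, p in pmf.items():
        out[stat(x)] = out.get(stat(x), 0) + p
    return out

def is_ancillary(model, stat):
    margs = [marginal(pmf, stat) for pmf in model.values()]
    return all(m == margs[0] for m in margs)

def reweight(model, stat, new_dist):
    """Keep each conditional model given stat; give stat the marginal new_dist."""
    out = {}
    for theta, pmf in model.items():
        old = marginal(pmf, stat)
        out[theta] = {x: new_dist[stat(x)] * p / old[stat(x)]
                      for x, p in pmf.items()}
    return out

# The marginal of A is (1/6, 1/3, 1/2); swap the first two weights.
changed_A = reweight(MODEL, A, {0: F(1, 3), 1: F(1, 6), 2: F(1, 2)})
# Change the marginal of LAM from (1/2, 1/2) to (1/4, 3/4).
changed_LAM = reweight(MODEL, LAM, {0: F(1, 4), 1: F(3, 4)})

print(is_ancillary(changed_A, B))      # False: B has become informative
print(is_ancillary(changed_A, LAM))    # True: LAM is unaffected
print(is_ancillary(changed_LAM, A), is_ancillary(changed_LAM, B))  # True True
```

In this toy case, changing the distribution of `A` moves the probability of the block {1,4} of `B` to 1/4 under `theta1` and 3/16 under `theta2`, so neither `A` nor `B` is stable, while changing the distribution of `LAM` leaves both `A` and `B` ancillary.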

In actuality, a stable ancillary is strong and a strong ancillary is stable as the following result shows.

Proposition 1. $U$ is a strong ancillary for $M$ iff it is a stable ancillary for $M$.

Given that stable and strong ancillaries are just different expressions of the same concept, these will be referred to hereafter as stable ancillaries.

In part (i) of the following result it is now shown that a stable ancillary is a minimal ancillary and a minimal ancillary is a stable ancillary. Since Basu (1959) proved that the laminal ancillary is the maximal minimal ancillary this establishes that the laminal ancillary is the maximal stable ancillary and, for the sake of completeness, this is proved in part (ii).

Proposition 2. (i) A stable/strong ancillary is a minimal ancillary and conversely. (ii) There exists a maximal minimal ancillary (the laminal ancillary).

Since the word minimal doesn’t really convey the positive aspects of such ancillaries these will be referenced as stable ancillaries hereafter.

It is worth noting that the structure given by the minimal and laminal ancillaries is really the largest ancillary structure within the model that replicates the situation where there is a single maximal ancillary and, as such, there is no ambiguity about which ancillary to condition on. This coherence points to the laminal ancillary as playing a special role and this is reinforced by the notion of stability of an ancillary.

The following example demonstrates numerically the extent to which having an incorrect distribution for an unstable ancillary (i) can transform another unstable ancillary into an informative statistic, yet (ii) preserves the ancillary status of a stable ancillary.

Example 3.

Consider again Example 1 with , but now consider what happens to the ancillary state of the unstable ancillary and the stable ancillary , when the distribution of the unstable ancillary is changed from as given by to a true distribution that is unknown to the researcher, as given by , see Figure 1. It is then observed that stays ancillary, as theory assures, namely, for both and the distribution of is under the first scenario and under the second. However, the likelihood ratios of a given value a given value are largely away from has lost its ancillary state and is now informative.

One may reasonably consider that such sensitivity of the ancillary status of a statistic suggests that its ancillarity is not a structural feature of the design, but rather a coincidence that is easily mistaken for one. This possibility, while not testable within the model, suggests that one should restrict any conditioning to stable ancillaries.

To see additionally why $C$ needs to be modified we examine the motivation for conditioning as part of the inference process. This arises from considering mixture experiments. Suppose there is a set of models $M_i$, with the index $i$ ranging over a countable set, where the data will arise from one of these models. The model that produces the data is obtained via a randomization procedure where a value $i$ of a random index $U$ is produced with known probabilities $P_U(\{i\})$. This mixing produces the overall model $M=\sum_i P_U(\{i\})M_i$ and $U$ is ancillary for $M$. If the value $U=i$ is observed, then $C$ says that the inference base $(M_i,x)$ is the one that is relevant for inference about $\theta$. This seems uncontroversial and therein lies the appeal of $C$.

The controversy surrounding $C$ arises when, rather than being presented with a physical randomization device as part of a two-stage procedure, as just described, we are presented with the inference base $(M,x)$ with $U$ being ancillary for $M$. Since $M$ can at least be formally considered as a mixture model via $U$, it then seems reasonable to replace $M$ by $M\mid U=i$, where $i$ is the observed value of $U$, for inference about $\theta$.

But now consider two studies conducted by statisticians 1 and 2 concerning the true value of the quantity $\theta$, but suppose different randomization schemes are used in each. So each study has its own collection of component models and its own relevant ancillary. Suppose that the mixing produces the same overall model $M$ in both studies and, furthermore, that the same data $x$ is obtained. This may seem unrealistic, but recall that in the end this is the situation that confronts us when considering a model with multiple ancillaries and we wish to justify conditioning on one of them.

It would seem then that both studies would conclude that the evidence about the true value of $\theta$ in the inference base $(M,x)$ is the same, but the expression of this will be different, and result in different conditional inference bases, unless effectively the same maximal ancillary is being used for the mixing. In Example 1, suppose the two randomization schemes are specified by the two maximal ancillaries, as this will be a case where the conditional inference bases will be different. Recall, however, that the specific distributions of the ancillaries are supposedly irrelevant for inference about $\theta$ and indeed these play no role in the actual inferences. But now suppose, for whatever reason, statistician 1 decides to modify their randomization scheme by changing the distribution of their ancillary from one known distribution to another. This does not change the submodels and so this change in the ancillary distribution seems innocuous to statistician 1, as their inferences will not change due to the irrelevance of the distribution of the ancillary. The overall model, however, has changed and this may produce a conflict with statistician 2, because it may be that the ancillary used by statistician 2 is no longer ancillary in the new overall model and is now informative. Statistician 2 can now rightly claim that the distribution of the ancillary used by statistician 1 is definitely relevant to the inference process and so there is a contradiction between the two statisticians.

This demonstrates that there is a clear contradiction that resides within the reasoning that justifies $C$, at least as long as it is silent about which ancillaries are appropriate for the conditioning step. The content of this paper has demonstrated how to resolve this contradiction by making sure that any ancillaries that are used do not produce the phenomenon just described. The relevant ancillaries to use are the stable ancillaries and indeed their marginal distributions are irrelevant for inference. The irrelevance of the marginal distribution of a stable ancillary is similar to the irrelevance of the conditional distribution of the data given a mss, and both can be discarded for inference. This recovers conditioning on an ancillary as a valid part of the inference process. Of course, we want to make the maximal reduction via conditioning, to eliminate as much as possible of the variation that has nothing to do with $\theta$, and this leads to conditioning on the laminal.

4 Stable Conditionality and Evidence

In discussing statistical evidence Birnbaum (1962) introduced the function $Ev$ defined on the set of all inference bases. When two inference bases $I_1$ and $I_2$ were considered to be equivalent with respect to their content of statistical evidence, this was denoted by $Ev(I_1)=Ev(I_2)$. Birnbaum did not, however, specify the value of $Ev(I)$. While this is understandable, this approach is modified here as evidence functions are fully defined (up to 1-1 equivalence due to relabellings) for the principles discussed. The basic reason for this is that a principle of inference should not only state an equivalence, but also prevent the usage of aspects of an inference base that are identified as irrelevant for the inference process. As pointed out in Durbin (1976), ensuring that this didn’t happen was one way of preventing Birnbaum’s proof of his well-known theorem. We still do not give a full definition of $Ev$ but it is argued that this takes us some steps closer and that such restrictions are a necessity.

In what follows, we examine the consequences that arise for statistical evidence as described in Birnbaum, if one focuses on the set of stable ancillaries that are functions of a mss for a model $M$, namely,

 $$\mathcal{A}_M=\{A:A\text{ is a stable ancillary and a function of a mss for model }M\}.\qquad(1)$$

It was pointed out in Durbin (1970) that restricting to ancillaries that are functions of a mss voided the proof of Birnbaum’s theorem. Evans et al. (1986) argued that this was a natural restriction because otherwise the information being conditioned on via the ancillary was precisely the information being discarded as irrelevant via sufficiency in Birnbaum’s proof. As such, there existed a contradiction between the principles $S$ and $C$ in that context. The restriction to ancillaries that are functions of a mss also seems implicit in Fisher’s development of the ancillarity concept, as documented in Stigler (2001).

Based on the developments in Section 3, the restriction is made to those ancillaries that are stable because these are in a sense the ancillaries that truly introduce no information into the analysis concerning the true distribution. It is to be noted that there still is a place in a statistical analysis for ancillaries that are not functions of a mss as, for example, in regression analysis with normal error where the standardized residuals are ancillaries that are not functions of the mss but play a key role in model checking. Our concern here, however, is with the inference step and the restriction to (1) seems essential in that context.

For simplicity, we suppose that the parameter space $\Theta$ and the sample space $\mathcal{X}$ are both finite as this doesn’t change the essential meaning of the principles. Also we take $\mathcal{B}$ to be the power set of $\mathcal{X}$ and suppress this in the notation hereafter. It is assumed that $\Theta$ is the same in any two inference bases that we consider related, although it is possible to allow one parameter space to be a 1-1 relabelling of the other, but this is ignored here. Also, it will always be assumed that, for each $x\in\mathcal{X}$, there is at least one $\theta\in\Theta$ such that $P_{\theta,X}(\{x\})>0$, so the sample space cannot be made smaller.

A sufficient statistic is any function $T$ defined on $\mathcal{X}$ such that, if $T(x)=T(y)$, then $x$ and $y$ are in the same equivalence class associated with the sufficiency equivalence relation on $\mathcal{X}$ given by $x\sim y$ whenever there is a constant $c>0$ such that $P_{\theta,X}(\{x\})=cP_{\theta,X}(\{y\})$ for every $\theta\in\Theta$. A mss is a sufficient statistic such that when $x\sim y$ then $T(x)=T(y)$, and so it is any function on $\mathcal{X}$ that indexes the equivalence classes. The value of the mss represents the maximal reduction in the observed data that results in no information loss concerning $\theta$. A canonical representative of the mss is, as discussed in Evans (2015), Lemma 3.3.2, given by $T(x)=[x]$ where $[x]$ is the equivalence class of $x$ induced by $\sim$ on $\mathcal{X}$. Any function on $\mathcal{X}$ that is constant on each set $[x]$, and different on $[x]$ and $[y]$ when $[x]\neq[y]$, can also serve as a mss. For example, when there is $\theta_i$ such that $P_{\theta_i,X}(\{x\})>0$ for all $x$, then the mss can be taken to be

 $$T(x)=\left(\frac{P_{\theta_1,X}(\{x\})}{P_{\theta_i,X}(\{x\})},\ldots,\frac{P_{\theta_n,X}(\{x\})}{P_{\theta_i,X}(\{x\})}\right).$$
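For a finite model the equivalence classes of the sufficiency relation can be computed directly. The following Python sketch groups sample points whose likelihood vectors are proportional, which gives the canonical mss as the class containing each point. The four-point model `MODEL` and the function name `mss_classes` are hypothetical illustrations, and the sketch relies on the assumption stated above that every sample point has positive probability for at least one parameter value.

```python
from fractions import Fraction as F

# Hypothetical toy model (an assumption, not from the paper):
# points 1 and 2 have proportional likelihood vectors.
MODEL = {
    "theta1": {1: F(1, 10), 2: F(2, 10), 3: F(3, 10), 4: F(4, 10)},
    "theta2": {1: F(2, 10), 2: F(4, 10), 3: F(3, 10), 4: F(1, 10)},
}

def mss_classes(model):
    """Partition the sample space into classes of proportional likelihoods."""
    thetas = list(model)
    classes = {}
    for x in model[thetas[0]]:
        lik = [model[t][x] for t in thetas]
        total = sum(lik)          # positive by the standing assumption
        # Normalizing removes the constant c in P_theta(x) = c * P_theta(y).
        key = tuple(p / total for p in lik)
        classes.setdefault(key, set()).add(x)
    return {frozenset(c) for c in classes.values()}

classes = mss_classes(MODEL)
print(sorted(sorted(c) for c in classes))  # [[1, 2], [3], [4]]
```

Normalizing each likelihood vector by its sum is used here instead of dividing by a single reference parameter value, so the sketch does not need a parameter value under which every point has positive probability.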

Let $T$ denote the mss, however it is chosen, with model $M_T=(\mathcal{T},\{P_{\theta,T}:\theta\in\Theta\})$.

The following statement of the sufficiency principle is equivalent to the statement in Birnbaum (1962) but it is easier to use this version to prove that $S$ is indeed an equivalence relation on the set of all inference bases, see Evans (2015), Lemma 3.3.3. Here we allow for any version of the mss as $h(T)$ where $h$ is a 1-1 function (a relabelling) defined on $\mathcal{T}$. This allows for relating two inference bases $I_1$ and $I_2$ that may have very different models but whose minimal sufficient statistics are essentially equivalent under such a relabelling, and so the principle is defined as a relation on the set of all inference bases.

Sufficiency Principle ($S$). The inference bases $I_1=(M_1,x_1)$ and $I_2=(M_2,x_2)$, with minimal sufficient statistics $T_1$ and $T_2$ respectively, are equivalent under $S$ whenever there is a 1-1, onto function $h$ such that

 $$(M_{1,T_1},T_1(x_1))=(M_{2,h(T_2)},h(T_2(x_2))).$$

So when $I_1$ and $I_2$ are related via $S$, the sampling distributions of $T_1$ and $h(T_2)$ are essentially the same, as are the observed values of these statistics. For example, as a particular application, if model $M$ has mss $T$ then observations $x,y$ satisfying $T(x)=T(y)$, together with the model, contain the same evidence about $\theta$, i.e.,

 $$Ev(\mathcal{X},\{P_{\theta,X}:\theta\in\Theta\},x)=Ev(\mathcal{X},\{P_{\theta,X}:\theta\in\Theta\},y)$$

where the function $h$ is just the identity in this case.

While no image space is defined for $Ev$, it is necessary to do this for a specific principle so that it is clear that the goal of the principle is to also exclude ingredients that are really extraneous to the intent of the principle. It is immediate from $S$ that

 $$Ev(\mathcal{X},\{P_{\theta,X}:\theta\in\Theta\},x)=Ev(\mathcal{T},\{P_{\theta,T}:\theta\in\Theta\},T(x))$$

and this is undoubtedly the most important application of the principle, namely, all inferences about the true value of $\theta$ are based on the model for a mss and its observed value. This leads to the definition of the minimal sufficiency evidence function $Ev_{MS}$ given by

 $$Ev_{MS}(\mathcal{X},\{P_{\theta,X}:\theta\in\Theta\},x)=(\mathcal{T},\{P_{\theta,T}:\theta\in\Theta\},T(x))=(M_T,T(x)),$$

for, say, the canonical mss $T$, although any other equivalent version of the mss could be used. In other words, we are restricting what we consider an appropriate presentation of the evidence based on $S$. The ultimate evidence function, whatever it may be, will be composed with $Ev_{MS}$.

For ancillary statistic $A$ for model $M$ we write $\{P_{\theta,X\mid A(x)}:\theta\in\Theta\}$ for the family of derived conditional distributions on $\mathcal{X}$ obtained by conditioning on the event specified by $A(x)$. The discussion in Section 3 about ancillarity then leads to the following modified conditionality principle where again we state a general version of the principle that can be applied to relate (or not) any inference bases.

Stable Conditionality Principle. The inference bases $I_1=(M_1,x_1)$ and $I_2=(M_2,x_2)$, with minimal sufficient statistics $T_1,T_2$ and laminal ancillaries $L_1,L_2$ respectively, are equivalent under this principle whenever there is a 1-1, onto function $h$ such that

 $$(M_{1,T_1\mid L_1(x_1)},T_1(x_1))=(M_{2,h(T_2)\mid L_2(x_2)},h(T_2(x_2))).\qquad(2)$$

For example, if model $M$ has mss $T$ and laminal ancillary $L$ then observations $x,y$ satisfying $T(x)=T(y)$, together with the conditional model, contain the same evidence about $\theta$, i.e.,

 $$Ev(\mathcal{X},\{P_{\theta,X\mid L(x)}:\theta\in\Theta\},x)=Ev(\mathcal{X},\{P_{\theta,X\mid L(y)}:\theta\in\Theta\},y)$$

where the function $h$ is just the identity in this case.

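It follows from the stable conditionality principle that

 $$Ev(\mathcal{X},\{P_{\theta,X}:\theta\in\Theta\},x)=Ev(\mathcal{T},\{P_{\theta,T\mid L(x)}:\theta\in\Theta\},T(x))$$

and this is undoubtedly the most important application of the principle. This leads to the definition of the stable conditionality evidence function $Ev_{SC}$ given by

 $$Ev_{SC}(\mathcal{X},\{P_{\theta,X}:\theta\in\Theta\},x)=(\mathcal{T},\{P_{\theta,T\mid L(T(x))}:\theta\in\Theta\},T(x))\qquad(3)$$

for, say, the canonical mss $T$, although any other equivalent version of the mss could be used.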

It is necessary to prove that the stable conditionality principle is an equivalence relation on the set of all inference bases as part of establishing that it is a valid characterization of statistical evidence.

Proposition 3. The stable conditionality principle is an equivalence relation on the set of inference bases.

It is obvious that, as relations on the set of all inference bases, the stable conditionality principle is contained in $C$. The fact that the stable conditionality principle is an equivalence relation establishes that this containment is proper because it has been established that $C$ is not an equivalence relation, see Evans (2013) or Evans (2015), Lemma 3.3.4. It has also been shown in these references that the smallest equivalence relation containing $C$ is $L$. So an interesting consequence of Proposition 3 is that $L$ cannot be obtained from the stable conditionality principle in this way.

Similarly, the same references establish that the relation given by is not an equivalence relation and the proof of Birnbaum’s Theorem establishes that the smallest equivalence relation containing is In this case the following establishes that so Birnbaum’s Theorem does not follow from and

Proposition 4. As relations on the set of all inference bases

Note that only requires that the conditional models be effectively the same for given and this does not imply that the unconditional models are effectively the same so we cannot conclude that We do have, however, that the conditional inference bases are equivalent under

Proposition 5. If and are equivalent under then the conditional inference bases and are equivalent under

The following result demonstrates that the evidence function is the ultimate presentation of the evidence based upon and

Proposition 6. For data and model the evidence function defined by (3) satisfies

So the evidence function that results from the two principles can be unambiguously defined as the inference base containing both the observed value of the mss and the collection of conditional distributions, given the laminal ancillary function of the mss, as indexed by the model parameter.

The consequence of this development is that the application of the two principles can be thought of unambiguously as a function on the set of all inference bases. It is not clear that there shouldn’t be further reductions in to remove ingredients that are still extraneous to the expression of the evidence concerning but at this point it is not obvious what form those would take.

Also, statistical evidence is ultimately expressed as part of answering statistical questions. For example, what is the appropriate estimate of and how accurate is it or is there evidence for or against a hypothesis and how strong is this evidence? Simply stating an inference base does not answer such questions but at least it does tell us what to focus on when devising the answer.

5 Conclusions

Various ambiguities have raised doubts about the possibility of a successful theory for frequentist inference. Birnbaum’s theorem, in which the sufficiency principle $S$ and the conditionality principle $C$ seemingly imply the likelihood principle $L$, and the result that $C$ alone seemingly implies $L$, are but two examples. While the validity of these conclusions has been challenged, consideration of these results still raises concerns as to what the correct applications of the principles are. For $S$ this is undoubtedly discarding all aspects of the inference base that are extraneous to expressing the evidence about the model parameter, and this leads to the principle as expressed by Durbin (1970) together with the evidence function $\mathrm{Ev}_{MS}$, which we add to the development. For $C$ our thesis is that the fundamental idea underlying the principle is better expressed by the stable conditionality principle and the evidence function $\mathrm{Ev}_{SC}$, as this removes the ambiguity about which ancillary to condition on and avoids any contradictions in the justification for the irrelevance of the distribution of the ancillary. While the laminal ancillary may often be trivial, namely, a function constant on the sample space, it seems clear that we have to accept the verdict that conditioning on any ancillary other than the laminal is not appropriate. The results developed here have shown that $S$ and the stable conditionality principle are mutually compatible and satisfy the basic requirement of any statistical principle by inducing equivalence relations on the set of all inference bases. As such, both the logical and statistical inconsistencies in the definition of $C$ have been avoided.

It is true that the stable conditionality principle proposed here is, in part, mathematically supported by the taxonomy results in Basu (1959). The present paper shows, however, that conditioning on stable ancillaries removes the logical inconsistencies of the standard conditionality principle and provides a coherent framework for the assessment of statistical evidence.

Certainly this is not the end of the story concerning the concept of statistical evidence and how it should be measured and expressed, but our hope is that clarifying the roles of two key principles contributes to a more solid foundation for statistics.

6 Appendix

Proof of Proposition 1

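Suppose $U$ is a strong ancillary for the model and let $p = (p_i)$ be an alternative probability distribution for the marginal distribution of another ancillary $V$. Then, summing over those $i$ for which $P_V(\{i\}) > 0$ (otherwise the conditional probability given $V = i$ is not defined),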

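$$\begin{aligned}
\sum_{i:P_V(\{i\})>0} p_i P_{\theta,U}(B \mid V=i) &= \sum_{i:P_V(\{i\})>0} \frac{p_i}{P_V(\{i\})} P_{\theta,X}(U^{-1}B \cap V^{-1}\{i\}) \\
&= \sum_{i:P_V(\{i\})>0} \frac{p_i}{P_V(\{i\})} \sum_{j \in B} P_{\theta,X}(V^{-1}\{i\} \mid U=j)\, P_U(\{j\}) \\
&= \sum_{i:P_V(\{i\})>0} \frac{p_i}{P_V(\{i\})} \sum_{j \in B} P_V(\{i\} \mid U=j)\, P_U(\{j\})
\end{aligned}$$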

where the last equality follows because $U$ is strong, which implies that $V$ is ancillary when the mixture distribution for $U$ puts all its mass at $j$, so $P_{\theta,X}(V^{-1}\{i\} \mid U=j)$ is independent of $\theta$, as is the sum. Therefore, $U$ is stable.

Now suppose $U$ is a stable ancillary and $V$ is ancillary, and let $p = (p_i)$ be an alternative probability distribution for the marginal distribution of $U$. Then, summing over those $i$ for which $P_U(\{i\}) > 0$,

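$$\begin{aligned}
\sum_{i:P_U(\{i\})>0} p_i P_{\theta,V}(B \mid U=i) &= \sum_{i:P_U(\{i\})>0} \frac{p_i}{P_U(\{i\})} P_{\theta,X}(V^{-1}B \cap U^{-1}\{i\}) \\
&= \sum_{i:P_U(\{i\})>0} \frac{p_i}{P_U(\{i\})} \sum_{j \in B} P_{\theta,X}(U^{-1}\{i\} \mid V=j)\, P_V(\{j\}) \\
&= \sum_{i:P_U(\{i\})>0} \frac{p_i}{P_U(\{i\})} \sum_{j \in B} P_U(\{i\} \mid V=j)\, P_V(\{j\})
\end{aligned}$$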

where the last equality follows because $U$ is stable, so that $P_{\theta,X}(U^{-1}\{i\} \mid V=j)$ is independent of $\theta$. Therefore, $U$ is strong.
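The mixture sums in this proof can be checked numerically on a small discrete model. The following is a minimal sketch and is not taken from the paper: the four-cell model in `cell_probs`, the two statistics `U` and `V`, and all helper names are assumptions chosen only for illustration. It computes the marginal of `U` after the marginal of `V` is replaced by an alternative distribution `p_new`, which is the left-hand side of the first display.

```python
from fractions import Fraction as F

def cell_probs(theta):
    """Assumed illustrative model on cells 0..3, for theta in (-1, 1)."""
    return [(1 - theta) / 6, (1 + theta) / 6, (2 - theta) / 6, (2 + theta) / 6]

# Two statistics, given by their value on each cell (assumed for illustration).
U = [1, 1, 0, 0]  # P(U = 1) = 1/3 for every theta
V = [1, 0, 0, 1]  # P(V = 1) = 1/2 for every theta

def marginal(stat, theta):
    """Marginal pmf of a statistic under P_theta."""
    pmf = {}
    for value, prob in zip(stat, cell_probs(theta)):
        pmf[value] = pmf.get(value, 0) + prob
    return pmf

def mixed_marginal(u_stat, v_stat, p_new, theta):
    """sum_i p_i * P_theta(U = u | V = i), over i with P_theta(V = i) > 0."""
    p_v = marginal(v_stat, theta)
    pmf = {}
    for u, v, prob in zip(u_stat, v_stat, cell_probs(theta)):
        if p_v[v] > 0:
            pmf[u] = pmf.get(u, 0) + p_new[v] * prob / p_v[v]
    return pmf

thetas = [F(0), F(1, 2)]
# Both statistics are ancillary: their marginals do not change with theta.
assert marginal(U, thetas[0]) == marginal(U, thetas[1])
assert marginal(V, thetas[0]) == marginal(V, thetas[1])

# Replace the marginal of V by p = (3/4 on V = 1, 1/4 on V = 0).
p_new = {1: F(3, 4), 0: F(1, 4)}
for theta in thetas:
    print(theta, mixed_marginal(U, V, p_new, theta))
```

In this assumed model the new probability of `U = 1` works out to 1/3 - theta/6, so it depends on theta and `U` is not stable with respect to `V`. Using `p_new = {1: F(1, 2), 0: F(1, 2)}` instead recovers the original marginal of `U`.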

Proof of Proposition 2

(i) Suppose is an ancillary and it is not a function of a maximal ancillary Then it cannot be that is ancillary because, if it were, then is a function of and thus is not maximal. Since it cannot be the case that is independent of for every and and so is not strong. Therefore, any strong ancillary is a function of every maximal ancillary.

Conversely, suppose is a minimal ancillary and is another ancillary. Then can be expressed as a function of some maximal ancillary say and since is minimal, it can also be expressed as for some function Then which is independent of because is ancillary. Therefore,

$$\sum_{i:P_V(\{i\})>0} p_i P_{\theta,U}(A \mid V=i) = \sum_{i:P_V(\{i\})>0} \frac{p_i}{P_V(\{i\})} P_{\theta,X}(k^{-1}A \cap h^{-1}\{i\})$$

which is independent of for every probability distribution Therefore, is a stable ancillary.

(ii) Let be a list of the minimal ancillaries for model and put Now let and then so we can write and is a valid statistic. Further, for a maximal ancillary there exist functions such that and this implies that is ancillary. For any other ancillary there exists a maximal ancillary and a function such that and also there are functions such that and this implies that is ancillary. As such this proves that is a minimal ancillary and moreover it is maximal in this class because every other minimal ancillary is a function of

Proof of Proposition 3

We need to show that the relation given by is (i) reflexive, (ii) symmetric and (iii) transitive.

(i) Suppose model has mss and laminal Then taking for and equal to the identity in (2) establishes reflexivity.

(ii) Symmetry also follows because (2) implies

$$(M_{2,T_2 \mid L_2(x_2)},\, T_2(x_2)) = (M_{1,h^{-1}(T_1) \mid L_1(x_1)},\, h^{-1}(T_1(x_1))).$$

(iii) Finally suppose that and are related under as well as and Let denote the mss and laminal ancillaries for and be the 1-1, onto mappings that are used in (2) to establish these relations. Then

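$$\begin{aligned}
(M_{1,T_1 \mid L_1(x_1)},\, T_1(x_1)) &= (M_{2,h_{12}(T_2) \mid L_2(x_2)},\, h_{12}(T_2(x_2))), \\
(M_{2,T_2 \mid L_2(x_2)},\, T_2(x_2)) &= (M_{3,h_{23}(T_3) \mid L_3(x_3)},\, h_{23}(T_3(x_3)))
\end{aligned}$$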

both hold. Now consider the composition $h_{12} \circ h_{23}$, which is again 1-1 and onto. Then it follows that

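$$\begin{aligned}
(M_{3,h_{12} \circ h_{23}(T_3) \mid L_3(x_3)},\, h_{12} \circ h_{23}(T_3(x_3))) &= (M_{2,h_{12}(T_2) \mid L_2(x_2)},\, h_{12}(T_2(x_2))) \\
&= (M_{1,T_1 \mid L_1(x_1)},\, T_1(x_1))
\end{aligned}$$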

and this establishes that and are related under so the relation is transitive.
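The three steps of this proof can be illustrated with finite models. The sketch below is an assumption-laden toy and not the paper's construction: each "inference base" is a pair of a finite conditional model, stored as `{theta: {t: prob}}`, and an observed value, and the names `relabel`, `related`, `compose` and `inverse` are hypothetical helpers. It checks relation (2) for given 1-1 maps and confirms that composing or inverting the maps gives the transitivity and symmetry steps.

```python
from fractions import Fraction as F

def relabel(model, h):
    """Relabel the sample points of a finite model {theta: {t: prob}} via the map h."""
    return {theta: {h[t]: p for t, p in pmf.items()} for theta, pmf in model.items()}

def related(base_a, base_b, h):
    """True if base_a equals base_b relabelled by the 1-1 map h, as in (2)."""
    (model_a, t_a), (model_b, t_b) = base_a, base_b
    one_to_one = len(set(h.values())) == len(h)
    return one_to_one and relabel(model_b, h) == model_a and h[t_b] == t_a

def compose(h_outer, h_inner):
    """The composition h_outer o h_inner, stored as a dict."""
    return {t: h_outer[h_inner[t]] for t in h_inner}

def inverse(h):
    """Inverse of a 1-1 map stored as a dict."""
    return {v: k for k, v in h.items()}

# Assumed toy conditional model with two parameter values (0 and 1).
base3 = ({0: {"a": F(1, 4), "b": F(3, 4)}, 1: {"a": F(1, 2), "b": F(1, 2)}}, "a")
h23 = {"a": 0, "b": 1}      # maps the sample space of base 3 to that of base 2
h12 = {0: "x", 1: "y"}      # maps the sample space of base 2 to that of base 1
base2 = (relabel(base3[0], h23), h23[base3[1]])
base1 = (relabel(base2[0], h12), h12[base2[1]])

assert related(base2, base3, h23) and related(base1, base2, h12)
assert related(base1, base3, compose(h12, h23))  # transitivity
assert related(base3, base2, inverse(h23))       # symmetry
```

Storing the maps as dictionaries keeps the composition order explicit: `compose(h12, h23)` applies `h23` first, matching the order in the display above.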

Proof of Proposition 4

Suppose that and are equivalent under so

$$(M_{1,T_1},\, T_1(x_1)) = (M_{2,h(T_2)},\, h(T_2(x_2))).$$

Since the models and are relabellings of each other via this implies that the ancillarity structure of the two models is effectively (via the relabelling) the same and, in particular, the laminals and are related via This implies

$$(M_{1,T_1 \mid L_1(x_1)},\, T_1(x_1)) = (M_{2,h(T_2) \mid L_2(x_2)},\, h(T_2(x_2)))$$

and so and are equivalent under

Proof of Proposition 5

Since the two conditional models are simply relabellings it must be the case that they have effectively the same minimal sufficient statistics and this implies the result.

Proof of Proposition 6

Since only depends on the model and data through the model for a mss and the observed value of and we have restricted to ancillaries that are functions of the mss, it is clear that

Now consider the reverse order where outputs based on laminal ancillary . We can write as for some function The sample space for in this conditional model is and which is a union of preimage contours of For a satisfying then Therefore, if are distinct elements of then we cannot have

$$P_{\theta,T \mid L(x)}(\{t_1\}) = c\, P_{\theta,T \mid L(x)}(\{t_2\})$$

for every for some constant otherwise we would have for every and then would not be a mss for the original model. This also implies that the identity function is a mss for the conditional model which implies

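$$\mathrm{Ev}_{MS}(\mathcal{T}, \{P_{\theta,T \mid L(x)} : \theta \in \Theta\}, T(x)) = (\mathcal{T}, \{P_{\theta,T \mid L(T(x))} : \theta \in \Theta\}, T(x)),$$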

namely, there is no reduction. This proves the result.
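The proportionality argument used in this proof can be made concrete for finite models. The sketch below is illustrative only and makes several assumptions: the parameter set is a finite grid, every sample point has positive probability for at least one parameter value, and the function names and the two-Bernoulli-trial example are hypothetical. It groups sample points whose likelihood functions are proportional; the identity is then a mss exactly when every group is a singleton.

```python
from fractions import Fraction as F

def proportional(lik_x, lik_y):
    """True if two likelihood vectors (indexed by theta) are proportional."""
    n = len(lik_x)
    return all(lik_x[a] * lik_y[b] == lik_x[b] * lik_y[a]
               for a in range(n) for b in range(a + 1, n))

def mss_classes(model, points):
    """Group sample points whose likelihood functions are proportional.

    Assumes each point has positive probability under some theta.
    """
    thetas = sorted(model)
    classes = []
    for x in points:
        lik_x = [model[t][x] for t in thetas]
        for cls in classes:
            rep = cls[0]  # compare against one representative per class
            if proportional(lik_x, [model[t][rep] for t in thetas]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

def two_bernoulli(theta):
    """Assumed example: two independent Bernoulli(theta) trials."""
    return {"00": (1 - theta) ** 2, "01": theta * (1 - theta),
            "10": theta * (1 - theta), "11": theta ** 2}

def success_count(theta):
    """Model for the number of successes in the two trials."""
    return {0: (1 - theta) ** 2, 1: 2 * theta * (1 - theta), 2: theta ** 2}

grid = [F(1, 4), F(1, 2), F(3, 4)]
full = mss_classes({th: two_bernoulli(th) for th in grid}, ["00", "01", "10", "11"])
reduced = mss_classes({th: success_count(th) for th in grid}, [0, 1, 2])
print(full)     # "01" and "10" share a class
print(reduced)  # every class is a singleton, so there is no further reduction
```

A finite grid can only certify non-proportionality; declaring two points proportional on the grid is a check under the assumed parameter values, not a proof for the full parameter space.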

7 References

Basu, D. (1959). The family of ancillary statistics. Sankhyā 21, 247-256.

Basu, D. (1964). Recovery of ancillary information. Sankhyā 26, 3-16.

Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). J. Amer. Statist. Assoc. 57, 269-332.

Buehler, R. J. (1982). Some ancillary statistics and their properties. J. Amer. Statist. Assoc. 77, 581-594.

Cox, D. R. (1958). Some problems connected with statistical inference. The Annals of Mathematical Statistics 29 (2), 357-372.

Cox, D. R. (1971). The choice between alternative ancillary statistics. J. Roy. Statist. Soc. B 33, 251-252.

Durbin, J. (1970). On Birnbaum’s theorem on the relation between sufficiency, conditionality and likelihood. J. Amer. Statist. Assoc. 65, 395-398.

Evans, M., Fraser, D. A. S. and Monette, G. (1986). On principles and arguments to likelihood (with discussion). Canad. J. of Statistics 14 (3), 181-199.

Evans, M. (2013). What does the proof of Birnbaum’s theorem prove? Electronic Journal of Statistics 7, 2645-2655.

Evans, M. (2015). Measuring Statistical Evidence Using Relative Belief. Monographs on Statistics and Applied Probability 144, CRC Press.

Ghosh, M., Reid, N. and Fraser, D. A. S. (2010). Ancillary statistics: a review. Statistica Sinica 20, 1309-1332.

Kalbfleisch, J. D. (1975). Sufficiency and conditionality. Biometrika 62, 251-259.

Stigler, S. (2001). Ancillary history. IMS Lecture Notes-Monograph Series 36, State of the Art in Probability and Statistics, 555-567.