The purpose of this paper is to present a modification to the theory of organic fiducial inference as this theory was outlined in Bowater (2019). The main change that will be put forward is one that allows Bayesian inference to sometimes have a minor role within this theory. To avoid too much repetition of the content of Bowater (2019), only the parts of the theory in question that have been altered will be discussed in detail. However, consideration will be given to the reader who does not want to spend too much time referring back to this earlier paper.
In this regard, some of the concepts and principles that underlie the theory of organic fiducial inference being referred to will be developed in the sections that immediately follow (i.e. Sections 2.1 to 2.4) in a revised way relative to how they were developed in Bowater (2019). The version of organic fiducial inference put forward in these sections will then, in Section 3, be applied to examples that were originally analysed in Bowater (2019) using the version of this theory of inference outlined in that earlier paper.
2 Theory of organic fiducial inference
2.1 Sampling model and data generation
It will be assumed, in general, that the data set to be analysed was generated by a sampling model that depends on a set of unknown parameters , where each is a one-dimensional variable. Let the joint density or mass function of the data given the true values of the parameters be denoted as
. For the moment, though, we will assume that the only unknown parameter in the model is, either because there are no other parameters in the model, or because the true values of the parameters in the set are known.
To begin with, let us define more precisely the concept of a fiducial statistic in comparison to how this type of statistic was defined in Bowater (2019).
Definition 1: A fiducial statistic
Let be a set of univariate statistics that together form a sufficient or approximately sufficient set of statistics for the parameter . To clarify, a set of statistics will be defined as being an approximately sufficient set of statistics for if conditioning the distribution of the data set of interest on this set of statistics leads to a distribution of this data set that does not depend heavily on the value of the parameter .
Now, if a given set , as just defined, only contains one statistic that is not an ancillary statistic, then that statistic will be called the fiducial statistic of this set . Assuming that the set is of this nature, we will denote all the statistics in except for the statistic as the set of statistics , and we will refer to the statistics in this set as being the ancillary complements of .
Note that often it may be possible to find a set that consists of just a single sufficient statistic for , and of course in these cases, the fiducial statistic will be this sufficient statistic and the set will be empty. On other occasions, it may be possible to find a single statistic that could sensibly be regarded as being an approximately sufficient statistic for , and if this is the case, we may decide to treat this statistic as being the fiducial statistic for the example in question. In this latter type of scenario, it may be appropriate to choose the set to be empty or non-empty. An example of a statistic that may often be regarded as being an approximately sufficient statistic for
is any one-to-one function of a unique maximum likelihood estimator of. The use of a maximum likelihood estimator as a fiducial statistic was illustrated in Section 5.7 of Bowater (2018) and Section 3.5 of Bowater (2020).
We will now make an assumption about how the data were generated. Essentially the same assumption was also made in Bowater (2019). Here, we simply present this assumption in a slightly different way.
Assumption 1: Data generating algorithm
Independent of the way in which the data set was actually generated, it will be assumed that this data set was generated by the following algorithm:
1) Simulate the values of the ancillary complements , if any exist, of a given fiducial statistic .
2) Generate a value for a continuous scalar random variable , which has a density function that does not depend on the parameter .
3) Determine a value for the fiducial statistic by setting equal to and equal to in the following expression for the statistic
, which should effectively define a probability density or mass function for this statistic :
a) the function has as arguments only the variable and constants such as the parameter and the statistics .
b) the density or mass function of defined by this equation is equal to what it would have been if had been determined on the basis of the data set conditional on the variables , if any exist, being equal to the values .
4) Generate the data set from the sampling density or mass function conditioned on the statistic being equal to its already generated value and the variables , if any exist, being equal to the values .
In the context of the above algorithm, the variable will be referred to as the primary random variable (primary r.v.), which is the way that this term was used in Bowater (2018, 2019, 2020). To clarify, in many cases it is possible to rewrite this algorithm so that, after the data set is generated from the sampling density or mass function by some black-box procedure, the value of the variable is generated by setting it equal to a deterministic function of the data and the parameter . In the context of such an alternative algorithm, this variable would not be the primary r.v.
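To make the algorithm in Assumption 1 concrete, the following sketch instantiates it for a binomial sampling model, assuming the usual inverse-CDF construction in which the primary r.v. is uniform on (0, 1) and the observed count is a sufficient statistic with no ancillary complements. The function name and the notation `theta`/`gamma` are ours, introduced purely for illustration; they are not taken from the paper.

```python
import random

def generate_binomial_data(theta, n, seed=None):
    """Illustrative instance of the data generating algorithm of Assumption 1
    for a Binomial(n, theta) model.

    Step 2: draw the primary r.v. gamma ~ Uniform(0, 1).
    Step 3: determine the fiducial statistic x (here a sufficient statistic,
    so step 1 is empty) as the smallest count whose binomial CDF at theta
    is at least gamma, i.e. the inverse-CDF transform of gamma.
    """
    rng = random.Random(seed)
    gamma = rng.random()                   # the primary r.v.
    cdf, pmf = 0.0, (1.0 - theta) ** n     # P(X = 0)
    for x in range(n + 1):
        cdf += pmf
        if gamma <= cdf:
            return x, gamma
        # recurrence: P(X = x+1) = P(X = x) * (n-x)/(x+1) * theta/(1-theta)
        pmf *= (n - x) / (x + 1) * theta / (1.0 - theta)
    return n, gamma
```

Step 4 of the algorithm is trivial here, since in this example the data set consists of the fiducial statistic itself.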
2.2 Types of fiducial argument and expression of pre-data knowledge
Let us once again present the definition of the strong fiducial argument as it was given in Bowater (2019).
Definition 2(a): Strong or standard fiducial argument
This is the argument that the density function of the primary r.v. after the data have been observed, i.e. the post-data density function of , should be equal to the pre-data density function of , i.e. the density function as defined in step 2 of the algorithm in Assumption 1.
The definitions of the moderate and weak fiducial arguments will not be presented here. These two arguments will be assumed to have the same definitions as given in Bowater (2019).
To assist the reader though, let us present, once more, the definition of the global pre-data (GPD) function as it was given in this earlier paper.
Definition 3: Global pre-data (GPD) function
The global pre-data (GPD) function may be any given non-negative and upper bounded function of the parameter . It is a function that only needs to be specified up to a proportionality constant, in the sense that multiplying it by a positive constant does not alter the inferences that it implies.
Even though in the modified version of the theory of organic fiducial inference being currently outlined, the basic definition of the local pre-data (LPD) function is the same as the definition of this function given in Bowater (2019), its role in this theory of inference will now be quite distinct from the role it was given in this earlier paper. It is appropriate therefore to give, once again, the basic definition of a LPD function along with some general comments about this type of function.
Definition 4: Local pre-data (LPD) function
The local pre-data (LPD) function may be any given non-negative function of the parameter that is locally integrable over the space of this parameter. Similar to a GPD function, it only needs to be specified up to a proportionality constant.
The role of the LPD function is to facilitate the completion of the definition of the joint post-data density function of the primary r.v. and the parameter in cases where using either the strong or moderate fiducial argument alone is not sufficient to achieve this. For this reason, the LPD function is in fact redundant in many situations.
We describe such a function as being ‘local’ because it is only used in the inferential process under the condition that equals a specific value, and with this condition in place, the act of observing the data will usually imply that the parameter must lie in a compact set that is contained in quite a small region of the real line. It will be seen that because of this, even if the LPD function is not redundant, its influence on the inferential process will be, in the main, relatively minor.
2.3 Principles for defining univariate fiducial density functions
As was the case in Bowater (2019), the fiducial density function of the parameter given the data and conditional on all other parameters being known, i.e. the density function , will be defined according to two mutually consistent principles. The first of these principles is the same as Principle 1 for defining the density function of in question that was outlined in Bowater (2019). To avoid repetition of the content of this earlier paper, this principle will not be presented again here.
On the other hand, the second principle for defining the density function that will be advocated in the current paper is distinct in an important respect from Principle 2 for performing this task that was outlined in Bowater (2019). Therefore, even though this second principle is similar to Principle 2 given in Bowater (2019), it now will be presented and discussed in detail. To assist the reader, we will point out that this principle differs from Principle 2 of Bowater (2019) due to an important change that is made to the definition of the conditional density function that is denoted in both the present paper and Bowater (2019) as .
Principle 2 for defining a full conditional fiducial density
To be able to use this principle, the following two conditions must be satisfied.
Let and be, respectively, the sets of all the values of the primary r.v. and the parameter for which the density functions of these variables must necessarily be positive in light of having observed the data . To clarify, any set of values of or any set of values of that are regarded as being impossible after the data have been observed cannot be contained in the set or the set respectively.
Given this notation, the present condition will be satisfied if
where is the set of values of the parameter that map on to the value for the variable according to equation (1) if the variable in this equation is substituted by its observed value , and the values , if there are any, are held fixed at their observed values. (To clarify, the predicate in the definition of the set on the right-hand side of equation (2) means ‘there exists a such that ’).
The GPD function must be such that:
Under Conditions 2(a) and 2(b), the full conditional fiducial density is defined by:
i) the post-data density is given by:
in which is a normalising constant and the density is as defined in
step 2 of the algorithm in Assumption 1.
ii) the conditional density function is given by:
in which is the LPD function of as introduced by Definition 4, the function is the likelihood function of given the data , the set is as defined in Condition 2(a), and is a normalising constant, which clearly must depend on the value of .
It can be seen that the density function as defined by equation (3) is formed by marginalising, with respect to , a joint density of the primary r.v. and the parameter that is based on being the conditional density of given , and on being the marginal density of . Also, it is clearly the case that if
then the post-data density will be equal to the pre-data density , i.e. the fiducial density is determined on the basis of the strong fiducial argument, otherwise this fiducial density of is determined on the basis of the moderate fiducial argument. To clarify, in contrast to what was the case under Principle 1 for defining the density outlined in Bowater (2019), the weak fiducial argument is never used to make inferences about .
Furthermore, we can observe that the density function defined in equation (5) is the posterior density of when is conditioned to lie in the set and when the height of the prior density of is proportional to the height of the LPD function of over the set . The role of the LPD function of in constructing the fiducial density is therefore to specify how was distributed a priori over those values of that are consistent with any given value of . For this reason, it is assumed that this LPD function is chosen to reflect what we believed about the parameter before the data were observed. As alluded to in Definition 4, the sets will usually be compact sets that are wholly contained within quite small regions of the real line.
To clarify, the difference between the current definition of the density and the earlier definition of this density function given in Bowater (2019) is that, in effect, this density function is now the posterior rather than the prior density of when is conditioned to lie in the set . This modification therefore allows, in a certain sense, more information from the data to be incorporated into the way the fiducial density is constructed. The issue as to whether this extra information is processed adequately by the method of inference in question will be discussed in the next section.
Finally, it can be appreciated that, if Condition 2(b) is satisfied, then Principle 1 of Bowater (2019) is essentially a special case of the version of Principle 2 that has just been presented. In particular, we can see that if the necessary condition to use Principle 1 is satisfied, i.e. Condition 1 of Bowater (2019), then Condition 2(a) will be satisfied, and so if Condition 2(b) also holds, then both conditions required to use Principle 2 will hold. Also, under Condition 1 of Bowater (2019), the density function effectively reduces to a point mass function at the value , and as a result, the joint density function of and in equation (3) effectively becomes a univariate density function. Therefore, the integration of this latter function with respect to in this equation would be, under Condition 1 of Bowater (2019), naturally regarded as being redundant, and so equation (3) would, in effect, define the fiducial density according to Principle 1 of the earlier paper in question.
Can fiducial and Bayesian inference work together?
By using Principle 2 just outlined to determine the fiducial density , it can be seen that we effectively need to combine fiducial inference with Bayesian inference, and therefore, we may ask whether such a combination of two methods of inference that are quite distinct is acceptable. In this regard, it should first be pointed out that these two methods of inference do not directly interfere with each other since the marginal post-data distribution of is determined by solely using fiducial inference, while the post-data distribution of conditional on is determined by solely using Bayesian inference. Nevertheless, the strong fiducial argument would be naturally invoked when very little or nothing was known about the parameter of interest before the data were observed, which could be taken as meaning that our amount of pre-data knowledge about
should be below a level that would allow this knowledge to be adequately represented by placing a probability distribution over. Therefore, if the strong fiducial argument needs to be invoked in applying Principle 2 to determine the density , this may stand in conflict with the need to express pre-data knowledge about in the form of a LPD function over .
On the other hand, in doing the best we can to apply the Bayesian method to any given problem of inference in which there was very little pre-data knowledge about one of the parameters of interest , it is natural that the inferential procedure should take into account the variation in the posterior distribution of all the model parameters over a wide range of prior distributions for the parameter , each of which may be considered to loosely represent our pre-data knowledge about this parameter. We should note, of course, that if we actually knew truly nothing about the parameter before the data were observed, then, since we would arguably need to perform such a sensitivity analysis over all possible prior distributions for this parameter, our post-data or posterior inferences about the parameter would, in general, be completely uninformative. However, this would not be the case if, in the context of using Principle 2, we performed such a complete sensitivity analysis of the post-data density with respect to the LPD function of , since this post-data density of is, in general, mainly determined by fiducial inference and not Bayesian inference. Moreover, if, in the same context, we had possessed just a very small amount of pre-data knowledge about , then we could use a more constrained version of the type of sensitivity analysis under discussion to take advantage of this knowledge, while at the same time we may well still feel that this knowledge is not important enough to make the use of the strong fiducial argument inappropriate.
In summary, a conflict between fiducial inference and Bayesian inference can be avoided when applying Principle 2 by bearing in mind that the kind of pre-data knowledge we would need to have about to make it sensible to invoke either the moderate or strong fiducial argument when using this principle would be such that the variation in the fiducial density should be taken into account over a range of different LPD functions of , each of which may be considered as loosely representing our pre-data knowledge about . This would seem to make it awkward to use Principle 2 to determine the fiducial density . However, it will very often be the case that the variation in this fiducial density that is seen in this kind of sensitivity analysis is very minor or negligible.
2.4 Extending the theory to the general case
It will be assumed that the theory of organic fiducial inference is extended to the case in which all the parameters in the sampling model are unknown using exactly the same type of approach as described in Bowater (2019), which is essentially the approach that was originally put forward in Bowater (2018). In other words, we first construct each of the fiducial densities in the complete set of full conditional fiducial densities for the parameters , i.e. the set of fiducial densities:
by applying Principle 1 for this type of task (as outlined in Bowater 2019) or Principle 2 (as has just been outlined), or any related principle (such as the ones outlined in Sections 7.2 and 8 of Bowater 2019), and then we determine the joint fiducial density of all the parameters based on these full conditional densities using either an analytical method or an approach based on the Gibbs sampling algorithm that was outlined in Bowater (2018, 2019).
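The Gibbs-type scheme referred to can be outlined schematically as follows. This is only a sketch of the general idea: it assumes that a routine for drawing from each full conditional fiducial density is already available, and the function names and interface are hypothetical rather than taken from Bowater (2018, 2019).

```python
import random

def gibbs_fiducial(full_conditionals, init, n_iter=1000, burn_in=100, seed=None):
    """Schematic Gibbs-type sampler for a joint fiducial density.

    Each entry of `full_conditionals` is a function that draws one parameter
    from its full conditional fiducial density given the current values of
    the other parameters.  Cycling through these samplers yields (after a
    burn-in period) approximate draws from the joint fiducial density.
    """
    rng = random.Random(seed)
    theta = list(init)
    draws = []
    for it in range(n_iter):
        for j, sample_j in enumerate(full_conditionals):
            others = theta[:j] + theta[j + 1:]   # current values of the rest
            theta[j] = sample_j(others, rng)     # update the j-th parameter
        if it >= burn_in:
            draws.append(tuple(theta))
    return draws
```

Whether such a scheme has the joint fiducial density as its stationary distribution depends, of course, on the compatibility of the full conditional densities, an issue addressed in the two papers just cited.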
For further details on the extension of organic fiducial inference to the general case under discussion, the reader is therefore referred to the two papers that have just been highlighted.
In the two sections that follow, we will show how Principle 2 as outlined in Section 2.3 can be used to determine the fiducial density of a single parameter of interest in two examples. We will consider first the case where the sampling distribution is binomial, and second the case where this distribution is Poisson. These two examples were examined in Sections 6.1 and 6.2 of Bowater (2019) using the older version of Principle 2 that was advocated in that paper. Therefore, it may be helpful to compare the analyses that are about to be presented of these examples with the analyses of the same examples presented in this earlier paper.
3.1 Inference about a binomial proportion
As just mentioned, we will begin by assuming that the sampling distribution is binomial. More precisely, let us consider the problem of making inferences about the population proportion of successes on the basis of observing successes in trials, where the probability of observing any given number of successes is specified by the binomial mass function in this case, i.e. the function:
We will suppose that nothing or very little was known about the proportion before the data were observed.
As clearly the value is a sufficient statistic for the proportion , it can therefore be assumed to be the fiducial statistic in this example. Based on this assumption, equation (1) can be expressed as:
where the primary r.v.
has a uniform distribution over the interval. Under the assumption of there having been no or very little pre-data knowledge about , it is quite natural that the GPD function has the following form: if and zero otherwise, where . Let us point out that for whatever choice is made for the GPD function of and whatever turns out to be the value of , we will never be able to apply Principle 1 of Bowater (2019) to determine the fiducial density of (see Bowater 2019 for more details). On the other hand, equation (7) together with the GPD function for just specified will satisfy Condition 2(a) of Section 2.3 for all possible values of , and since Condition 2(b) will also hold for all , Principle 2 of this earlier section can always be applied to the specific case of current interest. Furthermore, as the condition in equation (6) will also be satisfied, inferences will be made about the proportion under this principle by using the strong fiducial argument.
In particular, by placing the present case in the context of the general definition of the fiducial density given in equations (3), (4) and (5), we obtain the following expression for the fiducial density of :
in which is the set of values of that map on to the value for the primary r.v. according to equation (7) given the observed value of . Of course, to be able to complete this definition of the fiducial density , a LPD function for , i.e. the function , needs to be specified. Observe that any choice for this function that satisfies the loose requirements of Definition 4 and is positive for all values of will lead to a fiducial density that is valid for any and any . Nevertheless, to facilitate the comparison of the current analysis of the example under discussion with the analysis of the same example given in Bowater (2019), we will choose to highlight the same two LPD functions of that were highlighted in this earlier paper, namely the LPD functions of that are defined by:
where , and by:
Finally, we should point out that, although directly computing the density function is quite complicated, random values can, in general, be easily generated from this density function using the same method of sampling from a fiducial density of this type that was relied upon in Bowater (2019). In particular, to obtain one random value from this density function of , we only need to generate a value for the primary r.v. from its post-data density function, i.e. a uniform density over the interval , and then draw a value for the proportion from the conditional density .
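As an illustration of this two-step sampling method, the sketch below assumes that equation (7) has the usual inverse-CDF form for a binomial count, so that the set of values of the proportion consistent with a drawn value of the primary r.v. and the observed count is an interval whose endpoints are found by inverting the binomial CDF; within that interval the proportion is then drawn with weight proportional to the LPD function times the likelihood, here via a crude grid approximation. All function names, and the grid-based approximation itself, are ours and purely illustrative.

```python
import random

def binom_cdf(k, n, p):
    """Binomial CDF P(X <= k) for X ~ Binomial(n, p), with 0 <= p < 1."""
    if k < 0:
        return 0.0
    total, pmf = 0.0, (1.0 - p) ** n
    for i in range(k + 1):
        total += pmf
        if i < n:
            pmf *= (n - i) / (i + 1) * p / (1.0 - p)
    return min(total, 1.0)

def solve_p(k, n, gamma):
    """Find p with binom_cdf(k, n, p) = gamma by bisection.

    For fixed k < n the binomial CDF is decreasing in p, so the root
    is bracketed by (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_fiducial_p(x, n, lpd, rng, grid=200):
    """One draw from the fiducial density of the proportion under Principle 2
    with the strong fiducial argument: draw gamma ~ Uniform(0, 1), find the
    interval of proportions consistent with the observed count x, then draw
    a proportion within it with weight lpd(p) * likelihood(p | x)."""
    gamma = rng.random()
    p_hi = solve_p(x, n, gamma)                        # upper endpoint
    p_lo = solve_p(x - 1, n, gamma) if x > 0 else 0.0  # lower endpoint
    # grid approximation to the conditional density on the interval
    pts = [p_lo + (p_hi - p_lo) * (i + 0.5) / grid for i in range(grid)]
    wts = [lpd(p) * p ** x * (1.0 - p) ** (n - x) for p in pts]
    return rng.choices(pts, weights=wts, k=1)[0]
```

For example, repeating `sample_fiducial_p(1, 10, lambda p: 1.0, rng)` many times yields draws from the fiducial density of the proportion for one success in ten trials under a flat LPD function.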
To give an illustration of the results that are obtained by generating values from the fiducial density defined by equations (8) and (9), the histograms in Figures 1(a) and 1(b) were each formed on the basis of one million independent random values drawn from this fiducial density of , with being equal to 10 and the observed being equal to one. These values for and are identical to the values that were assigned to these quantities in generating the results that are illustrated by the histograms in Figures 1(a) and 1(b) of Bowater (2019). Similar to this earlier paper, the results conveyed by the histogram in Figure 1(a) of the present paper depend on choosing the LPD function of to be the one given in equation (10), while the results in Figure 1(b) depend on this function being as defined in equation (11).
On the basis of the same data, the dashed curves in these figures represent the posterior density for that (under the Bayesian paradigm) corresponds to the prior density for being a uniform density on the interval , while the solid curves in these figures represent the posterior density for that corresponds to the prior density for being the Jeffreys prior for the case in question, i.e. the prior density for that is proportional to the function of in equation (11).
It can be seen from these figures that, although the posterior density for is highly sensitive to which of the two prior densities for is used, the fiducial density of barely moves depending on whether the LPD function of is proportional to the uniform prior density being referred to, or whether it is proportional to the Jeffreys prior density for this case. Furthermore, we can observe that the two fiducial densities for being considered are both loosely approximated by the posterior density for that is based on the Jeffreys prior density in question, except for values of that are close to the modes of these two fiducial densities.
To give an additional example, the histograms in Figures 2(a) and 2(b) were again each formed on the basis of one million independent random values drawn from the fiducial density , but this time the number of trials was set equal to 20 and the number of successes was set equal to , meaning of course that, as in the previous example, the sample proportion was equal to 0.1. Once more, on the basis of the same data, the dashed curves in these figures represent the posterior density for that corresponds to the prior density for being a uniform density on the interval , while the solid curves in these figures represent the posterior density for that corresponds to the prior density for being the Jeffreys prior for the case in question. The same type of comments can be made about these two figures as were made about Figures 1(a) and 1(b) except that now we can see that the two fiducial densities for in these figures are closely rather than loosely approximated by the posterior density for that is based on the Jeffreys prior density for this case.
On the basis of further simulation results that are not reported here, it would be reasonable to make the conjecture that, for any given sample proportion , the fiducial density will approximate more and more closely the posterior density for that is based on the Jeffreys prior density for this case as the number of trials
gets gradually larger, and that this tendency is not simply due to the effect of the central limit theorem. However, no formal proof of a result of this nature will be given here.
Observe that although the LPD functions in equations (10) and (11) are quite different, the lack of a substantial amount of pre-data knowledge about the proportion may often mean that we should take into account the variation in the fiducial density over a much wider range of LPD functions of before any final conclusions about the precise nature of the post-data distribution of are made. This observation is completely in accordance with the general recommendations concerning the use of Principle 2 for determining any given fiducial density that were outlined in Section 2.3.
3.2 Inference about a Poisson event rate
We now will assume that the sampling distribution is Poisson. More precisely, let us consider the problem of making inferences about an unknown event rate on the basis of observing events over a time period of length , where the probability of observing any given number of events over a period of this length is specified by a function that has the form of a Poisson mass function, in particular the following function:
Again, since the data set to be analysed consists of a single value , this value can be assumed to be the fiducial statistic in this example. Based on this assumption, we can express equation (1) in a way that is similar to how this formula was expressed in equation (7), in particular in the following way:
where again the primary r.v. has a uniform distribution over the interval .
As it will be once more assumed that there was no or very little pre-data knowledge about the parameter of interest, i.e. the event rate in this case, the GPD function will again be specified in the following way: if and zero otherwise, where . Similar also to the previous problem, it can be easily appreciated that the nature of equation (12) implies that Principle 1 of Bowater (2019) can never be applied to determine the fiducial density of for any choice of the GPD function of . However, the specific choice that has been made for this latter function means that again Principle 2 of Section 2.3 can be applied to the case at hand for all possible values of , and furthermore, inferences will be made about under this principle by using the strong fiducial argument.
In particular, by placing the present case in the context of the general definition of the fiducial density given in equations (3), (4) and (5), we obtain the following expression for the fiducial density of :
in which is the set of values of that map on to the value for the primary r.v. according to equation (12) given the observed value of . Of course, similar to the previous problem, a LPD function is required so that the definition of the fiducial density can be completed. Although any choice for this LPD function that conforms to Definition 4 and is positive for all values of will imply that the fiducial density in question is valid for any , let us choose to highlight the consequences of using the two LPD functions for that are defined by:
where , and by:
These two LPD functions of are indeed the same as the two LPD functions of that were highlighted when the current example was analysed in Bowater (2019).
In relation to the issue being discussed, Figures 3(a) and 3(b) each show a histogram that was formed on the basis of one million independent random values drawn from the fiducial density using the same simple simulation method as outlined in Section 3.1, with the observed count assumed to be equal to 2, and with the LPD functions of that underlie the results conveyed by the histograms in these two figures being defined by equations (13) and (14) respectively. On the basis also of being equal to 2, the dashed curves in these figures represent the posterior density for that corresponds to the prior density for being the function of given in equation (13), while the solid curves in these figures represent this posterior density when the prior density for is the function of given in equation (14), i.e. the Jeffreys prior for the case in question. It should be pointed out that the use of these two prior densities is controversial as they are both improper.
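The simulation method just referred to can be sketched for the Poisson case in the same way as for the binomial example of Section 3.1, assuming that equation (12) has the usual inverse-CDF form, so that the set of event rates consistent with a drawn value of the primary r.v. and the observed count is an interval obtained by inverting the Poisson CDF. As before, the function names and the grid-based approximation are ours and purely illustrative.

```python
import math
import random

def pois_cdf(k, m):
    """Poisson CDF P(X <= k) for X ~ Poisson(m)."""
    if k < 0:
        return 0.0
    term, total = math.exp(-m), 0.0
    for i in range(k + 1):
        total += term
        term *= m / (i + 1)
    return min(total, 1.0)

def solve_lam(k, T, gamma):
    """Find lam with pois_cdf(k, lam * T) = gamma by bisection.

    The Poisson CDF is decreasing in lam, so first expand the upper
    bracket until the CDF falls below gamma, then bisect."""
    lo, hi = 0.0, 1.0
    while pois_cdf(k, hi * T) > gamma:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if pois_cdf(k, mid * T) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_fiducial_rate(x, T, lpd, rng, grid=200):
    """One draw from the fiducial density of the event rate under Principle 2
    with the strong fiducial argument: draw gamma ~ Uniform(0, 1), find the
    interval of rates consistent with the observed count x, then draw a rate
    within it with weight lpd(lam) * likelihood(lam | x)."""
    gamma = rng.random()
    lam_hi = solve_lam(x, T, gamma)
    lam_lo = solve_lam(x - 1, T, gamma) if x > 0 else 0.0
    # grid approximation to the conditional density on the interval
    pts = [lam_lo + (lam_hi - lam_lo) * (i + 0.5) / grid for i in range(grid)]
    wts = [lpd(l) * l ** x * math.exp(-l * T) for l in pts]
    return rng.choices(pts, weights=wts, k=1)[0]
```

Repeating `sample_fiducial_rate(2, 1.0, lpd, rng)` many times under a given LPD function produces draws of the kind summarised by the histograms in Figures 3(a) and 3(b).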
We can see from Figures 3(a) and 3(b) that, although the posterior density for the event rate is highly sensitive to which of the two prior densities for is used, there is almost no difference in the fiducial density of depending on which of the two LPD functions of is used. Also, similar to what was the case for the two fiducial densities of in Figures 2(a) and 2(b), the two fiducial densities of represented in these figures are both closely approximated by the posterior density of that is based on the Jeffreys prior density for the problem of interest.
Finally, we should place a minor caveat on this analysis that is similar to one that was placed on the analysis of the example in the previous section, by observing that it may often be appropriate to assess the variation in the fiducial density over a wider range of LPD functions of than is represented by the two LPD functions in equations (13) and (14) before any final conclusions about the precise nature of the post-data distribution of are made.
3.3 Inference about a multinomial distribution
The version of Principle 2 outlined in Section 2.3, i.e. the present version of this principle, could be applied to the same problem of making post-data inferences about the parameters of a multinomial distribution that was discussed in Section 6.3 of Bowater (2019). However, the results of this analysis would be very similar to the results that were obtained in Bowater (2019) when the older version of Principle 2 that was proposed in this earlier paper was applied to the problem of inference in question. To be clear, it is not being asserted that the results of these two analyses would be consistent with each other. Nevertheless, the differences between the two sets of results being referred to would simply come down to how a sensitivity analysis of the joint fiducial density of the parameters concerned over different LPD functions of these parameters is affected by whether the density function that is used in the definition of Principle 2 is, in effect, a conditional prior density or whether it is the conditional posterior density that corresponds to this prior density. Therefore, for the sake of wishing to avoid too much repetition of an earlier analysis of the problem of making inferences about multinomial proportions that was presented in Bowater (2019), a re-analysis of this problem using the methodology of the current paper will not be presented here.
4 What has been achieved?
As well as clarifying to some extent how a fiducial statistic is defined when applying the theory of organic fiducial inference to the most general type of inferential problems, this paper has presented a modification to Principle 2 for defining the fiducial density relative to how this principle was specified in Bowater (2019), and it has been shown how this modification affects analyses of problems of inference that were originally examined in this earlier paper.
References
Bowater, R. J. (2018). Multivariate subjective fiducial inference. arXiv.org (Cornell University), Statistics, arXiv:1804.09804.
Bowater, R. J. (2019). Organic fiducial inference. arXiv.org (Cornell University), Statistics, arXiv:1901.08589.
Bowater, R. J. (2020). Integrated organic inference (IOI): a reconciliation of statistical paradigms. arXiv.org (Cornell University), Statistics, arXiv:2002.07966.