Organic Fiducial Inference

01/23/2019 ∙ by Russell J. Bowater, et al.

A substantial generalization is put forward of the theory of subjective fiducial inference as it was outlined in earlier papers. In particular, this theory is extended to deal with cases where the data are discrete or categorical rather than continuous, and cases where there was important pre-data knowledge about some or all of the model parameters. The system for directly expressing and then handling this pre-data knowledge, which is via what are referred to as global and local pre-data functions for the parameters concerned, is distinct from that which involves attempting to directly represent this knowledge in the form of a prior distribution function over these parameters, and then using Bayes' theorem. In this regard, the individual attributes of what are identified as three separate types of fiducial argument, namely the strong, moderate and weak fiducial arguments, form an integral part of the theory that is developed. Various practical examples of the application of this theory are presented, including examples involving binomial, Poisson and multinomial data. The fiducial distribution functions for the parameters of the models in these examples are interpreted in terms of a generalized definition of subjective probability that was set out previously.


1 Introduction

The theory of subjective fiducial inference was first proposed in Bowater (2017b), and was then modified and extended to deal with more general inferential problems in which various parameters are unknown in Bowater and Guzmán (2018a). A further analysis that supports the adoption of this approach to inference is provided in Bowater and Guzmán (2018b). References to loosely related work in the general area of fiducial inference can be found in the first two of these three papers.

The aim of the present work is to substantially generalize this theory of inference as it was defined in Bowater and Guzmán (2018a). In particular, this theory will be extended to deal with cases where the data are discrete or categorical rather than continuous, and cases where there was important knowledge about some or all of the model parameters before the data were observed. Such knowledge, which will be termed ‘pre-data knowledge’, will be treated as being distinct from ‘prior knowledge’, since the use of this latter term usually exclusively implies that inferences will be made under the Bayesian paradigm.

The development of the earlier theory will be substantial enough to justify renaming it 'organic fiducial inference'. Also, the use of the word 'subjective' in the original name caused confusion, since for some readers it implied that the theory must depend substantially on personal beliefs, or must in some other way be far from objective. As was explained in Bowater and Guzmán (2018a, 2018b), this was not the case for the original theory, and it is not generally the case for the theory that is about to be presented. The word 'organic' in the new name, however, still emphasizes that the theory is designed for living subjects, e.g. humans, and not for robots.

For cases in which nothing or very little was known about the model parameters before the data were observed, the motivation for this paper is similar to the motivation given for the work in Bowater and Guzmán (2018a): it stems from the severe criticisms that can generally be made in these cases against the frequentist and Bayesian approaches to inference. These criticisms, some of which are well known, were set out in Section 4 of Bowater (2017b) and Sections 2 and 7 of Bowater and Guzmán (2018a), and to save space they will not be repeated here.

In other cases that will be of interest, i.e. where there is moderate to strong pre-data knowledge about some or all of the model parameters, conventional schools of inference can also be inadequate. In particular, frequentist theory is a generally inflexible framework for incorporating such knowledge into the inferential process. For example, it has proved, on the whole, very difficult to adapt the confidence interval approach to situations where we simply know, before the data were observed, that values in a given subset of the natural space of a parameter of interest are impossible; see, for example, Mandelkern (2002) and the references therein. On the other hand, while our pre-data knowledge about some or all of the model parameters may be substantial, it may not be comprehensive enough in many situations to be adequately incorporated into a Bayesian analysis by placing a prior density function over the parameters in question.

Let us now summarize the structure of the paper. Some brief comments about the concept of probability that will be used in the paper are made in the following section. Further concepts, principles and definitions that underlie the theory of organic fiducial inference in cases where only one model parameter is unknown are presented and discussed in Section 3. In relation to earlier work, an account is then given in Section 4 of how this methodology is extended to include cases where various parameters are unknown.

In the second half of the paper, the theory is applied to various examples. In particular in Sections 5 and 6, problems of inference based on both continuous and discrete data are examined where nothing or very little was known about the model parameters before the data were observed. Examples are then discussed in Section 7 where the natural parameter space is restricted as a result of pre-data knowledge about the case in question, and finally in Section 8, the impact of more general forms of pre-data knowledge about the model parameters is illustrated.

2 Generalized subjective probability

The definition of probability upon which the theory of organic fiducial inference will be based is the definition of subjective probability that was presented in Bowater and Guzmán (2018b), although the key concept of similarity that this definition relies on was introduced in Bowater (2017a), and discussed in Bowater (2017b) and Bowater and Guzmán (2018a). For the sake of convenience, this definition of probability will be referred to as generalized subjective probability.

Under this definition, a probability distribution is defined by its distribution function, which has the usual mathematical properties of such a function, and the strength of this function relative to other distribution functions of interest. In very loose terms, the strength of a distribution function is essentially a measure of how well the distribution function represents a given individual’s uncertainty about the random variable concerned. In this paper, we will be primarily interested in the external strength of a continuous distribution function as specified by Definitions 5 and 7 of Bowater and Guzmán (2018b). To avoid repeating all the technical details, the reader is invited to examine these definitions as well as the application of these definitions to a fundamental problem of statistical inference in Sections 3.6 and 3.7 of this earlier paper.

Although generalized subjective probability will be the adopted definition of probability, the concept of strength will not be explicitly discussed in the sections that immediately follow in order to give a more digestible introduction to the other main concepts that underlie organic fiducial inference. Instead, the role of this definition of probability in organic fiducial inference will be fully examined when this method of inference is applied to examples later in the paper.

3 Univariate organic fiducial inference

3.1 Sampling model and data generation

In general, it will be assumed that a sampling model that depends on one or various unknown parameters generates the data . Let the joint density or mass function of the data given the true values of be denoted as . For the moment, though, we will assume that the only unknown parameter in the model is , either because there are no other parameters in the model, or because the true values of the parameters in the set are known.

In a change from Bowater and Guzmán (2018a), the following more general definition of a fiducial statistic will be applied.

Definition 1: A fiducial statistic

A fiducial statistic will be defined as being the only statistic in a sufficient set of one-dimensional statistics that is not an ancillary statistic. Of course, given this requirement, there may not exist any possible choice for this kind of statistic. However, in this paper, we will only consider cases where this definition can be successfully applied. In other cases, a way of defining the fiducial statistic is to allow it to be any one-to-one function of a unique maximum likelihood estimator of . This latter criterion was applied in Section 5.7 of Bowater and Guzmán (2018a).

We will also make a more general assumption about the way in which the data were generated than in this earlier paper.

Assumption 1: Data generating algorithm

Independent of the way in which the data were actually generated, it will be assumed that the data set was generated by the following algorithm:

1) Simulate the values of the ancillary complements, if any exist, of a given fiducial statistic .

2) Generate a value for a continuous one-dimensional random variable , which has a density function that does not depend on the parameter .

3) Determine a value for the fiducial statistic by setting equal to and equal to in the following expression, which effectively defines the distribution function of :

(1)

where the function is defined so that it satisfies the following conditions:

Assumption 1.1: Conditions on the function

a) The distribution function of as defined by the expression in equation (1) is equal to what it would have been if had been determined on the basis of the data set .
b) The only random variable upon which depends is the variable .

4) Generate the data set by conditioning the sampling density or mass function on the already generated value for and the values of any ancillary complements of .

Observe that Assumption 1.1 differs from the corresponding assumption in Bowater and Guzmán (2018a) due to the absence of a condition that is similar to condition (c) of Assumption 1.1 in this earlier paper.

In the context of the above algorithm, the variable will be referred to as a primary random variable (primary r.v.), which is consistent with how this term was used in Bowater (2017b) and Bowater and Guzmán (2018a, 2018b). To clarify, if this algorithm was rewritten so that the value of the variable was generated by setting it equal to a deterministic function of an already generated value for and the parameter , then would not be a primary r.v.
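To make the above algorithm concrete, the following minimal sketch (in Python) runs it for the normal-mean setting that will be treated in Section 5. The choices made here, namely that the fiducial statistic is the sample mean, the primary r.v. has a standard normal density and there are no ancillary complements, are illustrative assumptions of this sketch rather than requirements of the algorithm itself.

```python
import numpy as np

def generate_normal_data(mu, sigma, n, rng=None):
    """Toy run of the algorithm in Assumption 1 for the normal-mean setting
    of Section 5 (our own illustrative choice): the fiducial statistic is
    taken to be the sample mean and the primary r.v. is standard normal.
    Step 1 is empty here because no ancillary complements exist."""
    rng = np.random.default_rng(rng)
    gamma = rng.normal()                        # step 2: primary r.v.
    xbar = mu + gamma * sigma / np.sqrt(n)      # step 3: fiducial statistic
    # step 4: data generated conditional on the sample mean equalling xbar;
    # for a normal sample this means adding centred normal residuals to xbar
    z = rng.normal(scale=sigma, size=n)
    return xbar + (z - z.mean())
```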

3.2 Types of fiducial argument

Although the fiducial argument is usually considered to be a single argument, in this section we will clarify and develop the argument by breaking it down into three separate but related sub-arguments.

Definition 2(a): Strong or standard fiducial argument

This is the argument that the density function of the primary r.v.  after the data have been observed, i.e. the post-data density function of , should be equal to the pre-data density function of , i.e. the density function as defined in step 2 of the algorithm in Assumption 1. In the case where nothing or very little was known about the parameter before the data were observed, justifications for this argument, without using Bayesian reasoning, were outlined in Section 3.1 of Bowater (2017b), Section 6 of Bowater and Guzmán (2018a) and Section 3.6 of Bowater and Guzmán (2018b), and therefore will not be repeated here.

Definition 2(b): Moderate fiducial argument

This type of fiducial argument will be assumed to be only applicable to cases where values of the primary r.v.  that were possible before the data were observed, i.e. values in the set , are made impossible by the act of observing the data. Under this condition, it is the argument that, over the set of values of that are still possible given the data, the relative height of the post-data density function of should be equal to the relative height of the pre-data density function of .

It is an argument that can certainly be viewed as being less attractive than the strong fiducial argument, as its use implies that our beliefs about will be modified by the data. Nevertheless, it will be made clear in Section 7.1 how this argument can be adequately justified without using Bayesian reasoning in an important class of cases.

Definition 2(c): Weak fiducial argument

This argument will be assumed to be only applicable to cases where the use of neither the strong nor the moderate fiducial argument is considered to be appropriate. It is the argument that, over the set of values of the primary r.v.  that are possible given the data, the relative height of the post-data density function of should be equal to the relative height of the pre-data density function of multiplied by weights on the values of that are determined from a function over the parameter that was specified before the data were observed. The precise way in which these weights over the values of are formed will be defined in Section 3.4.

Similar to the strong and moderate fiducial arguments, this type of fiducial argument can be adequately justified without using Bayesian reasoning in many important cases. Such a justification and examples of the cases in question will be presented in Section 8.

3.3 Expressing pre-data knowledge about

In the theory of organic fiducial inference, it will be assumed that pre-data knowledge about the parameter is expressed through what will be called a global pre-data function and a local pre-data function for , which have the following definitions.

Definition 3: Global pre-data (GPD) function

The global pre-data (GPD) function is any given non-negative and locally integrable function over the space of the parameter . It is a function that only needs to be specified up to a proportionality constant, in the sense that if it is multiplied by a positive constant, then the value of the constant is redundant. If for all where is a given subset of the real line, then this implies that it was regarded as being impossible that before the data were observed. Unlike a Bayesian prior density, it is not controversial to use a GPD function that is not globally integrable.

In many cases, the GPD function will have the following simple form:

(2)

where the set may be empty and is a constant that has, of course, a redundant value. This GPD function will be called the neutral GPD function.

Definition 4: Local pre-data (LPD) function

The local pre-data (LPD) function is a function of the parameter that has the same mathematical properties as the GPD function, i.e. it is a non-negative and locally integrable function over the space of that only needs to be specified up to a proportionality constant. Its role is to complete the definition of the joint post-data density function of the primary r.v.  and the parameter in cases where using either the strong or moderate fiducial argument alone is not sufficient to achieve this. For this reason, the LPD function is in fact redundant in many situations. We describe this function as being ‘local’ because it is only used in the inferential process under the condition that equals a specific value, and with this condition in place and given the data , the parameter usually must lie in a compact set that is contained in a very small region of the real line. It will be seen that because of this, even if the LPD function is not redundant, its influence on the inferential process will usually be relatively minor.

3.4 Univariate fiducial density functions

Given the data , the fiducial density function of the parameter conditional on the parameters in the set being known, i.e. the density function , will be defined according to the following two compatible principles.

Principle 1 for defining the fiducial density

This principle requires that the following condition is satisfied.

Condition 1

Let and be the sets of all values of and respectively that are possible given the value of the fiducial statistic and its ancillary complement, if it exists, that are calculated on the basis of the data . In defining these sets, it is assumed that values of that were regarded as being impossible before the data were observed can not be made possible by observing the data. Given this notation, the present condition is satisfied if, on substituting the variable in equation (1) by the value , this equation would define a bijective mapping between the set and the set .

Under Condition 1, the fiducial density function is defined by setting equal to in equation (1), and then treating the value as being a realization of the random variable , to give the expression:

except that, instead of necessarily having the density function as defined in step 2 of the algorithm in Assumption 1, it will be assumed to have the following density function:

(3)

where is the value of that maps on to the value , the function is the GPD function as introduced by Definition 3, and is a normalizing constant.

The function will be regarded as being the post-data density function of  . Also, in the definition of the weak fiducial argument, i.e. Definition 2(c), the function over that is used to determine the weights on values in the construction of this density function for will now be identified as being the GPD function.

Observe that if the GPD function is neutral, i.e. it has the form given in equation (2), then over the set , the density will be equal to the pre-data density conditioned to lie in this set. For this type of GPD function, if

(4)

then clearly the procedure for making inferences about will depend on the strong fiducial argument, otherwise it will depend on the moderate fiducial argument. Alternatively, if the GPD function is not equal to a positive constant over the set , then we can see that inferences about will be made by using the weak fiducial argument.

Furthermore notice that if, on substituting the variable by the value , equation (1) defines an injective mapping from the set to the space of the parameter , then the GPD function expresses in effect our pre-data beliefs about relative to what is implied by the strong fiducial argument. By doing so, it determines whether the strong, moderate or weak fiducial argument is used to make inferences about , and also the way in which the latter two arguments influence the inferential process.

In this respect, under the same assumption concerning equation (1), it can be seen that if the pre-data density is a uniform density for over , which in theory can be always arranged to be the case by appropriate choice of the variable , and we define

where and are non-empty subsets of the interval such that the events and are assigned the same probability by the density , then assuming that is not zero, the probability of the event will be times the probability of the event after the data have been observed.

Finally, it should be noted that in the theory of subjective fiducial inference as outlined in Bowater and Guzmán (2018a), the density is effectively always defined to be equal to the density , i.e. the only type of fiducial argument that this earlier theory relies on is the strong fiducial argument.

Principle 2 for defining the fiducial density

This principle requires that the following two conditions are satisfied.

Condition 2(a)

Given the value for the variable , it is required that,

where and are as defined in Condition 1, and is the set of values of that map on to the value according to equation (1).

Condition 2(b)

The GPD function must be equal to a positive constant over the set .

Under Conditions 2(a) and 2(b), the fiducial density function is defined by

(5)

where is as defined in equation (3), although will always be equal to a positive constant in this equation, and the conditional density function is defined by

(6)

where is the LPD function as introduced by Definition 4, and is a normalizing constant, which clearly must depend on the value of .

It can be seen that the density function as defined by equation (5) is formed by marginalizing, with respect to , a joint density of and that is based on being the conditional density of given , and on being the marginal density of . Similar to what was the case under Principle 1, if the condition in equation (4) is satisfied, then the density will be equal to the density , i.e. the density function is determined on the basis of the strong fiducial argument, otherwise it is determined on the basis of the moderate argument. However, in contrast to what was the case under Principle 1, the weak fiducial argument is never used to make inferences about .

Also, we can observe that the density function defined in equation (6) is formed by normalizing the LPD function after has been restricted to lie in the subset . The role of the density is therefore to make use of the nature of the LPD function to distribute over those values of that are consistent with any given value of . For this reason, it is assumed that the LPD function is chosen to reflect what we believe about . In particular, these beliefs are assumed to be our pre-data rather than post-data beliefs about , as otherwise it is evident that, in general, we would be guilty of making inferences about by using the data twice. As alluded to in Definition 4, the sets will in general be compact sets that are usually wholly contained within very small regions of the real line.

Furthermore, it is worth noting that if Condition 2(b) is satisfied, then Principle 1 is essentially a special case of Principle 2. This is because to apply Principle 1 it is required that Condition 1 holds, and if it does then, first, Condition 2(a) must hold, second, the density could be regarded as converting itself into a point mass function at the value , and third, as a result of this, the joint density function of and in equation (5) effectively becomes a univariate density function. Therefore, the integration of this latter function with respect to would be naturally regarded as being redundant.

As a final point, we need to acknowledge the fact that important cases exist in which neither Condition 1 is satisfied nor Conditions 2(a) and 2(b) are both satisfied. If Condition 2(a) does not hold, then we have a problem that could be described as ‘spillage’ due to the fact that the set will be a proper subset of , and therefore this latter set ‘spills out’ of the set . How to deal with this problem of spillage will be returned to in Section 7.2, and how to deal with cases where Condition 2(b) does not hold will be discussed in Section 8.

4 Multivariate organic fiducial inference

We will now consider the case where all the parameters in the sampling model are unknown.

Definition 5: Joint fiducial density functions

Under the assumption that Principles 1 or 2, or any natural variations on these principles, can be used to define the full conditional fiducial densities

(7)

and that this set of conditional densities determines a joint density function for the parameters , this latter density function will be defined as being the joint fiducial density function of these parameters, and will be denoted as . It can be easily shown that this density function will always be unique.

To corroborate that the set of full conditional densities in equation (7) actually determines a joint density function for the parameters concerned, the analytical and computational methods that were proposed for this purpose in Bowater and Guzmán (2018a) could be applied. These methods will now be briefly described.

An analytical method

Under the assumption that the set of full conditional densities in equation (7) can be expressed analytically, a way of establishing whether they determine a joint density function for is simply to propose an analytic expression for such a density function, derive the full conditional densities of the proposed density function, and see if they match the full conditionals in equation (7).

A computational method

A more general method for establishing whether the full conditional densities in equation (7) determine a joint density function for the parameters concerned is based on attempting to generate random samples from this joint density by applying the Gibbs sampler (Geman and Geman 1984 and Gelfand and Smith 1990) to the full conditionals in question. Of course, the Gibbs sampler, assuming that it is ergodic, will only converge to a unique stationary density if the joint density actually exists (and the reverse is also true). For this reason, we now choose to redefine the problem as being one of trying to establish whether the Gibbs sampler converges to a unique stationary density on the basis of the observed behaviour of this sampler.

This may also seem to be a difficult problem to resolve. However, in a more conventional application of the Gibbs sampler, we are faced with the similar problem of whether the sampler converges to its unique stationary density in a reasonable amount of time, i.e. before a large pre-specified number of cycles of the sampler have been completed. This is the reason why a substantial number of techniques have been developed to assess whether Markov chain Monte Carlo samplers, such as the Gibbs sampler, converge to their unique stationary densities within a given finite number of cycles; see for example Gelman and Rubin (1992) and Brooks and Roberts (1998).

Obviously, if there is the added complication that we are not completely sure that the Gibbs sampler has a unique stationary density, then it would seem appropriate to use these convergence diagnostics more intensively. On the whole, though, if, having already taken into account how the full conditional densities in equation (7) were formed, the use of such diagnostics gives us a high degree of confidence that the Gibbs sampler has converged to a unique stationary density, then of course we should have a high degree of confidence that the joint fiducial density does indeed exist.
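As a rough illustration of the procedure just described, the sketch below (in Python, under the assumption that problem-specific samplers for the full conditional fiducial densities are available) runs a generic Gibbs sampler and computes a crude version of the Gelman and Rubin (1992) diagnostic from several chains started at dispersed points; values of the diagnostic close to one for every parameter would support the conclusion that the sampler has a unique stationary density.

```python
import numpy as np

def gibbs_run(conditionals, init, n_cycles, rng=None):
    """Generic Gibbs sampler over a list of full conditional samplers.
    `conditionals[j](theta, rng)` is assumed to draw the j-th parameter from
    its full conditional fiducial density given the current values of the
    others (these samplers are problem-specific and hypothetical here)."""
    rng = np.random.default_rng(rng)
    theta = np.array(init, dtype=float)
    chain = np.empty((n_cycles, theta.size))
    for t in range(n_cycles):
        for j in range(theta.size):
            theta[j] = conditionals[j](theta, rng)
        chain[t] = theta
    return chain

def gelman_rubin(chains):
    """Crude potential scale reduction factor (Gelman and Rubin 1992) for a
    single parameter, computed from several chains of equal length started
    at dispersed points; values near 1 support convergence."""
    chains = np.asarray(chains)                  # shape (m chains, n cycles)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    return np.sqrt(((n - 1) / n * W + B / n) / W)
```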

An important benefit of using the Gibbs sampling method that has just been described is that, to calculate expectations of interest with respect to the joint fiducial density, we will often need to rely on simulation methods such as the Gibbs sampler anyway. Therefore, by using this Gibbs sampling method, two goals can be achieved simultaneously.

5 An example with continuous data and little pre-data knowledge

We will now apply the methodology put forward in the previous sections to some examples. To begin with, let us consider the standard problem of making inferences about the mean of a normal density function, when its variance is unknown, on the basis of a sample of size , i.e. , drawn from the density function concerned. Although the way in which the theory of subjective fiducial inference can be used to solve this problem was detailed in Bowater and Guzmán (2018a), let us quickly place this problem in the context of the type of inference that is the subject of the present paper, i.e. organic fiducial inference.

If is known, a sufficient statistic for is the sample mean , which therefore can be assumed to be the fiducial statistic . Based on this assumption, equation (1) can be expressed as

(8)

where . Under the assumption that nothing or very little was known about before the data were observed, it is quite natural to specify the GPD function for as follows: , , where . Furthermore, since equation (8) will always satisfy Condition 1, the fiducial density can always be determined by Principle 1. In particular, as the GPD function is neutral, and the condition in equation (4) will be satisfied, the fiducial density in question is derived under this principle by applying the strong fiducial argument. As a result, it can be easily shown that this fiducial density is defined by

On the other hand, if is known, a sufficient statistic for is which will be assumed to be . Based on this assumption, equation (1) can be expressed as

where . Under the assumption of no or very little pre-data knowledge about , it is quite natural to specify the GPD function for as follows: if and 0 otherwise, where . Furthermore, we can see that Principle 1 will again always be applicable, and as the GPD function is neutral and the condition in equation (4) will be satisfied, the fiducial density is derived under this principle by again calling on the strong fiducial argument. As a result, it can be easily shown that this fiducial density is a scaled inverse chi-squared density function with degrees of freedom and scaling parameter equal to .

Finally, by using the analytical method outlined in Section 4, it can be easily established that the conditional density functions and that have just been defined determine a joint fiducial density for and , and by integrating over this joint density function, it can be deduced that the marginal fiducial density for is defined by

(9)

where is the sample standard deviation, i.e. it is the well-used non-standardised Student t density function with degrees of freedom, location parameter equal to and scaling parameter equal to .
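As a hedged illustration of these results, the following sketch applies a Gibbs sampler to conditional fiducial densities of the forms stated above. The specific forms used here, a normal density for the mean given the variance and a scaled inverse chi-squared density for the variance given the mean, are reconstructions made under our own assumptions, since the symbols are missing from the extracted text.

```python
import numpy as np

def gibbs_normal_fiducial(x, n_cycles=20000, burn_in=500, rng=None):
    """Sketch of a Gibbs sampler for the joint fiducial density of the
    normal mean and variance, assuming the conditional fiducial densities
    take the forms
        mean | variance  ~  Normal(sample mean, variance / n)
        variance | mean  ~  scaled inverse chi-squared with n degrees of
                            freedom and scale  mean((x - mean)^2)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    mu, sig2 = xbar, x.var(ddof=1)
    draws = []
    for t in range(n_cycles):
        mu = rng.normal(xbar, np.sqrt(sig2 / n))
        sig2 = np.sum((x - mu) ** 2) / rng.chisquare(n)  # scaled inverse chi-squared draw
        if t >= burn_in:
            draws.append((mu, sig2))
    return np.array(draws)
```

If the analytical result quoted above holds, the sampled values of the mean should closely follow the non-standardised Student t density referred to in equation (9).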

The full conditional fiducial densities for many other problems of inference are naturally obtained in a similar way, i.e. under Principle 1, with a neutral GPD function and applying the strong fiducial argument. For example, the full conditional fiducial densities that were put forward in all the applications of subjective fiducial inference that were discussed in Bowater and Guzmán (2018a) can be derived either exactly or approximately under the same assumptions.

Let us now turn to the issue of how to interpret the joint fiducial density functions that can be derived under these assumptions in terms of the framework of generalized subjective probability, i.e. the definition of probability outlined in Bowater and Guzmán (2018b). In accordance with what was explained back in Section 2, to complete the definition of any fiducial or posterior distribution, within this framework, we require both the distribution function of the variables concerned, and an assessment of the external strength of this function relative to other distribution functions of interest.

With regard to the main example of the present section, a detailed evaluation of the external strength of the fiducial distribution function of given , i.e.  was provided in Bowater and Guzmán (2018b). In particular, it was shown how it can be argued that if the compound events in the reference set (using the notation of this earlier paper) are made up of the outcomes of a well-understood physical experiment, e.g. the positions of a wheel after it has been spun, then, for any resolution , the relative external strength of the distribution function should be judged as being at a level that is close to the highest attainable level. On the basis of the arguments presented in Bowater and Guzmán (2018a, 2018b), the same conclusion can be reached about the relative external strength of the fiducial distribution function of given , i.e. .

Since the joint fiducial distribution function of and is fully defined by two distribution functions, namely and , that, under the assumptions that have been made about the reference set and the resolution , can both be argued as being externally very strong, then under the same assumptions, it can be argued that this joint distribution function should also be regarded as being externally very strong. In loose terms, this means that the joint distribution of and in question should be regarded as being close in nature to the kind of probability distribution that would be placed over the outcomes of the physical experiment on which the reference set is based. By generalizing the same line of reasoning (see Bowater and Guzmán 2018a for clarification), similar conclusions can be reached about the relative external strengths of the joint fiducial distribution functions that can be derived for other problems that satisfy the criteria of the cases that have been considered in the present section, e.g. the problems discussed in Bowater and Guzmán (2018a).

6 Examples with discrete data and little pre-data knowledge

In this section, organic fiducial inference will be applied to examples in which the data are discrete, and where nothing or very little was known about the model parameters before the data were observed.

6.1 Inference about a binomial proportion

First, let us consider the problem of making inferences about the population proportion of successes on the basis of observing successes in trials, where the probability of observing any given number of successes follows the usual definition of the binomial mass function as specified by:

Since the value is clearly a sufficient statistic for , it will be assumed to be the fiducial statistic . Based on this assumption, equation (1) can be expressed as

(10)

where . Under the assumption of no or very little pre-data knowledge about , it is again quite natural that the GPD function has the following form: if and otherwise, where . This time, though, since equation (10) will never satisfy Condition 1 for any choice of the GPD function and for any value of , we can never apply Principle 1. On the other hand, this equation together with the specified GPD function will satisfy Condition 2(a) for all possible values of , and since Condition 2(b) will also hold for all , Principle 2 can always be applied. Furthermore, as the condition in equation (4) will also be satisfied, inferences will be made about under this principle by using the strong fiducial argument.

As a result, by placing the present case in the context of the general definition of the fiducial density given in equations (5) and (6), we obtain the following expression for the fiducial density :

(11)

where

(12)

Of course, to be able to complete this definition, a LPD function needs to be specified. Observe that any choice for this function that satisfies the very loose requirements of Definition 4 will lead to a fiducial density that is valid for any and any . Nevertheless, to provide two practical examples, we will choose to highlight the two LPD functions that are defined by

(13)

and by

(14)

For both these LPD functions and in general, it will not be possible to obtain a closed-form expression for the fiducial density for any given value of . However, drawing random values from this density function will generally be fairly straightforward.
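For instance, one way of producing such draws is sketched below. The construction is an assumption made to fill in the missing symbols: the primary r.v. is taken to be uniform on (0, 1), the observed count is taken to be the smallest count whose binomial distribution function reaches the value of the primary r.v., and the LPD function is sampled by rejection over the resulting interval of proportions.

```python
import numpy as np
from scipy import stats, optimize

def sample_binomial_fiducial(x, n, lpd=lambda p: 1.0, n_draws=10000, rng=None):
    """Sketch of drawing from the fiducial density of a binomial proportion
    under Principle 2 with a neutral GPD function.  Assumptions (ours, made
    to fill in the text's missing symbols): the primary r.v. g is uniform on
    (0, 1), the observed count x is the smallest count whose binomial CDF
    reaches g, and `lpd` is the LPD function, sampled by rejection over the
    interval of proportions consistent with (x, g)."""
    rng = np.random.default_rng(rng)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        g = rng.uniform()
        # Proportions consistent with (x, g): F(x; n, p) >= g and F(x-1; n, p) < g,
        # where F is the binomial CDF, which is decreasing in p.
        hi = 1.0 if x == n else optimize.brentq(
            lambda p: stats.binom.cdf(x, n, p) - g, 1e-12, 1 - 1e-12)
        lo = 0.0 if x == 0 else optimize.brentq(
            lambda p: stats.binom.cdf(x - 1, n, p) - g, 1e-12, 1 - 1e-12)
        # Rejection sampling of p from the LPD function restricted to (lo, hi).
        bound = max(lpd(p) for p in np.linspace(lo, hi, 50)[1:-1])
        while True:
            p = rng.uniform(lo, hi)
            if rng.uniform() * bound <= lpd(p):
                draws[i] = p
                break
    return draws

# e.g. n = 10 trials, x = 1 success, flat LPD function (cf. Figure 1)
samples = sample_binomial_fiducial(x=1, n=10, n_draws=5000, rng=1)
```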

In this respect, the histograms in Figures 1(a) and 1(b) were each formed on the basis of 500,000 independent random values drawn from the density function , with being equal to 10 and the observed being equal to 1. The results in Figure 1(a) depend on choosing the LPD function to be the one given in equation (13), while the results in Figure 1(b) depend on this function being as defined in equation (14). The dashed-line curves in these figures represent the posterior density for that corresponds to the prior density for being uniform on , while the solid-line curves in these figures represent the posterior density for that corresponds to the prior density for being the Jeffreys prior for the case in question, i.e. the density function that is proportional to the function for in equation (14).

It can be seen from these figures that, although the posterior density for is highly sensitive to which of the two prior densities is used, the fiducial density for barely moves depending on whether the LPD function is proportional to the uniform prior, or whether it is proportional to the Jeffreys prior for this case. Moreover, we can observe that the two fiducial densities for in question both closely approximate the posterior density for that is based on this Jeffreys prior.

Similar to the previous section, let us now turn to the issue of how to interpret the fiducial density in terms of the framework of generalized subjective probability. It will be assumed that the reference set and the range of the resolution are as defined in this earlier section.

To begin with, on the basis of the lines of reasoning presented in Bowater and Guzmán (2018a, 2018b), it can be argued that the relative external strength of the distribution function that corresponds to the post-data density of the primary r.v. , i.e. the uniform density function , should be judged as being at a level that is close to the highest attainable level, which loosely means that arguably this density should be an extremely good representation of our post-data beliefs about . On the other hand, given that it is being assumed that we have no or very little pre-data knowledge about , it will not be easy to find an LPD function that adequately represents our pre-data beliefs about . Therefore, it would be expected that similar to any prior distribution function that could be chosen for in this type of situation, the distribution functions that correspond to the conditional densities defined in equation (12) would be judged as being externally quite weak.

Nevertheless, since these latter distribution functions are defined over intervals for that will be generally much shorter than the interval for over which the prior distribution function for must be defined, i.e. the interval , it would be expected that, on the whole, they would be regarded as being externally much stronger than any given prior distribution function for . Moreover, since in cases where is not very small and is not equal to 0 or , the role of the LPD function could be described as being heavily subordinate to the role of the density in the construction of the joint density of and in equation (11), it can be argued that, in these cases, the distribution function that corresponds to the fiducial density should be regarded as being externally very strong. In loose terms, this means that the fiducial probability of lying in any given interval of moderate width should be regarded as being close in nature to the probabilities of the events contained in the reference set .

By contrast, since the posterior density for is effectively obtained through Bayes’ theorem by simply reweighting the prior density for , that is, by normalizing the density function that results from multiplying this prior density function by the likelihood function, it would seem difficult to use a form of reasoning that is compatible with the Bayesian paradigm to argue that the relative external strength of the posterior distribution function for should not be heavily dependent on the relative external strength of the prior distribution function for , which, as already mentioned, would be expected to be externally quite weak.

Figure 1: Samples from the organic fiducial density of a binomial proportion

6.2 Inference about a Poisson rate parameter

We will now consider the problem of making inferences about an unknown event rate on the basis of observing events over a time period of length , where the probability of observing any given number of events over a period of this length follows the usual definition of the Poisson mass function as specified by:

Again, since the data set to be analysed consists of a single value , this value will be assumed to be the fiducial statistic . Based on this assumption, equation (1) can be expressed in a way that is similar to equation (10), i.e.

(15)

where .

As it will be assumed that there was no or very little pre-data knowledge about , the GPD function will again be specified in the following way: for and otherwise, where . Similar also to the previous problem, the nature of equation (15) means that Principle 1 can never be applied for any choice of the GPD function, but the particular choice that has been made for this latter function means that Principle 2 can always be applied, and in particular, inferences will be made about under this principle by using the strong fiducial argument.

As a result, expressions that define the fiducial density are identical to the expressions in equations (11) and (12) except that the proportion is replaced by the event rate . Although any choice for the LPD function that conforms to Definition 4 will imply that this fiducial density is valid for any , let us choose to highlight the consequences of using the two LPD functions that are defined by

(16)

and by

(17)

In this regard, Figures 2(a) and 2(b) each show a histogram that was formed on the basis of 500,000 independent random values drawn from the density function , under the assumption that two events were observed over a given period of length , i.e. , with the LPD functions that underlie the results in these two figures being defined by equation (16) and by equation (17) respectively. In these figures, the dashed-line curves represent the posterior density for that corresponds to the prior density for being the function for in equation (16), while the solid-line curves represent this posterior density when the prior density for is the function for in equation (17), i.e. the Jeffreys prior for the case in question. Observe that the use of these two prior densities for is controversial as they are both improper.
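A sketch of how such draws might be produced is given below; it assumes the same construction as the binomial sketch of Section 6.1 and uses the Poisson-gamma relationship between distribution functions to obtain the admissible interval of rates in closed form.

```python
import numpy as np
from scipy import stats

def sample_poisson_fiducial(x, t, lpd=lambda lam: 1.0, n_draws=10000, rng=None):
    """Sketch of drawing from the fiducial density of a Poisson event rate,
    assuming the same construction as in the binomial sketch of Section 6.1:
    a uniform primary r.v. g fixes the interval of rates consistent with the
    observed count x over a period of length t, and the LPD function `lpd`
    is applied within that interval by rejection.  The interval comes from
    the identity  P(Poisson(m) <= x) = P(Gamma(x + 1) > m)."""
    rng = np.random.default_rng(rng)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        g = rng.uniform()
        hi = stats.gamma.ppf(1 - g, a=x + 1) / t        # F(x; lam * t) = g at lam = hi
        lo = stats.gamma.ppf(1 - g, a=x) / t if x > 0 else 0.0
        bound = max(lpd(l) for l in np.linspace(lo, hi, 50)[1:])
        while True:
            lam = rng.uniform(lo, hi)
            if rng.uniform() * bound <= lpd(lam):
                draws[i] = lam
                break
    return draws

# e.g. x = 2 events over one period (t = 1), flat LPD function (cf. Figure 2)
samples = sample_poisson_fiducial(x=2, t=1.0, n_draws=5000, rng=1)
```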

It is evident that there is almost no difference between the two histograms in Figures 2(a) and 2(b), and as was the case for the histograms in Figures 1(a) and 1(b), they are both closely approximated by the posterior density that is based on the Jeffreys prior for the problem of interest. Furthermore, using a very similar line of reasoning to the one that in Section 6.1 was used to argue that, under certain assumptions, the distribution function that corresponds to the fiducial density should be regarded as being externally very strong, it can also be argued, under the same assumptions about the set and the resolution , that if , the distribution function that corresponds to the fiducial density of current interest, i.e. , should also be regarded as being externally very strong.

Figure 2: Samples from the organic fiducial density of a Poisson event rate

6.3 Inference about multinomial proportions

To conclude this section, let us consider the problem of making inferences about the population proportions of all the outcomes of an experiment, where is the proportion of times outcome is generated by the experiment, based on observing any given sample of counts of these outcomes, where is the number of times outcome is observed, and the probability of observing this sample follows the usual definition of the multinomial mass function as specified by:

Given that , let us define the complete set of model parameters as being the set . Now, if it is assumed that all the proportions in this set are known except , a set of sufficient statistics for would be . However, is an ancillary statistic, and therefore according to Definition 1, it can be assumed that is the fiducial statistic . Under this assumption, and taking into account that the quantity is known, it is convenient to express the definition of the conditional fiducial density , where , in terms of the fiducial density , where . This is because the definition of this latter fiducial density is equivalent to the definition of the fiducial density in equations (11) and (12) except that , and in this earlier definition are substituted by , and respectively.

In this way, the set of full conditional fiducial densities for this problem can be determined, i.e. the set

(18)

On the basis of having done this, the histograms in Figures 3(a)-(d) summarize a sample of two million realizations of all the parameters of a multinomial distribution function with that was obtained by excluding an initial burn-in sample of 500 of such random vectors from one run of a Gibbs sampler applied to this set of full conditional densities. The sample of counts was , and to complete the definition of these conditional fiducial densities, the LPD functions concerned, i.e.  were all chosen to have the form of the LPD function given in equation (13). The Gibbs sampler in question was also run several more times from different starting points, and the results provided no evidence to suggest that the sampler was failing to converge to a unique stationary density function. Therefore, it would seem reasonably safe to assume that the full conditional densities in equation (18) determine a joint fiducial density for the parameters concerned, and that we have succeeded in generating a series of random vectors from this density function.
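The following sketch indicates how such a Gibbs sampler might be coded. It rests on one possible reading of the construction described above (the extracted text is missing its symbols): the conditional for each proportion other than the reference one is obtained from the binomial-proportion fiducial density based on the count of that outcome and the count of the reference outcome, and is then rescaled by the total proportion left available by the remaining outcomes.

```python
import numpy as np

def gibbs_multinomial_fiducial(counts, n_cycles=5000, burn_in=500, rng=None):
    """Sketch of the Gibbs sampler over the full conditional fiducial
    densities in equation (18), under one reading of the text: the
    conditional for each proportion pi_j, j >= 2, comes from the
    binomial-proportion fiducial density of pi_j / (pi_j + pi_1) based on
    the counts (x_j, x_j + x_1), rescaled by the value of pi_j + pi_1
    implied by the other proportions.  Reuses the sample_binomial_fiducial
    sketch from Section 6.1 (flat LPD function)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(counts)
    k = x.size
    pi = np.full(k, 1.0 / k)               # arbitrary starting point
    kept = []
    for cycle in range(n_cycles):
        for j in range(1, k):
            avail = pi[0] + pi[j]          # fixed given the other proportions
            phi = sample_binomial_fiducial(x[j], x[j] + x[0], n_draws=1, rng=rng)[0]
            pi[j] = phi * avail
            pi[0] = avail - pi[j]
        if cycle >= burn_in:
            kept.append(pi.copy())
    return np.array(kept)
```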

The solid-line curves in Figures 3(a)-(d) represent the marginal posterior densities for each of the parameters , , and respectively when the joint prior density for these parameters is the Jeffreys prior for the case in question, i.e. a symmetric Dirichlet density with concentration parameter equal to 0.5. On the other hand, the long-dashed and short-dashed curves in these figures represent these marginal posterior densities when the joint prior density concerned is, respectively, a uniform density and the Perks prior density, i.e. a symmetric Dirichlet density with equal to . For any given value of , the use of the uniform prior density was advocated for example by Tuyl (2017), while the use of the Perks prior density was advocated for example by Berger et al. (2015).

It can be seen that the histograms for the proportions , and in Figures 3(b)-(d) are closely approximated by the marginal posterior densities corresponding to each of these parameters when the joint prior density is the Jeffreys prior for this case, whereas the histogram for the proportion in Figure 3(a) is only loosely approximated by the marginal posterior density for derived on the basis of this prior density. Also, the covariances between all the proportions , except those involving the parameter , were found to be very similar between the joint fiducial density and the joint posterior density in question. Furthermore, additional simulations showed that the joint fiducial density in this example was not very sensitive to the choice of the LPD functions concerned, i.e. .

Before proceeding let us assume that the reference set and the resolution are defined as in previous sections. Now, given the natural relationship that exists between any of the full conditional densities in equation (18) and the fiducial density for a binomial proportion defined in equations (11) and (12), a similar line of reasoning to one outlined in Section 6.1 can be used to argue that the distribution functions that correspond to the densities in equation (18) should all be regarded as being externally very strong provided that , and is not very small for all values of . The first of these conditions of course does not apply in the case where in the example that has been highlighted, but this example was not chosen to represent the most ideal scenario. Furthermore, since the joint fiducial distribution function of all the proportions is fully defined by the full conditional densities in equation (18), a similar line of reasoning to one mentioned in Section 5 can be used to argue that this joint distribution function should also be regarded as being externally very strong provided that the aforementioned conditions on the counts hold, and the total count is not very small relative to the number of proportions .

Finally, it needs to be taken into account that the joint fiducial distribution function in question is potentially sensitive to which of the population proportions is defined to be the proportion . However, extensive simulations that were conducted showed that the effect of this choice of parameterization was generally negligible, and was only found to be slightly more than negligible in certain cases where the total count was less than the number of proportions . Moreover, this issue can be easily resolved by always applying the criterion of designating the proportion so that its corresponding count is the highest or equal highest out of all the counts . As the count is always one of the two counts that are used to form each of the full conditional fiducial densities in equation (18), this criterion is justifiable from a statistical viewpoint, and it also guarantees that the case is avoided where the count , and at least one of the remaining counts equals zero, which would imply that at least one of these conditional fiducial densities is undefined.

Figure 3: A sample from a joint organic fiducial density of multinomial proportions obtained using the Gibbs sampler

7 Examples with restricted parameter spaces

Let us now turn our attention to examples of the application of organic fiducial inference in which it was known, before the data were observed, that values in a given subset of the natural space of the model parameters were impossible, but apart from this, nothing or very little was known about these parameters. In relation to this issue, the importance of the need to make inferences about a normal mean when there is a lower bound on , and about a Poisson rate parameter when there is a positive lower bound on has been underlined by practical examples from the field of quantum physics that are described, for example, in Mandelkern (2002). These examples motivate what will be examined in the present section.

7.1 Inference about a bounded normal mean with unknown variance

With regard to the example considered in Section 5, let us change what is assumed to have been known about the mean before the data were observed to the assumption that, for any given value of the variance , it was known that , where is a given finite constant, but apart from this, nothing or very little was known about . In this situation, it is quite natural to specify the GPD function for as follows:

where . Although, as was the case in Section 5, this GPD function is neutral, this time the condition in equation (4) will never hold, and therefore the fiducial density is derived under Principle 1 by using the moderate rather than the strong fiducial argument. The consequence of this in terms of the definition of the marginal fiducial density for is that this density function becomes simply the marginal density function for defined in equation (9) conditioned to lie in the interval . However, it is of interest to examine the potential effect on the relative external strength of this marginal density function due to the use of the moderate rather than the strong fiducial argument in constructing the conditional density .

In this regard, let us remember that in the definition of the function in equation (8) it was assumed that the pre-data density function of the primary r.v. , i.e. the function , is a standard normal density function. Now, on observing the sample mean , we immediately know that the value generated in step 2 of the algorithm in Assumption 1 must be less than the value .

The moderate fiducial argument in this situation, i.e. the argument that the relative height of the post-data density function of , i.e. the function , in the interval should be equal to the relative height of over this interval, is similar (but not identical) to the Bayesian argument that the relative height of a density function for a fixed parameter should not be affected by learning that a given subset of values for are impossible, apart from it of course becoming equal to zero over this subset. Although this type of Bayesian argument has been criticized as being overly simplified due to the fact that it does not take into account the manner in which we learn that values in the particular subset are impossible, see for example Shafer (1985), it is an argument that is considered as being almost universally acceptable. For this reason, under the same assumptions about the reference set and the resolution as made in previous sections, it can be argued that the density function , i.e. a standard normal density for truncated to the interval , in the context of being a representation of what is believed about after the data are observed, should be regarded as being externally very strong. As a result, under the same assumptions, the case can be made that the joint fiducial density of and in the present example, and the marginal densities that can be derived from this joint density should also be regarded as being externally very strong.
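A sketch of sampling from the resulting marginal fiducial density of the mean is given below; it assumes, in line with the motivating examples mentioned at the start of this section, that the restriction is a lower bound, and it simply truncates the Student t marginal of Section 5 to the admissible interval.

```python
import numpy as np
from scipy import stats

def bounded_mean_fiducial(x, lower, n_draws=100000, rng=None):
    """Sketch of sampling the marginal fiducial density of a bounded normal
    mean, assuming (as the text states, with a lower bound taken as the
    form of the restriction) that it is simply the unrestricted Student-t
    marginal of Section 5 conditioned to lie in [lower, infinity).
    Sampling is by the inverse-CDF method restricted to that range."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, xbar, s = x.size, x.mean(), x.std(ddof=1)
    marg = stats.t(df=n - 1, loc=xbar, scale=s / np.sqrt(n))
    u = rng.uniform(marg.cdf(lower), 1.0, size=n_draws)
    return marg.ppf(u)
```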

Clearly the same type of reasoning can be applied to many other problems of inference over restricted parameter spaces that are similar to the problem that has just been discussed.

7.2 Inference about a bounded Poisson rate parameter

Returning to the problem of making inferences about a Poisson rate parameter that was discussed in Section 6.2, let us now assume that before the data were observed, it was known that , where is a given positive constant, but apart from this, nothing or very little was known about . Again, as was the case in Section 6.2, it is clear that Principle 1 can not be applied to determine the fiducial density of .

Observe that, in this new situation, the set as defined in Condition 1, where the parameter in this definition is , is the set , and that it is natural to specify the GPD function so that Condition 2(b) is satisfied. However, in contrast to the example outlined in Section 6.2, the definition of the function given in equation (15) implies that the set as defined in Condition 1 does not satisfy Condition 2(a), and therefore we have the problem of ‘spillage’ that was referred to at the end of Section 3.4.

The first step of a very straightforward way of trying to circumvent this difficulty is to make inferences about in an artificial scenario, namely the scenario considered in Section 6.2. In doing this, it will be assumed that the LPD function is chosen to represent as best as possible a general situation where nothing or very little was known about the parameter over the interval before the data were observed, e.g. the LPD function given in equation (16) or equation (17). Having determined a fiducial density for over the interval by using this method, we then simply condition this density to lie in the interval to thereby obtain a fiducial density for that corresponds to the problem at hand.

Although in applying this strategy we do not directly use any of the three types of fiducial argument outlined in Section 3.2, if the same strategy was applied to the example discussed in Section 7.1, which of course would not require the use of a LPD function, then the fiducial density of conditional on being known, i.e. the density , would be the same as is obtained by using the approach put forward in this previous section, i.e. an approach that is based on the moderate fiducial argument. On the other hand, the strategy has the clear disadvantage that it depends on expressing pre-data knowledge about a parameter of interest via the GPD function, and possibly also via the LPD function, with regard to an artificial scenario rather than the scenario that is actually under consideration. Nevertheless, under the same assumptions about the reference set and the resolution as made in previous sections, it still can be argued that, if in the present example, the observed count is greater than zero and is not very small relative to the threshold , then the fiducial density for that results from using this strategy should be regarded as being externally quite strong.

To give a good practical example of the application of the strategy that has just been discussed, let us suppose that the threshold , which will be regarded as the event rate for the background noise over a time length , needs to be estimated on the basis of a Poisson count collected over a period of length times when only background noise could be present, where is a given value. Since it will be assumed that can take any positive value, the fiducial density of formed on the basis of the data , i.e. the density , is defined in the same way as the fiducial density was defined in Section 6.2. Taking into account also a Poisson count collected over a period of length when a signal should be present, we will then be interested in making inferences about the event rate over this time period, which will be regarded as the event rate for background noise plus the signal. Due to the fact that will be assumed to be a positive event rate, namely the event rate for the signal only, the parameter must be greater than , and so it will be assumed that the fiducial density of formed on the basis of the data and conditioned on being greater than , i.e. the density , is determined using the method described in the present section. Given these definitions, the joint fiducial density of and can therefore be expressed as

To illustrate a specific case, Figures 4(a) and 4(b) show histograms of 500,000 independent random values drawn from, respectively, the marginal density of and the marginal density of over this joint fiducial density, assuming that the LPD function that was used to form both of the densities and was the simple step function given in equation (16) and that , and . The solid-line and dashed-line curves in Figure 4(a) represent the posterior density of that corresponds, respectively, to the use of the Jeffreys prior for the case when is unrestricted over the interval and to the use of this prior density with the condition that , where 0.75 () is clearly the maximum likelihood estimate of . These curves have been added to this figure only because we know that, under the conditions in question, they closely approximate the fiducial densities for when the LPD function being considered is used. In particular, comparing the lower tails of the histogram and the dashed-line curve in Figure 4(a) highlights the extra uncertainty that is introduced by taking into account the statistical error in the estimation of .

Figure 4: Samples from marginal organic fiducial densities of Poisson event rates
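As a rough indication of how samples of the kind summarized in Figure 4 could be generated, the sketch below relies on the Jeffreys-posterior approximation to the fiducial densities that is mentioned above, i.e. gamma densities for the Poisson event rates, with the rate for the signal period conditioned to exceed the background rate. The counts x0 and x, the exposure factor k and the number of draws are hypothetical placeholders rather than the values actually used to produce the figure.

    import numpy as np
    from scipy.stats import gamma

    rng = np.random.default_rng(0)
    n_draws = 500_000

    # Hypothetical inputs: background count x0 over k unit periods, and a
    # count x over one unit period during which the signal should be present.
    x0, k, x = 9, 12, 4

    # Jeffreys-posterior approximation to the fiducial density of the
    # background event rate: Gamma(x0 + 1/2, rate = k).
    lam_b = gamma(a=x0 + 0.5, scale=1.0 / k).rvs(n_draws, random_state=rng)

    # Approximate fiducial density of the event rate with the signal present,
    # Gamma(x + 1/2, rate = 1), conditioned to exceed the background rate;
    # sampled via the inverse-CDF method.
    g = gamma(a=x + 0.5, scale=1.0)
    u = rng.uniform(g.cdf(lam_b), 1.0)
    lam = g.ppf(u)

    lam_signal = lam - lam_b  # event rate attributable to the signal alone
    print(np.quantile(lam_signal, [0.025, 0.5, 0.975]))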

8 An example with two different GPD functions that are non-neutral

To give a final example of the application of organic fiducial inference, let us again return to the problem of inference considered in Section 5, and let us assume that the GPD function used to determine the fiducial density of the mean given the variance , i.e. the density , is one of the two step functions defined by

(19)

and by

(20)

where is any given constant greater than one, and is any given positive constant. As a way of interpreting either of these two GPD functions, it can be observed that if there is an interval of values for the primary r.v.  such that for all , where in keeping with earlier notation is the value of that maps on to the value given the data , and there is another interval for such that for all , then the probability of the event divided by the probability of the event will be regarded as being times larger after the data are observed than before step 2 of the algorithm in Assumption 1 was implemented.

Clearly the GPD function in equation (19) can be used to represent the scenario in which nothing or very little was known about before the data were observed, except that it was known that, when the data are observed, positive values of would be regarded as being more likely, and negative values of less likely, than is required to be able to accept the strong fiducial argument. On the other hand, if, for example, is chosen to be small, the GPD function in equation (20) could be used to represent a scenario where there was little or no pre-data knowledge about except that it was known that, when the data are observed, values of lying in a narrow interval centred at zero, which could be the value of that corresponds to the null effect of a treatment compared to a control, would be regarded as being more likely, and values of lying outside of this interval less likely, than is assumed by the strong fiducial argument.
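Since equations (19) and (20) are not reproduced above, the following sketch simply illustrates, in hypothetical notation, the two step-function shapes that have just been described: k plays the role of the constant greater than one, eps plays the role of the positive constant, and mu stands for the parameter of interest.

    # Hypothetical rendering of the step-function shape of equation (19):
    # positive values of the parameter are weighted k times more heavily
    # than negative values.
    def gpd_positive_favoured(mu, k=2.0):
        return k if mu > 0.0 else 1.0

    # Hypothetical rendering of the step-function shape of equation (20):
    # values of the parameter in a narrow interval centred at zero are
    # weighted k times more heavily than values outside this interval.
    def gpd_null_favoured(mu, k=2.0, eps=0.05):
        return k if abs(mu) <= eps else 1.0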

On the basis of either of the GPD functions in equations (19) and (20), the fiducial density is derived under Principle 1 by applying the weak fiducial argument. In particular, the two forms of this fiducial density that correspond to using these two GPD functions are the same as the two forms of the posterior density for given that result from treating these GPD functions as prior densities for under the Bayesian paradigm. However, there are at least two good reasons why it is better to regard these densities as being fiducial densities backed by the methodology outlined in Section 3.4, rather than as posterior densities backed by standard Bayesian theory.

First, if the GPD functions in equations (19) and (20) are treated as being prior densities, then these density functions must be improper. This is also one of a number of criticisms that could be applied to the interpretation of the fiducial density derived in Section 5 as being a posterior density for , as the required prior density for in this case would be a flat improper density for over the interval . More specifically, though, it would seem particularly awkward to try to justify either of the improper prior densities for that correspond to the GPD functions being presently considered as being a natural approximation to a proper prior density, or as some kind of natural limit obtained by allowing a hyperparameter of a proper prior density to tend to infinity. This is due to the discontinuity that occurs at zero for the function in equation (19), and the discontinuities that occur at and for the function in equation (20).

The second reason why it is better to use fiducial rather than Bayesian reasoning in the cases under consideration is that the fiducial densities that correspond to the GPD functions in equations (19) and (20) can be regarded as being based on a set of conditional versions of these densities derived using the moderate fiducial argument. In particular, under the GPD function in equation (19), the fiducial density for when is conditioned to lie in one of the intervals or would be derived using the moderate fiducial argument, while, under the GPD function in equation (20), the fiducial density for when is conditioned to lie in one of the subsets or would also be derived using this type of fiducial argument. Taking into account the intuitive appeal of the moderate fiducial argument that was discussed in Section 7.1, the case can be made that the partial dependence on this argument that has been identified should mean that, under the same assumptions about the reference set and the resolution as made in previous sections, the relative external strength of the fiducial density , when is unrestricted over the whole of the real line, and when either of the GPD functions in question is used, should be regarded as being reasonably high in many situations where the use of the GPD function concerned is considered to be adequate.

The same line of reasoning can also be applied in assessing the relative external strength of the fiducial density of any given parameter of any given sampling model conditional on all other parameters, provided that such a density for can be derived under Principle 1 and that the GPD function for is a step function with at least two steps that have distinct non-zero heights. Furthermore, if the GPD function for is allowed to take any form that simply satisfies the loose requirements of Definition 3, then, despite this line of reasoning no longer being applicable in general, the capacity to express pre-data knowledge about in a way that is distinct from placing a prior density over under the Bayesian paradigm will generally be retained.

On the other hand, if Condition 1 is not satisfied, then, since Conditions 2(a) and 2(b) can only be satisfied by special, albeit quite important, forms of the GPD function for , e.g. the simple choices made for this function in the cases considered in Section 6, it is clear that, over all possible choices for this function, we will not in general be able to make inferences about by directly using the methodology outlined in Section 3.4. However, in this general case, we can use a strategy similar to the one outlined in Section 7.2, by first using Principle 2 to construct a fiducial density that would be appropriate in the artificial scenario in which it is assumed that there was little or no pre-data knowledge about , and then normalizing the density function that results from multiplying this preliminary fiducial density for by the GPD function for that corresponds to the actual scenario being considered. For a similar reason to that which has just been outlined, combined with the reasoning given in Section 7.2, this type of strategy would appear to be particularly attractive if this latter GPD function for is a step function, although it generally offers a useful alternative way of taking into account pre-data knowledge about over all choices for this function.
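The sketch below gives a minimal numerical illustration of this reweight-and-normalize strategy, assuming that the preliminary fiducial density constructed under Principle 2 and the GPD function for the actual scenario are both available as ordinary functions on a grid. The normal density and the two-step GPD function used here are hypothetical stand-ins, not the specific choices discussed in the text.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical preliminary fiducial density for the parameter, i.e. the
    # density that would be appropriate if little or nothing had been known
    # about the parameter before the data were observed.
    prelim_density = norm(loc=1.3, scale=0.6).pdf

    # Hypothetical two-step GPD function for the actual scenario.
    def gpd(theta, k=3.0):
        return np.where(theta > 0.0, k, 1.0)

    # Multiply the preliminary density by the GPD function and renormalize
    # numerically on the grid.
    grid = np.linspace(-5.0, 7.0, 4001)
    dx = grid[1] - grid[0]
    weighted = prelim_density(grid) * gpd(grid)
    fiducial = weighted / (weighted.sum() * dx)

    print((fiducial * dx).sum())  # approximately 1.0, confirming normalization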

9 Closing comment

Since the theory of organic fiducial inference is a generalization of the theory of subjective fiducial inference, issues that were identified in the final section of Bowater and Guzmán (2018a) as being relevant to the further development of this latter theory, namely the coherence of inferences based on subsets of the data set of interest, alternative definitions of the fiducial statistic and computational issues, also apply to the theory that has been put forward in the present paper. To save space, the reader is referred to this earlier paper for a discussion of these issues.

References

Berger, J. O., Bernardo, J. M. and Sun, D. (2015). Overall objective priors. Bayesian Analysis, 10, 189–221.

Bowater, R. J. (2017a). A formulation of the concept of probability based on the use of experimental devices. Communications in Statistics: Theory and Methods, 46, 4774–4790.

Bowater, R. J. (2017b). A defence of subjective fiducial inference. AStA Advances in Statistical Analysis, 101, 177–197.

Bowater, R. J. and Guzmán, L. E. (2018a). Multivariate subjective fiducial inference. arXiv.org (Cornell University Library), Statistics Theory, arXiv:1804.09804.

Bowater, R. J. and Guzmán, L. E. (2018b). On a generalized form of subjective probability. arXiv.org (Cornell University Library), Statistics (stat.OT), arXiv:1810.10972.

Brooks, S. P. and Roberts, G. O. (1998). Convergence assessment techniques for Markov chain Monte Carlo. Statistics and Computing, 8, 319–335.

Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Mandelkern, M. (2002). Setting confidence intervals for bounded parameters (with discussion). Statistical Science, 17, 149–172.

Shafer, G. (1985). Conditional probability (with discussion). International Statistical Review, 53, 261–277.

Tuyl, F. (2017). A note on priors for the multinomial model. The American Statistician, 71, 298–301.