 # Fiducial Symmetry in Action

Symmetry is key in classical and modern physics. A striking example is conservation of energy as a consequence of time-shift invariance, by Noether's theorem. Symmetry is likewise a key element in statistics, which, like physics, provides models for real-world phenomena. Sufficiency, conditionality, and invariance are examples of basic principles. Galili and Meilijson (2016) and Mandel (2020) illustrate the first two principles very nicely by considering the scaled uniform model. We illustrate the third principle by providing further results which give optimal inference for the scaled uniform by symmetry considerations. The proofs are simplified by relying on fiducial arguments as initiated by Fisher (1930).

Keywords: Data generating equation; Optimal equivariant estimate; Scale family; Conditionality principle; Minimal sufficient; Uniform distribution


## 1 Introduction

Let

$$y = \theta u \tag{1}$$

If $u = (u_1, \ldots, u_n)$ is a random sample from the uniform law $\mathrm{U}(1-k, 1+k)$ with known design parameter $k \in (0,1)$, then $y = (y_1, \ldots, y_n)$ is a random sample from the scaled uniform distribution $\mathrm{U}(\theta(1-k), \theta(1+k))$ with scale parameter $\theta > 0$. Estimation of $\theta$ in this normalized case was investigated in some detail by Galili and Meilijson (2016) and Mandel (2020). The remainder of this section recaps some of their results. Additional results and discussion are found in many classical texts in theoretical statistics, since the scaled uniform and its relatives are used as prototypical counterexamples to results that depend on a smooth likelihood, such as Fisher information, the Cramér-Rao bound, and efficiency of MLEs.

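The assumed data generating equation (1) gives the likelihood

$$L(\theta) = \prod_i \frac{[\theta(1-k) \le y_i \le \theta(1+k)]}{2k\theta} = \frac{(\hat\theta_{\mathrm{ML}} \le \theta \le \hat\theta_{\mathrm{MU}})}{(2k\theta)^n} \tag{2}$$

where $\hat\theta_{\mathrm{ML}} = y_{(n)}/(1+k)$ and $\hat\theta_{\mathrm{MU}} = y_{(1)}/(1-k)$. The likelihood is hence zero for $\theta < \hat\theta_{\mathrm{ML}}$ and for $\theta > \hat\theta_{\mathrm{MU}}$, and has jumps at $\hat\theta_{\mathrm{ML}}$ and $\hat\theta_{\mathrm{MU}}$. The estimates $\hat\theta_{\mathrm{ML}}$ and $\hat\theta_{\mathrm{MU}}$ give deterministic information since $\hat\theta_{\mathrm{ML}} \le \theta \le \hat\theta_{\mathrm{MU}}$ is always true. Formally, $[\hat\theta_{\mathrm{ML}}, \hat\theta_{\mathrm{MU}}]$ is a $100\%$ confidence interval. A minimal sufficient statistic is given by the smallest and largest observation $(y_{(1)}, y_{(n)})$, by the sure interval $[\hat\theta_{\mathrm{ML}}, \hat\theta_{\mathrm{MU}}]$, or equivalently by its endpoints $(\hat\theta_{\mathrm{ML}}, \hat\theta_{\mathrm{MU}})$.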
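The sure interval can be checked numerically. The following is a minimal sketch in plain Python, assuming the endpoints $y_{(n)}/(1+k)$ and $y_{(1)}/(1-k)$ implied by the indicator in equation (2). The function names `simulate` and `sure_interval` and the values of $\theta$, $k$, and $n$ are illustrative choices, not taken from the cited papers.

```python
import random

def simulate(theta, k, n, rng):
    """Draw y = theta * u with u_i uniform on [1 - k, 1 + k] (equation (1))."""
    return [theta * rng.uniform(1.0 - k, 1.0 + k) for _ in range(n)]

def sure_interval(y, k):
    """Return (theta_ML, theta_MU): the support of the likelihood in equation (2)."""
    return max(y) / (1.0 + k), min(y) / (1.0 - k)

rng = random.Random(0)           # fixed seed for reproducibility
theta, k, n = 3.0, 0.5, 10       # illustrative values
y = simulate(theta, k, n, rng)
lo, hi = sure_interval(y, k)
print(lo, theta, hi)             # theta lies between the two endpoints
```

The interval shrinks as $n$ grows, but for any sample it contains the true $\theta$ with certainty, which is the deterministic information referred to above.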

The maximum likelihood estimator (MLE) from equation (2) is, as the notation suggests, equal to the lower bound $\hat\theta_{\mathrm{ML}}$. It can be observed that if the closed interval $[1-k, 1+k]$ is replaced by the open interval $(1-k, 1+k)$, then this would give an example where the MLE does not exist. The MLE is, in fact, inefficient and biased, as proved by Galili and Meilijson (2016). It is, furthermore, unreasonable since it totally ignores the information provided by the smallest observation $y_{(1)}$.

An alternative unbiased estimator is the Rao-Blackwellization of $Y_1$ with respect to the minimal sufficient statistic $S = (Y_{(1)}, Y_{(n)})$:

$$\hat\theta_{\mathrm{RB}} = E_\theta(Y_1 \mid S = s) = \frac{y_{(1)} + y_{(n)}}{2} \tag{3}$$

It is claimed, wrongly, by Galili and Meilijson (2016, p.109) that the conditional law of $Y_1$ given $S = s$ is uniform on the interval $[y_{(1)}, y_{(n)}]$. The claim is wrong since the conditional law of $Y_1$ has point masses at both endpoints of the interval. This exemplifies that distributions that are neither continuous nor discrete appear naturally in probability calculations. Equation (3) is, nevertheless, correct.

The estimator $\hat\theta_{\mathrm{RB}}$ is optimal in a related location model, but not so in the given scale model. Galili and Meilijson (2016) prove that the Rao-Blackwell estimator is, in fact, uniformly improved by a linear combination of $y_{(1)}$ and $y_{(n)}$ whose coefficients are chosen so that the estimator is unbiased and has minimum variance in the class of linear functions of $y_{(1)}$ and $y_{(n)}$. The Rao-Blackwell estimator has, however, an advantage compared to this uniform improvement: it always gives a feasible estimate in the sense of belonging to the sure interval $[\hat\theta_{\mathrm{ML}}, \hat\theta_{\mathrm{MU}}]$. The Rao-Blackwell estimator is optimal in the class of feasible linear unbiased estimators.

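In the quest for better estimators Galili and Meilijson (2016) turn to a Bayesian approach with an improper prior $\pi(\theta) \propto \theta^{-p}$. This is a natural choice since, combined with the likelihood in equation (2), the resulting posterior is a truncated Pareto distribution with density:

$$\pi(\theta \mid y) = \frac{\alpha}{a^{-\alpha} - b^{-\alpha}} \cdot (a \le \theta \le b) \cdot \theta^{-\alpha-1} \tag{4}$$

The truncation parameters are $a = \hat\theta_{\mathrm{ML}}$ and $b = \hat\theta_{\mathrm{MU}}$, and the index is $\alpha = n + p - 1$. Let $b^* = b/a$ and assume $\alpha > 1$. A family of Bayes estimators is then

$$\hat\theta_{\mathrm{B}p} = E(\Theta \mid Y = y) = \frac{\alpha}{\alpha-1} \cdot \frac{1 - {b^*}^{1-\alpha}}{1 - {b^*}^{-\alpha}} \cdot \hat\theta_{\mathrm{ML}} \tag{5}$$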
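Equation (5) is straightforward to evaluate. The sketch below assumes the prior $\theta^{-p}$, so that the index is $\alpha = n + p - 1$, and assumes $b^* = \hat\theta_{\mathrm{MU}}/\hat\theta_{\mathrm{ML}}$. The helper names `bayes_estimate` and `monte_carlo_mean` are hypothetical, and the simulation settings are arbitrary.

```python
import random

def bayes_estimate(y, k, p):
    """Posterior mean of equation (5), assuming prior theta**(-p).

    Requires alpha = n + p - 1 > 1; b_star = theta_MU / theta_ML is > 1
    with probability one.
    """
    n = len(y)
    theta_ml = max(y) / (1.0 + k)
    theta_mu = min(y) / (1.0 - k)
    alpha = n + p - 1
    b_star = theta_mu / theta_ml
    ratio = (1.0 - b_star ** (1 - alpha)) / (1.0 - b_star ** (-alpha))
    return alpha / (alpha - 1.0) * ratio * theta_ml

def monte_carlo_mean(theta, k, n, p, reps, seed=0):
    """Average the estimator over simulated samples from equation (1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [theta * rng.uniform(1.0 - k, 1.0 + k) for _ in range(n)]
        total += bayes_estimate(y, k, p)
    return total / reps

# With p = 2 the simulated mean should be close to theta = 3.
print(monte_carlo_mean(theta=3.0, k=0.5, n=5, p=2, reps=20000))
```

For $n = 1$ and $p = 2$ the formula reduces to the single observation itself, which gives a quick hand check of the implementation.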

Galili and Meilijson (2016) prove that the estimator $\hat\theta_{\mathrm{B}2}$ is in fact unbiased. Furthermore, numerical evidence indicates that it is uniformly better than the linear estimator that improves on the Rao-Blackwell estimator. It is quite remarkable that the Bayes estimator from the prior $\theta^{-2}$ is both unbiased and so good.

Enter Mandel (2020), who gives an alternative justification for the estimator $\hat\theta_{\mathrm{B}2}$. He bases the inference on the minimal sufficient statistic $(\hat\theta_{\mathrm{ML}}, b^*)$. The key observation is that $b^*$ is ancillary: the law of $b^*$ does not depend on $\theta$. It is tempting, but erroneous, to conclude from this that all information regarding $\theta$ is included in $\hat\theta_{\mathrm{ML}}$. As explained earlier, $\hat\theta_{\mathrm{ML}}$, or equivalently $y_{(n)}$, does not contain all information regarding $\theta$.

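The conditionality principle dictates, as advocated by Mandel (2020), that inference should be based on the conditional model given the ancillary $b^*$. Calculus shows that the conditional law of $\hat\theta_{\mathrm{ML}}$ given $b^*$ is $\mathrm{Pareto}(-n, [\theta/b^*, \theta])$, with truncation interval $[\theta/b^*, \theta]$ and index $-n$ (Mandel, 2020, eq.1). The conditional model has $\theta$ as a scale parameter. A calculation using the expectation of the truncated Pareto gives $E_\theta(\hat\theta_{\mathrm{ML}} \mid b^*) = c(b^*)\,\theta$ with $c(b^*) = \frac{n}{n+1} \cdot \frac{1 - {b^*}^{-(n+1)}}{1 - {b^*}^{-n}}$. This gives the unbiased estimator $\hat\theta_{\mathrm{ML}}/c(b^*)$ for $\theta$ given $b^*$. It follows then that this estimator is also unconditionally unbiased, since the unconditional expectation is the expectation of the conditional expectation. Remarkably, calculus shows that $\hat\theta_{\mathrm{ML}}/c(b^*) = \hat\theta_{\mathrm{B}2}$.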

The arguments of Mandel (2020) hence give a simplified proof of unbiasedness of $\hat\theta_{\mathrm{B}2}$, and identify it with a natural estimator arising from a conditional argument. A natural question next is: Is $\hat\theta_{\mathrm{B}2}$ optimal? This is answered in the negative by Mandel (2020). He shows that $\hat\theta_{\mathrm{B}2}$ fulfills criteria that ensure existence of an unbiased estimator with smaller variance at any fixed $\theta$. The next sections provide alternative estimators that are optimal.

## 2 Fiducial Inference

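Fisher (1930, p.532,p.534) introduced fiducial inference for the correlation coefficient $\rho$ based on the empirical correlation $r$ of a random sample from a two-dimensional Gaussian distribution. The argument is based on inversion of the pivotal $u = F(r \mid \rho)$, where $F(\cdot \mid \rho)$ is the cumulative distribution function of the empirical correlation and $u \sim \mathrm{U}(0,1)$. A draw from the fiducial distribution of the correlation coefficient given $r$ is obtained by a draw $u \sim \mathrm{U}(0,1)$, and then returning the unique solution $\rho$ of the equation $u = F(r \mid \rho)$.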

Unfortunately, Fisher gave many other recipes for obtaining a fiducial which were conflicting, and the birth of fiducial inference was a thorny one (Schweder and Hjort, 2016, p.185-). A particular blind alley is given by Fisher's many attempts at using the likelihood instead of the cumulative distribution function for the construction of a fiducial. Note that a particular likelihood can result from many different statistical models and that a particular statistical model can result from many different data generating equations. Cui and Hannig (2019) give further references to the literature on fiducial inference, and demonstrate that the initial approach of Fisher (1930) was the correct one even in a non-parametric setting.

Modern fiducial inference is based on a data generating equation instead of a family of distributions as a model for the observed data. A given family of distributions can be the result of many different data generating equations. A particular data generating equation hence contains information beyond the resulting family of distributions. The data generating equation represents prior information, but not in the form of a prior distribution as in Bayesian inference. The fiducial argument was invented by Fisher to obtain a distribution for the unknown parameter from the observations and a model for cases where a Bayesian prior is absent. It will next be demonstrated how this can be done for the scaled uniform. Furthermore, it will be proved in Section 3 by equation (12) that this fiducial gives an estimate with uniformly minimum expected squared error.

A fiducial model is specified by taking equation (1), $y = \theta u$, as a data generating equation for the scaled uniform. The direct recipe from the previous paragraphs is then to draw $u$ and solve for the parameter $\theta$. Unfortunately, this fails since there are $n$ equations for the one unknown $\theta$. Intuitively, the solution is given by observing that $u$ must be conditioned to be a point on the ray $\{y/\theta : \theta > 0\}$. Mathematically, this is not sufficient, but it is by instead conditioning on a function of $u$ with this line as a level set. The set of different possible rays gives a partition of the set of possible $u$'s. Let $s$ have exactly the partition as its level sets. The required conditioning is then uniquely determined by conditioning on $S = s(y)$ where $S = s(u)$. The recipe is then to draw $u$ from its initial distribution conditionally on $s(u) = s(y)$, and then return $\theta = y_1/u_1$ as a unique draw from the fiducial.

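The previous argument determines the fiducial uniquely from the data generating equation (1), and in fact for any scale model. The analysis is simplified for the scaled uniform by replacing equation (1) by the sufficient data generating equation

$$(y_{(1)}, y_{(n)}) = \theta\,(u_{(1)}, u_{(n)}) \tag{6}$$

The fiducial is the law of $\Theta = y_{(n)}/U_{(n)}$ conditionally on $S_2 = s_2$, where $S_2 = U_{(1)}/U_{(n)}$ and $s_2 = y_{(1)}/y_{(n)}$. This is as in the previous recipe since the mapping $(u_{(1)}, u_{(n)}) \mapsto u_{(1)}/u_{(n)}$ has exactly the rays as its level sets. The calculations of Mandel (2020, eq.1) now give that $U_{(n)}/(1+k)$ is conditionally truncated Pareto with truncation interval $[1/b^*, 1]$ and index $-n$. The resulting fiducial is

$$\Theta \sim \mathrm{Pareto}(n, [\hat\theta_{\mathrm{ML}}, \hat\theta_{\mathrm{MU}}]) \tag{7}$$

It has here been used that $X \sim \mathrm{Pareto}(\alpha, [a, b])$ implies $1/X \sim \mathrm{Pareto}(-\alpha, [1/b, 1/a])$.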
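Draws from the fiducial in equation (7) can be generated by inverting the cumulative distribution function of the truncated Pareto. The sketch below assumes the density is proportional to $\theta^{-n-1}$ on $[\hat\theta_{\mathrm{ML}}, \hat\theta_{\mathrm{MU}}]$; the function names and the numerical values of the endpoints are illustrative assumptions.

```python
import random

def draw_fiducial(theta_ml, theta_mu, n, rng):
    """One draw from Pareto(n, [theta_ml, theta_mu]) by CDF inversion.

    The CDF is F(t) = (1 - (t/theta_ml)**(-n)) / (1 - b_star**(-n)),
    with b_star = theta_mu / theta_ml.
    """
    b_star = theta_mu / theta_ml
    u = rng.random()
    return theta_ml * (1.0 - u * (1.0 - b_star ** (-n))) ** (-1.0 / n)

def fiducial_mean(theta_ml, theta_mu, n):
    """Closed-form mean of the truncated Pareto, valid for n > 1."""
    b_star = theta_mu / theta_ml
    ratio = (1.0 - b_star ** (1 - n)) / (1.0 - b_star ** (-n))
    return n / (n - 1.0) * ratio * theta_ml

rng = random.Random(0)
draws = [draw_fiducial(2.0, 3.0, 5, rng) for _ in range(50000)]
# The sample mean and the closed-form mean should agree closely.
print(sum(draws) / len(draws), fiducial_mean(2.0, 3.0, 5))
```

Every draw lies in the sure interval, and quantiles of the draws give interval estimates directly.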

In a certain sense the analysis is now finished. The fiducial in equation (7) gives our state of knowledge regarding the unknown model parameter $\theta$ based on the fiducial model in equation (1), or equivalently equation (6), and the observation $y$. This is, as Fisher intended, an alternative to a Bayesian posterior distribution.

## 3 Optimal Decisions

The analysis can be continued in many ways depending on the question of interest. The original question here was: How should the parameter $\theta$ be estimated? This question is now simpler than initially because we have a probability distribution for $\Theta$ given in equation (7). What guess $\hat\theta$ for $\Theta$ should we choose when its probability distribution is known?

Decision theory gives a possible route. The guess, or estimate, is a decision. One possibility is to minimize the risk given by

$$r = -E\,\delta(\Theta - \hat\theta) \tag{8}$$

where $\delta$ is the Dirac delta function. The density of $\Theta$ has a maximum at $\hat\theta_{\mathrm{ML}}$, so the result is then the maximum likelihood estimator. In a Bayesian analysis this corresponds to choosing the maximum a posteriori (MAP) estimate.

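Another possibility is to minimize the risk given by

$$r = E(\Theta - \hat\theta)^2 \tag{9}$$

corresponding to expected squared error loss $(\theta - \hat\theta)^2$. The resulting estimate is then $\hat\theta_{\mathrm{B}1}$ as given in equation (5) with $p = 1$. This is so because the fiducial coincides with the Bayesian posterior for the case $p = 1$. It follows similarly that the estimator $\hat\theta_{\mathrm{B}p}$ minimizes the weighted risk $r = E\,\Theta^{1-p}(\Theta - \hat\theta)^2$.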

Except for $\hat\theta_{\mathrm{B}2}$, none of the above estimators are unbiased. The property of being unbiased is a natural demand for estimating a location parameter, but not so for a scale parameter. A natural demand is to require scale invariance. This translates into demanding that $\hat\theta/\theta$ is a pivotal quantity. An estimator is said to be scale equivariant if this demand is fulfilled. As also noted by Galili and Meilijson (2016, p.110), this requirement holds for all suggested estimators here. The natural question is then: Is there an optimal scale equivariant estimator? As explained by Berger (1985, p.388-), it is natural to only consider risk corresponding to invariant loss.

A possible invariant risk is given by

$$r = E(1 - \Theta^{-1}\hat\theta)^2 \tag{10}$$

Differentiation and solving gives the optimal invariant estimator

$$\hat\theta_{\mathrm{B}3} = \frac{E(\Theta^{-1})}{E(\Theta^{-2})} \tag{11}$$

As the notation suggests, this equals the Bayes posterior estimator given in equation (5) with $p = 3$. This can be seen by observing $(1 - \Theta^{-1}\hat\theta)^2 = \Theta^{-2}(\Theta - \hat\theta)^2$, and using the simple form of the fiducial distribution in this particular case.
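The identity between equation (11) and equation (5) with $p = 3$ can be checked numerically. The sketch below assumes the fiducial density is proportional to $\theta^{-n-1}$ on $[a, b]$ and that the index in equation (5) is $\alpha = n + p - 1$. It uses a simple midpoint rule; the function names and the values $a = 2$, $b = 3$, $n = 5$ are illustrative only.

```python
def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def invariant_estimate_numeric(a, b, n):
    """E(1/Theta) / E(1/Theta**2) for density proportional to theta**(-n-1).

    The normalizing constant cancels in the ratio.
    """
    num = midpoint_integral(lambda t: t ** (-n - 2), a, b)
    den = midpoint_integral(lambda t: t ** (-n - 3), a, b)
    return num / den

def bayes_closed_form(a, b, n, p):
    """Equation (5) with alpha = n + p - 1 and b_star = b / a."""
    alpha = n + p - 1
    b_star = b / a
    ratio = (1.0 - b_star ** (1 - alpha)) / (1.0 - b_star ** (-alpha))
    return alpha / (alpha - 1.0) * ratio * a

# The two values should agree to high accuracy.
print(invariant_estimate_numeric(2.0, 3.0, 5), bayes_closed_form(2.0, 3.0, 5, p=3))
```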

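The same observation also shows that $\hat\theta_{\mathrm{B}3}$ has uniformly minimal expected squared error loss in the class of scale equivariant estimators $\psi$. The claim follows from the frequentist risk

$$\begin{aligned} r_\theta &= E_\theta(\theta - \psi(Y))^2 = \theta^2 E(1 - \theta^{-1}\psi(\theta U))^2 \\ &= \theta^2 E\left[E\{(1 - \Theta^{-1}\psi(\Theta U))^2 \mid S_2\}\right] = \theta^2 E\left[E\{(1 - \Theta^{-1}\psi(y))^2 \mid S_2\}\right] \end{aligned} \tag{12}$$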

Taraldsen and Lindqvist (2013) prove similar frequentist optimality more generally for equivariant decision rules derived from an invariant loss and a fiducial distribution.
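The optimality claim can be illustrated by simulation. The following sketch estimates the frequentist mean squared error of four scale equivariant estimators on common random samples: the Rao-Blackwell estimator of equation (3) and the Bayes estimators of equation (5) for $p = 1, 2, 3$, assuming the index $\alpha = n + p - 1$. The helper names and the simulation settings are arbitrary choices, and a Monte Carlo comparison of this kind is only an illustration, not a proof.

```python
import random

def estimates(y, k):
    """Return (RB, B1, B2, B3) for one sample; B_p uses alpha = n + p - 1."""
    n = len(y)
    a = max(y) / (1.0 + k)                 # theta_ML
    b_star = (min(y) / (1.0 - k)) / a      # theta_MU / theta_ML

    def bayes(p):
        alpha = n + p - 1
        ratio = (1.0 - b_star ** (1 - alpha)) / (1.0 - b_star ** (-alpha))
        return alpha / (alpha - 1.0) * ratio * a

    rb = 0.5 * (min(y) + max(y))
    return rb, bayes(1), bayes(2), bayes(3)

def mse_table(theta, k, n, reps, seed=0):
    """Monte Carlo mean squared error of each estimator at a fixed theta."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0, 0.0]
    for _ in range(reps):
        y = [theta * rng.uniform(1.0 - k, 1.0 + k) for _ in range(n)]
        for j, est in enumerate(estimates(y, k)):
            sums[j] += (est - theta) ** 2
    return dict(zip(("RB", "B1", "B2", "B3"), (s / reps for s in sums)))

print(mse_table(theta=3.0, k=0.5, n=3, reps=20000))
```

Because all four estimators are equivariant, their risk scales with $\theta^2$, so a single value of $\theta$ suffices for the comparison.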

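Another possible invariant risk is given by

$$r = E(\ln\Theta - \ln\hat\theta)^2 \tag{13}$$

The corresponding optimal equivariant estimator is determined by $\ln\hat\theta = E\ln\Theta$. Equation (7) gives that $\ln(\Theta/\hat\theta_{\mathrm{ML}})$ is exponentially distributed with rate $n$ truncated to the interval $[0, \ln b^*]$ with $b^* = \hat\theta_{\mathrm{MU}}/\hat\theta_{\mathrm{ML}}$, and calculus gives

$$\hat\theta_{\mathrm{SC}} = \hat\theta_{\mathrm{ML}} \exp\left\{\frac{1}{n} - \frac{\ln b^*}{{b^*}^{n} - 1}\right\} \tag{14}$$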
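The estimator determined by $\ln\hat\theta = E\ln\Theta$ can be checked against direct numerical integration over the fiducial in equation (7). The sketch below uses the mean of a rate-$n$ exponential truncated to $[0, \ln b^*]$, namely $1/n - \ln b^*/({b^*}^n - 1)$, as derived here from the density proportional to $\theta^{-n-1}$. The function names and the test values are illustrative assumptions.

```python
import math

def theta_sc(theta_ml, theta_mu, n):
    """exp(E ln Theta) for Theta ~ Pareto(n, [theta_ml, theta_mu]), closed form."""
    b_star = theta_mu / theta_ml
    return theta_ml * math.exp(1.0 / n - math.log(b_star) / (b_star ** n - 1.0))

def theta_sc_numeric(theta_ml, theta_mu, n, steps=20000):
    """Same quantity by midpoint quadrature of the fiducial density."""
    h = (theta_mu - theta_ml) / steps
    grid = [theta_ml + (i + 0.5) * h for i in range(steps)]
    weights = [t ** (-n - 1) for t in grid]      # unnormalized density
    mean_log = sum(w * math.log(t) for w, t in zip(weights, grid)) / sum(weights)
    return math.exp(mean_log)

# The closed form and the quadrature should agree closely.
print(theta_sc(2.0, 3.0, 5), theta_sc_numeric(2.0, 3.0, 5))
```

By Jensen's inequality the result is never larger than the fiducial mean, and it always lies in the sure interval.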

## 4 Conclusion

It has been proven that the estimators $\hat\theta_{\mathrm{B}3}$ and $\hat\theta_{\mathrm{SC}}$ given in equation (5) and equation (14) respectively are optimal equivariant estimators in the sense of uniformly minimizing the frequentist risks $E_\theta(\theta - \hat\theta)^2$ and $E_\theta(\ln\theta - \ln\hat\theta)^2$ respectively. These results follow from invariance and complement the results for the scaled uniform as previously discussed by Galili and Meilijson (2016) and Mandel (2020). Sufficiency, conditionality, and invariance are central themes in theoretical statistics and in the arguments given here.

Existence of a uniformly minimum-variance unbiased (UMVU) estimator is left open. It is known, as demonstrated by Rao (1973, p.379, Exercise 11), that in the case where the minimal sufficient statistic is not complete some parameters may have UMVU estimators, but others may not. It would be interesting, out of curiosity, to know if a UMVU estimator for $\theta$ exists, but equivariance is a more natural demand for the case of a scale parameter.

In retrospect, it was perhaps fortunate that Fisher invented many and conflicting roads in his attempts to arrive at fiducial inference. Blind alleys are blind alleys, but often there are rewards along the way. This has certainly been the case also for the many different versions of fiducial inference that have been suggested. Fisher (1930, p.532,p.534) invented both confidence intervals and confidence distributions in his initial attempt. The theory of the latter topic is still in its infancy, but considerable progress has been made lately as documented by Schweder and Hjort (2016).

The fiducial distribution of $\theta$ is $\mathrm{Pareto}(n, [\hat\theta_{\mathrm{ML}}, \hat\theta_{\mathrm{MU}}])$ and it is a confidence distribution. This follows from the general arguments given by Taraldsen and Lindqvist (2013). It can be concluded that fiducial inference as initiated by Fisher (1930) can be used to obtain both optimal estimators and exact confidence distributions. This has here been demonstrated for the scaled uniform distribution, which is otherwise used as a counterexample for obtaining good theoretical results.

## References

• Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (Second ed.). Springer.
• Cui, Y. and J. Hannig (2019). Nonparametric generalized fiducial inference for survival functions under censoring. Biometrika 106(3), 501–518.
• Fisher, R. A. (1930). Inverse probability. Proc. Camb. Phil. Soc. 26, 528–535.
• Galili, T. and I. Meilijson (2016). An Example of an Improvable Rao–Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator. The American Statistician 70(1), 108–113.
• Mandel, M. (2020). The Scaled Uniform Model Revisited. The American Statistician 74(1), 98–100.
• Rao, C. R. (1973). Linear Statistical Inference and Its Applications (Second ed.). Wiley.
• Schweder, T. and N. L. Hjort (2016). Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions. Cambridge University Press.
• Taraldsen, G. and B. H. Lindqvist (2013). Fiducial theory and optimal inference. Annals of Statistics 41(1), 323–341.