Efficiency requires innovation

02/18/2019 ∙ by Abram M. Kagan, et al.

In estimating a parameter $\theta\in\mathbb{R}$ from a sample $(x_1,\dots,x_n)$ from a population $P_\theta$, a simple way of incorporating a new observation $x_{n+1}$ into an estimator $\tilde\theta_n=\tilde\theta_n(x_1,\dots,x_n)$ is transforming $\tilde\theta_n$ into what we call the jackknife extension $\tilde\theta^{(e)}_{n+1}=\tilde\theta^{(e)}_{n+1}(x_1,\dots,x_n,x_{n+1})$, $\tilde\theta^{(e)}_{n+1}=\{\tilde\theta_n(x_1,\dots,x_n)+\tilde\theta_n(x_{n+1},x_2,\dots,x_n)+\dots+\tilde\theta_n(x_1,\dots,x_{n-1},x_{n+1})\}/(n+1)$. Though $\tilde\theta^{(e)}_{n+1}$ lacks the innovation the statistician could expect from a larger data set, it is still better than $\tilde\theta_n$: $\operatorname{var}(\tilde\theta^{(e)}_{n+1})\le\frac{n}{n+1}\operatorname{var}(\tilde\theta_n)$. However, an estimator obtained by jackknife extension for all $n$ is asymptotically efficient only for samples from exponential families. For a general $P_\theta$, asymptotically efficient estimators require innovation when a new observation is added to the data. Some examples illustrate the concept.




1 Introduction

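Let $\tilde\theta_n=\tilde\theta_n(x_1,\dots,x_n)$ be an estimator of $\theta$ based on a sample $(x_1,\dots,x_n)$ of size $n$ from a population $P_\theta$ with $\theta$ as a parameter. If another observation $x_{n+1}$ is added to the data, a simple way of incorporating it in the existing estimator is by what we call the jackknife extension,

$$\tilde\theta^{(e)}_{n+1}=\frac{1}{n+1}\sum_{i=1}^{n+1}\tilde\theta_n(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{n+1}).\tag{1}$$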

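Plainly, $E_\theta(\tilde\theta^{(e)}_{n+1})=E_\theta(\tilde\theta_n)$, and if $\tilde\theta_n$ is symmetric in its arguments (as is usually the case), the jackknife extension is symmetric in $x_1,\dots,x_{n+1}$.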
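As a minimal sketch (Python standard library only; the helper name `jackknife_extension` and the data are illustrative assumptions), the jackknife extension can be computed directly as the average of the estimator over the $n+1$ leave-one-out subsamples of the extended sample. Exact `Fraction` arithmetic shows that for the sample mean and the unbiased sample variance the extension returns the same statistic computed on all $n+1$ observations.

```python
from fractions import Fraction
from statistics import mean, variance


def jackknife_extension(estimator, sample):
    """Average of `estimator` over the n + 1 leave-one-out subsamples of `sample`.

    `sample` is the extended sample (x_1, ..., x_{n+1}) as a list; `estimator`
    maps a list of n observations to a number.
    """
    size = len(sample)
    total = sum(estimator(sample[:i] + sample[i + 1:]) for i in range(size))
    return total / size


# Extended sample with n + 1 = 5 observations; Fractions keep everything exact.
xs = [Fraction(v) for v in (3, -1, 4, 1, 5)]

print(jackknife_extension(mean, xs) == mean(xs))          # True
print(jackknife_extension(variance, xs) == variance(xs))  # True
```

The same helper accepts any estimator defined on lists, so non-symmetric estimators can be extended as well; symmetry only matters for the symmetry of the result.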

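If $\operatorname{var}_\theta(\tilde\theta_n)<\infty$, then not only $\operatorname{var}_\theta(\tilde\theta^{(e)}_{n+1})\le\operatorname{var}_\theta(\tilde\theta_n)$ but a stronger inequality holds:

$$\operatorname{var}_\theta(\tilde\theta^{(e)}_{n+1})\le\frac{n}{n+1}\operatorname{var}_\theta(\tilde\theta_n).\tag{2}$$
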
The inequality (2) is a direct corollary of a special case of the so-called variance drop lemma due to Artstein et al. (2004).

Lemma 1


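Let $X_1,\dots,X_{n+1}$ be independent identically distributed random variables and $\psi(X_1,\dots,X_n)$ a function with $E\psi^2(X_1,\dots,X_n)<\infty$. Set

$$\psi_i=\psi(X_1,\dots,X_{i-1},X_{i+1},\dots,X_{n+1}),\qquad i=1,\dots,n+1.$$

Then

$$\operatorname{var}\Big(\sum_{i=1}^{n+1}\psi_i\Big)\le n\sum_{i=1}^{n+1}\operatorname{var}(\psi_i).\tag{5}$$
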
Note that with $n+1$ instead of $n$ on the right-hand side of (5), the inequality becomes a trivial corollary of

$$\Big(\sum_{i=1}^{n+1}a_i\Big)^2\le(n+1)\sum_{i=1}^{n+1}a_i^2,$$

holding for any numbers $a_1,\dots,a_{n+1}$.
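A small exact check of the variance drop bound can be run by enumeration. This sketch assumes, purely for illustration, $X_i$ uniform on $\{0,1,2\}$ and $n=2$, and compares the variance of the leave-one-out average $\frac{1}{n+1}\sum_{i=1}^{n+1}\psi_i$ with $\frac{n}{n+1}\operatorname{var}\psi$ (the variance drop bound divided by $(n+1)^2$). The functions `psi_nonadditive` and `psi_additive` are hypothetical examples.

```python
from fractions import Fraction
from itertools import product


def exact_variances(psi, n, support):
    """Exact var psi(X_1..X_n) and var of the leave-one-out average
    (1/(n+1)) * sum_i psi_i, for X_1, ..., X_{n+1} i.i.d. uniform on `support`."""
    p = Fraction(1, len(support))

    def var_of(f, k):
        # Enumerate all k-tuples; each has probability p**k.
        values = [f(t) for t in product(support, repeat=k)]
        m1 = sum(values) * p**k
        m2 = sum(v * v for v in values) * p**k
        return m2 - m1 * m1

    def leave_one_out_mean(t):
        # psi_i drops the i-th coordinate of the (n+1)-tuple t
        return Fraction(sum(psi(t[:i] + t[i + 1:]) for i in range(n + 1)), n + 1)

    return var_of(psi, n), var_of(leave_one_out_mean, n + 1)


def psi_nonadditive(t):
    # hypothetical non-symmetric, non-additive function of (x_1, x_2)
    return max(t) * t[0]


def psi_additive(t):
    # additive function of (x_1, x_2)
    return t[0] + t[1]


n, support = 2, (0, 1, 2)
for psi in (psi_nonadditive, psi_additive):
    v, v_bar = exact_variances(psi, n, support)
    print(psi.__name__, v_bar / v, v_bar <= Fraction(n, n + 1) * v)
```

For the additive $\psi$ the variance ratio is exactly $2/3=n/(n+1)$, so the constant $n/(n+1)$ cannot be improved in general; taking $\psi=\tilde\theta_n$ turns the averaged form into (2).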
For an extension of the variance drop lemma, see Madiman and Barron (2007).
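Suppose that, starting with $\tilde\theta_m$ and $n=m$, the statistician constructs the jackknife extension $\tilde\theta_{m+1}$ of $\tilde\theta_m$, then the jackknife extension $\tilde\theta_{m+2}$ of $\tilde\theta_{m+1}$, and so on. One can easily see that for $N>m$ the estimator $\tilde\theta_N$ thus obtained is a classical $U$-statistic with the kernel $\tilde\theta_m$:

$$\tilde\theta_N(x_1,\dots,x_N)=\binom{N}{m}^{-1}\sum_{1\le i_1<\dots<i_m\le N}\tilde\theta_m(x_{i_1},\dots,x_{i_m}).$$
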
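The claim that repeated jackknife extension produces a $U$-statistic can be checked by brute force. The sketch below assumes a hypothetical starting estimator $\tilde\theta_3$ (the median of three observations) and compares the recursively extended estimator with the average of the kernel over all 3-element subsets, using exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

M = 3  # order of the hypothetical starting estimator theta_m


def theta_m(sample):
    # hypothetical starting estimator: median of m = 3 observations
    return Fraction(sorted(sample)[1])


def repeated_extension(sample):
    """theta_N built from theta_m by jackknife-extending one observation at a time."""
    if len(sample) == M:
        return theta_m(sample)
    size = len(sample)
    return sum(repeated_extension(sample[:i] + sample[i + 1:]) for i in range(size)) / size


def u_statistic(sample):
    """Classical U-statistic with kernel theta_m."""
    return sum(theta_m(c) for c in combinations(sample, M)) / comb(len(sample), M)


xs = (2, 7, 1, 8, 2, 8, 1)
for N in range(M, len(xs) + 1):
    print(N, repeated_extension(xs[:N]) == u_statistic(xs[:N]))  # True for every N
```

The recursion costs $N!/m!$ kernel evaluations, so it is only meant as a check; the $U$-statistic form is the practical way to compute $\tilde\theta_N$.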
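Hoeffding initiated the study of $U$-statistics back in 1948. The variance of $\tilde\theta_N$ can be expressed explicitly in terms of $\tilde\theta_m$. Set

$$h_c(x_1,\dots,x_c)=E_\theta\,\tilde\theta_m(x_1,\dots,x_c,X_{c+1},\dots,X_m),\qquad\zeta_c=\operatorname{var}_\theta h_c(X_1,\dots,X_c),\qquad c=1,\dots,m.$$

The following formula due to Hoeffding (1948) expresses $\operatorname{var}_\theta(\tilde\theta_N)$ via $\zeta_1,\dots,\zeta_m$:

$$\operatorname{var}_\theta(\tilde\theta_N)=\binom{N}{m}^{-1}\sum_{c=1}^{m}\binom{m}{c}\binom{N-m}{m-c}\zeta_c.$$
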
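Hoeffding's formula can likewise be verified exactly for a small discrete population. The sketch assumes a hypothetical population uniform on $\{0,1,3\}$ and the kernel $\max(x_1,x_2)$ of order $m=2$; it computes $h_c(x_1,\dots,x_c)=E\,\text{kernel}(x_1,\dots,x_c,X_{c+1},\dots,X_m)$ and $\zeta_c=\operatorname{var}h_c$ by enumeration and compares the formula with the variance of the $U$-statistic obtained by enumerating all samples of size $N$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

SUPPORT = (0, 1, 3)            # hypothetical population: uniform on three points
P = Fraction(1, len(SUPPORT))  # probability of each support point
M = 2                          # order m of the kernel


def kernel(x):
    # hypothetical symmetric kernel of order 2
    return Fraction(max(x))


def expect(f, k):
    """E f(X_1, ..., X_k) for X_i i.i.d. uniform on SUPPORT, by enumeration."""
    return sum(f(t) for t in product(SUPPORT, repeat=k)) * P**k


def zeta(c):
    """zeta_c = var h_c(X_1..X_c), h_c(x_1..x_c) = E kernel(x_1..x_c, X_{c+1}..X_m)."""
    def h_c(head):
        return expect(lambda tail: kernel(head + tail), M - c)
    mu = expect(h_c, c)
    return expect(lambda t: h_c(t) ** 2, c) - mu**2


def hoeffding_variance(N):
    terms = (comb(M, c) * comb(N - M, M - c) * zeta(c) for c in range(1, M + 1))
    return sum(terms) / comb(N, M)


def brute_force_variance(N):
    def u(t):  # the U-statistic on the sample t = (x_1, ..., x_N)
        return sum(kernel(s) for s in combinations(t, M)) / comb(N, M)
    mu = expect(u, N)
    return expect(lambda t: u(t) ** 2, N) - mu**2


for N in range(M, 6):
    print(N, hoeffding_variance(N) == brute_force_variance(N))  # True
```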
2 Main result

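Assume that the distributions $P_\theta$ are given by a density $p(x;\theta)$ (with respect to a measure $\mu$), differentiable in $\theta$, with the Fisher information

$$I(\theta)=\int\Big(\frac{\partial\log p(x;\theta)}{\partial\theta}\Big)^2p(x;\theta)\,d\mu(x)$$

well defined and finite.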
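If $E_\theta(\tilde\theta_m)=\theta$, then by the Cramér-Rao inequality

$$\operatorname{var}_\theta(\tilde\theta_m)\ge\frac{1}{mI(\theta)}.$$
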
In particular, if

is an unbiased estimator of

, .
Furthermore, if , then for all and the following lemma holds.

Lemma 2

(Hoeffding 1948). As $N\to\infty$, $\sqrt{N}(\tilde\theta_N-\theta)$ is asymptotically normal $N(0,m^2\zeta_1)$.

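Due to the Cramér-Rao inequality, for any unbiased estimator $T_N$ of $\theta$ based on a sample $(x_1,\dots,x_N)$ from a population with Fisher information $I(\theta)$,

$$N\operatorname{var}_\theta(T_N)\ge\frac{1}{I(\theta)}.\tag{6}$$
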
Combining (6) with Lemma 2 leads to a formula for the asymptotic efficiency of $\tilde\theta_N$:

$$\operatorname{eff}(\tilde\theta_N)=\frac{1}{m^2\zeta_1I(\theta)}.$$

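To illustrate the efficiency formula $1/(m^2\zeta_1I(\theta))$, consider a hypothetical example not taken from the paper: $X$ exponential with mean $\theta$, so that $I(\theta)=1/\theta^2$, and the starting estimator $\tilde\theta_m=m\min(x_1,\dots,x_m)$, which is unbiased because the minimum of $m$ such variables is exponential with mean $\theta/m$. With $\theta=1$, a short calculation gives $h_1(x)=m(1-e^{-(m-1)x})/(m-1)$ for $m\ge2$ and $\zeta_1=1/(2m-1)$, hence efficiency $(2m-1)/m^2$. The sketch evaluates $\zeta_1$ by Simpson's rule (a numerical approximation chosen here) and compares.

```python
import math


def zeta_1(m, upper=60.0, steps=60_000):
    """zeta_1 = var h_1(X) for X ~ Exp(mean 1) and the kernel m * min(x_1, ..., x_m).

    h_1(x) = E[m * min(x, X_2, ..., X_m)] = m/(m-1) * (1 - exp(-(m-1) x)) for m >= 2,
    and h_1(x) = x for m = 1. E h_1(X)^2 is computed by composite Simpson's rule
    on [0, upper] (steps must be even); the tail beyond `upper` is negligible.
    """
    def h1(x):
        return x if m == 1 else m / (m - 1) * (1.0 - math.exp(-(m - 1) * x))

    def integrand(x):
        return h1(x) ** 2 * math.exp(-x)

    step = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * step)
    second_moment = total * step / 3
    return second_moment - 1.0  # E h_1(X) = theta = 1 because the kernel is unbiased


def efficiency(m):
    # 1 / (m^2 zeta_1 I(theta)) with theta = 1, where I(theta) = 1/theta^2 = 1
    return 1.0 / (m * m * zeta_1(m))


for m in (1, 2, 3, 4):
    print(m, round(efficiency(m), 6), (2 * m - 1) / m**2)
```

The two columns should agree to the printed precision. Since $(2m-1)/m^2<1$ for $m\ge2$, in this example only $m=1$, the arithmetic mean, is asymptotically efficient, even though the exponential distribution belongs to an exponential family.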
Lemma 3

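Let $X$ be a random element with finite Fisher information $I(\theta)$. If $\psi(X)$ is a (scalar valued) function with $g(\theta)=E_\theta\psi(X)$ differentiable and $\operatorname{var}_\theta\psi(X)<\infty$, then

$$\operatorname{var}_\theta\psi(X)\ge\frac{[g'(\theta)]^2}{I(\theta)}.\tag{8}$$

Proof. Take the projection $\hat J(X;\theta)$ of the Fisher score $J(X;\theta)=\partial\log p(X;\theta)/\partial\theta$ into the subspace $\operatorname{span}\{\psi(X)-g(\theta)\}$ of the Hilbert space of functions $u(X)$ with $E_\theta u^2(X)<\infty$, with the inner product $E_\theta\{u(X)v(X)\}$:

$$\hat J(X;\theta)=c(\theta)\{\psi(X)-g(\theta)\}.$$

Multiplying both sides by $\psi(X)-g(\theta)$ and taking the expectations results in $c(\theta)\operatorname{var}_\theta\psi(X)=E_\theta\{J(X;\theta)(\psi(X)-g(\theta))\}=g'(\theta)$ due to the properties $E_\theta J(X;\theta)=0$ and $E_\theta\{J(X;\theta)\psi(X)\}=g'(\theta)$ of the Fisher score. Hence

$$I(\theta)=E_\theta J^2(X;\theta)\ge E_\theta\hat J^2(X;\theta)=c^2(\theta)\operatorname{var}_\theta\psi(X)=\frac{[g'(\theta)]^2}{\operatorname{var}_\theta\psi(X)},$$

which is exactly (8). The equality sign in (8) is attained if and only if $J(X;\theta)=\hat J(X;\theta)$, that is, with $P_\theta$-probability one the relation

$$\frac{\partial\log p(X;\theta)}{\partial\theta}=c(\theta)\{\psi(X)-g(\theta)\}\tag{10}$$

holds for all $\theta$.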
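The Cramér-Rao type bound (8) and its equality condition can be illustrated numerically. The sketch assumes a Poisson($\theta$) population, for which $I(\theta)=1/\theta$; it approximates $g'(\theta)$ by a central difference and the moments by a truncated series (both approximations introduced here). For $\psi(X)=X$ the bound is attained, in line with the Poisson score $(X-\theta)/\theta$ being affine in $X$; for $\psi(X)=X^2$ the gap is $2\theta^2$, since $\operatorname{var}X^2=4\theta^3+6\theta^2+\theta$ while $[g'(\theta)]^2/I(\theta)=\theta(1+2\theta)^2$.

```python
import math


def poisson_moments(f, theta, terms=200):
    """(E f(X), E f(X)^2) for X ~ Poisson(theta), by a truncated series."""
    m1 = m2 = 0.0
    pmf = math.exp(-theta)  # P(X = 0)
    for k in range(terms):
        v = f(k)
        m1 += v * pmf
        m2 += v * v * pmf
        pmf *= theta / (k + 1)  # P(X = k + 1) from P(X = k)
    return m1, m2


def cramer_rao_gap(psi, theta, h=1e-5):
    """var psi(X) - g'(theta)^2 / I(theta) for Poisson(theta), where I(theta) = 1/theta
    and g(theta) = E_theta psi(X); g' is approximated by a central difference."""
    m1, m2 = poisson_moments(psi, theta)
    g_plus, _ = poisson_moments(psi, theta + h)
    g_minus, _ = poisson_moments(psi, theta - h)
    g_prime = (g_plus - g_minus) / (2 * h)
    return (m2 - m1 * m1) - g_prime**2 * theta


theta = 2.0
print(cramer_rao_gap(lambda x: x, theta))      # close to 0: psi(X) = X attains the bound
print(cramer_rao_gap(lambda x: x * x, theta))  # close to 2 * theta**2 = 8.0
```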
From and (8) one gets


Thus, a necessary condition for the asymptotic efficiency of is and by virtue of (4)
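$$\tilde\theta_N=\frac{1}{N}\sum_{i=1}^{N}\varphi(x_i)\tag{12}$$

for some $\varphi$ with $E_\theta\varphi(X)=\theta$.
From Lemma 3, the estimator (12) is an asymptotically efficient estimator of $\theta$ if and only if the relation (10) holds, implying that the family is exponential,

$$p(x;\theta)=\exp\{C(\theta)\varphi(x)+D(\theta)+S(x)\},\tag{13}$$

where the functions in the exponent are such that $D'(\theta)=-\theta C'(\theta)$.
From (13) one can see that the maximum likelihood equation for $\theta$ based on a sample $(x_1,\dots,x_N)$ from population (13) is

$$\sum_{i=1}^{N}\{C'(\theta)\varphi(x_i)+D'(\theta)\}=C'(\theta)\Big\{\sum_{i=1}^{N}\varphi(x_i)-N\theta\Big\}=0,$$

so that, for $C'(\theta)\ne0$, the estimator (12) is the maximum likelihood estimator; as the maximum likelihood estimator of $\theta$ is asymptotically efficient, so is (12).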
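For a concrete instance of the maximum likelihood argument, take the exponential distribution with mean $\theta$, whose density $\theta^{-1}e^{-x/\theta}=\exp\{C(\theta)x+D(\theta)\}$ has $C(\theta)=-1/\theta$ and $D(\theta)=-\log\theta$, so that $D'(\theta)=-1/\theta=-\theta C'(\theta)$ with $\varphi(x)=x$. The sketch (hypothetical data; golden-section search is used here only as a dependency-free maximizer) maximizes the log-likelihood numerically and compares the maximizer with the arithmetic mean.

```python
import math


def log_likelihood(theta, xs):
    """Log-likelihood of the exponential distribution with mean theta:
    sum of C(theta) * x_i + D(theta) with C(theta) = -1/theta, D(theta) = -log(theta)."""
    return sum(-x / theta - math.log(theta) for x in xs)


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    r = (math.sqrt(5) - 1) / 2  # inverse golden ratio
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return (a + b) / 2


xs = [0.8, 2.5, 1.1, 4.0, 0.6]  # hypothetical data
mle = golden_section_max(lambda t: log_likelihood(t, xs), 0.01, 100.0)
print(round(mle, 6), sum(xs) / len(xs))
```

Because the log-likelihood is very flat near its maximum, floating-point comparisons limit the search to roughly seven correct digits, which is enough to see that the maximizer is the sample mean.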
We summarize the above as a theorem.

Theorem 1

Under the regularity conditions of the theory of maximum likelihood estimators, jackknife extension estimators are asymptotically efficient if and only if they are arithmetic means based on samples from exponential families.

3 Some examples

The jackknife extension lacks innovation. A jackknife extension estimator based on $(x_1,\dots,x_{n+1})$ differs from the estimator based on $(x_1,\dots,x_n)$ only by the sample size. In a sense, it is an extensive rather than intensive use of the data, where the main factor is quantity rather than quality.

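Nonparametric estimators of population characteristics, such as the empirical distribution function, the sample mean and the sample variance, are jackknife extensions. Their main goal is to be universal rather than optimal for individual populations. An interesting statistic is the sample median constructed from a sample from a continuous population. Without loss of generality, one may assume that there are no ties (they occur with probability zero), and denote the ordered extended sample by

$$x_{(1)}<x_{(2)}<\dots<x_{(n+1)}.$$

For $n=2k+1$, the median of a sample of size $n$ is its $(k+1)$st order statistic. If the deleted observation is one of $x_{(1)},\dots,x_{(k+1)}$ or one of $x_{(k+2)},\dots,x_{(2k+2)}$, the median of the remaining $2k+1$ observations is $x_{(k+2)}$ or $x_{(k+1)}$, respectively, so one can easily see that

$$\tilde\theta^{(e)}_{2k+2}=\frac{x_{(k+1)}+x_{(k+2)}}{2},$$

where $x_{(k+1)}$ and $x_{(k+2)}$ are the $(k+1)$st and $(k+2)$nd elements of the ordered sample. Thus, the median of a sample of even size is a jackknife extension, though one should keep in mind that the definitions of the median in samples of even and odd size are different and it is not clear whether the inequality (2) holds.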
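The identity for the median can be confirmed by brute force with exact arithmetic. This sketch redefines the leave-one-out average so that the block is self-contained and uses `statistics.median`, which returns the middle value for odd sizes and the mean of the two middle values for even sizes; the random integer data are arbitrary, and ties do not affect the identity.

```python
import random
from fractions import Fraction
from statistics import median


def jackknife_extension(estimator, sample):
    """Average of `estimator` over the leave-one-out subsamples of `sample`."""
    size = len(sample)
    return sum(estimator(sample[:i] + sample[i + 1:]) for i in range(size)) / size


rng = random.Random(2019)  # fixed seed; the data are arbitrary test values
for k in range(6):
    # extended sample of even size 2k + 2; its leave-one-out subsamples have odd size 2k + 1
    xs = [Fraction(rng.randint(-1000, 1000)) for _ in range(2 * k + 2)]
    print(2 * k + 2, jackknife_extension(median, xs) == median(xs))  # True
```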

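For $n=2k$, the jackknife extension of the median is not the median $x_{(k+1)}$ of the extended sample $x_{(1)}<\dots<x_{(2k+1)}$. Let us start with the simple cases of $n=2$ and $n=4$. In the first case,

$$\tilde\theta^{(e)}_3=\frac{x_{(1)}+x_{(2)}+x_{(3)}}{3}$$

is a weighted average of $x_{(2)}$ and its nearest neighbors. The same holds in the second case, with $x_{(3)}$ instead of $x_{(2)}$ and different weights:

$$\tilde\theta^{(e)}_5=\frac{3x_{(2)}+4x_{(3)}+3x_{(4)}}{10}.$$

The extrapolation to an arbitrary $n=2k$ results in

$$\tilde\theta^{(e)}_{2k+1}=\frac{(k+1)x_{(k)}+2k\,x_{(k+1)}+(k+1)x_{(k+2)}}{2(2k+1)},\tag{16}$$

and (16) indeed holds for every $k$: deleting any of the $k$ smallest observations leaves the median $(x_{(k+1)}+x_{(k+2)})/2$, deleting $x_{(k+1)}$ leaves $(x_{(k)}+x_{(k+2)})/2$, and deleting any of the $k$ largest leaves $(x_{(k)}+x_{(k+1)})/2$.
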
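A brute-force comparison, again with exact arithmetic, supports the weights $(k+1,\,2k,\,k+1)/(2(2k+1))$ on $x_{(k)},x_{(k+1)},x_{(k+2)}$ in (16). The helper `weighted_formula` is a name introduced here for illustration.

```python
import random
from fractions import Fraction
from statistics import median


def jackknife_extension(estimator, sample):
    """Average of `estimator` over the leave-one-out subsamples of `sample`."""
    size = len(sample)
    return sum(estimator(sample[:i] + sample[i + 1:]) for i in range(size)) / size


def weighted_formula(sample):
    """((k+1) x_(k) + 2k x_(k+1) + (k+1) x_(k+2)) / (2(2k+1)) for a sample of size 2k + 1."""
    s = sorted(sample)
    k = (len(s) - 1) // 2
    # x_(k), x_(k+1), x_(k+2) are s[k-1], s[k], s[k+1] with 0-based indexing
    return ((k + 1) * s[k - 1] + 2 * k * s[k] + (k + 1) * s[k + 1]) / (2 * (2 * k + 1))


rng = random.Random(16)
for k in range(1, 8):
    xs = [Fraction(rng.randint(-1000, 1000)) for _ in range(2 * k + 1)]
    # median of each even-size (2k) leave-one-out subsample, averaged
    print(k, jackknife_extension(median, xs) == weighted_formula(xs))  # True
```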
Though (16) is a reasonable estimator of the median, it is not clear how it behaves in small and large samples compared to the standard sample median $x_{(k+1)}$.

4 References

Artstein, S., Ball, K. M., Barthe, F., Naor, A. (2004). Solution of Shannon's problem on the monotonicity of entropy. J. Amer. Math. Soc., 17, 975–982.

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat., 19, 293–325.

Kagan, A. M., Yu, T., Barron, A., Madiman, M. (2011). Contribution to the theory of Pitman estimators. J. Math. Sci., 199, 2, 202–214.

Madiman, M., Barron, A. (2007). Generalized Entropy Power Inequalities and Monotonicity Properties of Information. IEEE Transactions on Information Theory, 53, 2317–2329.