Let $T_n = T(x_1,\ldots,x_n)$ be an estimator of $\theta$ based on a sample of size $n$ from a population with $\theta$ as a parameter. If another observation $x_{n+1}$ is added to the data, a simple way of incorporating it in the existing estimator is by what we call the jackknife extension,
$$ T_{n+1}^{J} = \frac{1}{n+1}\sum_{i=1}^{n+1} T(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_{n+1}). \tag{1} $$
Plainly, $E(T_{n+1}^{J}) = E(T_n)$, and if $T$ is symmetric in its arguments (as is usually the case) the jackknife extension is symmetric in $x_1,\ldots,x_{n+1}$.
If $\operatorname{var}(T_n) < \infty$, then not only $\operatorname{var}(T_{n+1}^{J}) \le \operatorname{var}(T_n)$ but a stronger inequality holds:
$$ \operatorname{var}(T_{n+1}^{J}) \le \frac{n}{n+1}\operatorname{var}(T_n). \tag{2} $$
The inequality (2) is a direct corollary of a special case of the so-called variance drop lemma due to Artstein et al. (2004).
Lemma 1 (variance drop lemma). Let $X_1,\ldots,X_n$ be independent identically distributed random variables and $\psi(x_1,\ldots,x_{n-1})$ a function with $E\{\psi^{2}(X_1,\ldots,X_{n-1})\} < \infty$. Set
$$ \bar\psi = \frac{1}{n}\sum_{i=1}^{n}\psi(X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n). \tag{4} $$
Then
$$ \operatorname{var}(\bar\psi) \le \frac{n-1}{n}\operatorname{var}\{\psi(X_1,\ldots,X_{n-1})\}, \tag{5} $$
with the equality attained if and only if $\psi$ is additive, i.e., $\psi(x_1,\ldots,x_{n-1}) = g(x_1) + \cdots + g(x_{n-1})$ for some $g$.
Note that with $1$ instead of $\frac{n-1}{n}$ on the right-hand side of (5), the inequality becomes a trivial corollary of
$$ (a_1 + \cdots + a_n)^{2} \le n\,(a_1^{2} + \cdots + a_n^{2}), $$
holding for any numbers $a_1,\ldots,a_n$.
For an extension of the variance drop lemma see Madiman and Barron (2007).
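As an illustration not in the original, the variance drop inequality (2) can be checked by exact enumeration for a non-additive kernel. The sketch below (assuming, for concreteness, observations uniform on $\{0,1,2\}$ and $T$ the median of three) computes the variances of $T_3$ and of its jackknife extension $T_4^{J}$ exactly.

```python
from itertools import product
from statistics import median

def variance(outcomes):
    # outcomes: list of (value, probability) pairs with probabilities summing to 1
    m1 = sum(v * p for v, p in outcomes)
    m2 = sum(v * v * p for v, p in outcomes)
    return m2 - m1 * m1

support = (0, 1, 2)  # toy population, uniform on three points

# T_3 = median of a sample of size 3: exact distribution by enumeration
t3 = [(median(s), (1 / 3) ** 3) for s in product(support, repeat=3)]

# T_4^J = average of the four leave-one-out medians of a sample of size 4
def jackknife_extension(s):
    n = len(s)
    return sum(median(s[:i] + s[i + 1:]) for i in range(n)) / n

t4 = [(jackknife_extension(s), (1 / 3) ** 4) for s in product(support, repeat=4)]

var_t3, var_t4j = variance(t3), variance(t4)
# the variance drop inequality (2): var(T_4^J) <= (3/4) var(T_3)
assert var_t4j <= 0.75 * var_t3 + 1e-12
```

Since the median is not additive, the drop here is strict; for an additive kernel (e.g. the sample mean) the bound is attained.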
Suppose that, starting with $T_m = T(x_1,\ldots,x_m)$ and observations $x_{m+1}, x_{m+2},\ldots$, the statistician constructs the jackknife extension of $T_m$, then the jackknife extension of $T_{m+1}^{J}$, and so on. One can easily see that for $n \ge m$ the estimator thus obtained is a classical $U$-statistic with the kernel $T$:
$$ U_n = \binom{n}{m}^{-1}\sum_{1 \le i_1 < \cdots < i_m \le n} T(x_{i_1},\ldots,x_{i_m}). $$
Hoeffding initiated studying $U$-statistics back in 1948. The variance of $U_n$ can be explicitly expressed in terms of the variances of the conditional expectations of the kernel. Set
$$ \zeta_c = \operatorname{var}\{E(T(X_1,\ldots,X_m)\mid X_1,\ldots,X_c)\}, \qquad c = 1,\ldots,m. $$
The following formula due to Hoeffding (1948) expresses $\operatorname{var}(U_n)$ via $\zeta_1,\ldots,\zeta_m$:
$$ \operatorname{var}(U_n) = \binom{n}{m}^{-1}\sum_{c=1}^{m}\binom{m}{c}\binom{n-m}{m-c}\,\zeta_c. $$
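Hoeffding's formula can be verified by exact enumeration. The sketch below (a toy check of mine, not part of the paper) takes the hypothetical kernel $T(x_1,x_2) = x_1 x_2$, $m = 2$, with Bernoulli(1/2) observations and $n = 4$, and compares the formula with the variance of $U_4$ computed directly.

```python
from itertools import product
from math import comb

n, m = 4, 2
support, prob = (0, 1), 0.5          # Bernoulli(1/2) population

def kernel(x1, x2):                  # toy kernel T(x1, x2) = x1 * x2
    return x1 * x2

def variance(outcomes):
    m1 = sum(v * p for v, p in outcomes)
    return sum(v * v * p for v, p in outcomes) - m1 * m1

# zeta_1 = var(E[T | X1]) and zeta_2 = var(T), by enumeration
zeta1 = variance([(x * 0.5, prob) for x in support])        # E[T | X1=x] = x E[X2]
zeta2 = variance([(kernel(a, b), prob ** 2) for a, b in product(support, repeat=2)])

# Hoeffding's formula for var(U_n)
var_formula = sum(comb(m, c) * comb(n - m, m - c) * z
                  for c, z in ((1, zeta1), (2, zeta2))) / comb(n, m)

# direct enumeration of var(U_4) over all 2^4 samples
def U(sample):
    vals = [kernel(sample[i], sample[j]) for i in range(n) for j in range(i + 1, n)]
    return sum(vals) / comb(n, m)

var_direct = variance([(U(s), prob ** n) for s in product(support, repeat=n)])
assert abs(var_direct - var_formula) < 1e-12
```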
2 Main result
Assume that the distributions $P_\theta$, $\theta \in \Theta$, are given by densities $p(x;\theta)$, differentiable in $\theta$ (with respect to a measure $\mu$), with the Fisher information
$$ I(\theta) = E_\theta\left\{\left(\frac{\partial \log p(X;\theta)}{\partial \theta}\right)^{2}\right\} $$
well defined and finite.
If $E_\theta(T_n) = \theta + b_n(\theta)$ with differentiable bias $b_n(\theta)$, by the Cramér-Rao inequality
$$ \operatorname{var}_\theta(T_n) \ge \frac{\{1 + b_n'(\theta)\}^{2}}{nI(\theta)}. $$
In particular, if $T_n$ is an unbiased estimator of $\theta$,
$$ \operatorname{var}_\theta(T_n) \ge \frac{1}{nI(\theta)}. $$
Furthermore, if $\zeta_1 > 0$, then $\operatorname{var}(U_n) > 0$ for all $n \ge m$ and the following lemma holds.
Lemma 2 (Hoeffding 1948). As $n \to \infty$, $\sqrt{n}\,\{U_n - E(U_n)\}$ is asymptotically normal $N(0,\, m^{2}\zeta_1)$.
Due to the Cramér-Rao inequality, for any unbiased estimator $\hat\theta_n$ of $\theta$ based on a sample of size $n$ from a population with Fisher information $I(\theta)$,
$$ \operatorname{var}_\theta(\hat\theta_n) \ge \frac{1}{nI(\theta)}. \tag{6} $$
Combining (6) with Lemma 2 leads to a formula for the asymptotic efficiency of $U_n$:
$$ e(U_n) = \lim_{n\to\infty}\frac{1/\{nI(\theta)\}}{\operatorname{var}(U_n)} = \frac{1}{m^{2}\zeta_1 I(\theta)}. \tag{7} $$
Lemma 3. Let $X$ be a random element with density $p(x;\theta)$ and finite Fisher information $I(\theta)$. If $T = T(X)$ is a (scalar valued) function with $\tau(\theta) = E_\theta\{T(X)\}$ differentiable and $E_\theta\{T^{2}(X)\} < \infty$, then
$$ \operatorname{var}_\theta(T) \ge \frac{\{\tau'(\theta)\}^{2}}{I(\theta)}. \tag{8} $$
Proof. Take the projection $\hat J$ of the Fisher score $J = \partial \log p(X;\theta)/\partial\theta$ into the subspace $\operatorname{span}\{1, T\}$ of the Hilbert space of functions $g(X)$ with $E_\theta\{g^{2}(X)\} < \infty$, with the inner product $(g_1, g_2) = E_\theta\{g_1(X)\,g_2(X)\}$:
$$ \hat J = \alpha(\theta) + \beta(\theta)\,T. $$
Multiplying both sides by $T$ and taking the expectations results in $\tau'(\theta) = \beta(\theta)\operatorname{var}_\theta(T)$ due to the properties $E_\theta(J) = 0$ and $E_\theta(JT) = \tau'(\theta)$ of the Fisher score. Hence
$$ I(\theta) = E_\theta(J^{2}) \ge E_\theta(\hat J^{2}) = \beta(\theta)^{2}\operatorname{var}_\theta(T) = \frac{\{\tau'(\theta)\}^{2}}{\operatorname{var}_\theta(T)}, $$
which is exactly (8). The equality sign in (8) is attained if and only if with $P_\theta$-probability one the relation
$$ J = \alpha(\theta) + \beta(\theta)\,T \tag{10} $$
holds for all $\theta \in \Theta$.
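To illustrate the equality case, consider the Bernoulli family (an example of mine, not in the original): its Fisher score is linear in $T(x) = x$, so the bound (8) is attained. The sketch below checks this by exact computation over the two-point support.

```python
theta = 0.3                       # any 0 < theta < 1

# Bernoulli(theta): T(x) = x is unbiased for theta, so tau(theta) = theta, tau' = 1
var_T = theta * (1 - theta)

# Fisher score J(x) = (x - theta) / (theta (1 - theta)) is linear in T(x) = x
def score(x):
    return (x - theta) / (theta * (1 - theta))

# I(theta) = E(J^2), computed exactly over the two-point support
fisher = theta * score(1) ** 2 + (1 - theta) * score(0) ** 2

bound = 1.0 / fisher              # {tau'(theta)}^2 / I(theta) with tau' = 1
assert abs(fisher - 1 / (theta * (1 - theta))) < 1e-12
assert abs(var_T - bound) < 1e-12  # equality in the Cramer-Rao-type bound (8)
```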
From the efficiency formula and (8) one gets
$$ e(U_n) = \frac{1}{m^{2}\zeta_1 I(\theta)} \le 1. \tag{11} $$
Thus, a necessary condition for the asymptotic efficiency of $U_n$ is $m^{2}\zeta_1 I(\theta) = 1$ and, by virtue of (4), the kernel must be additive,
$$ T(x_1,\ldots,x_m) = \frac{1}{m}\{t(x_1) + \cdots + t(x_m)\} \tag{12} $$
for some $t$ with $E_\theta\{t(X_1)\} = \theta$.
From Lemma 3, the estimator (12) is an asymptotically efficient estimator of $\theta$ if and only if the relation (10) holds, implying that the family is exponential,
$$ p(x;\theta) = \exp\{A(\theta)\,t(x) + B(\theta) + S(x)\}, \tag{13} $$
where the functions in the exponent are such that $E_\theta\{t(X)\} = -B'(\theta)/A'(\theta) = \theta$.
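As a concrete instance (an illustration of mine, not part of the original text), the Poisson family has the form (13):

```latex
% Poisson family written in the form (13):
p(x;\theta) = \frac{e^{-\theta}\theta^{x}}{x!}
            = \exp\{\,x\log\theta - \theta - \log x!\,\},
\qquad x = 0, 1, 2, \ldots,
% with t(x) = x, A(\theta) = \log\theta, B(\theta) = -\theta, S(x) = -\log x!;
% here E_\theta\{t(X)\} = -B'(\theta)/A'(\theta) = 1/(1/\theta) = \theta,
% and the arithmetic mean \bar{x} = \frac{1}{n}\sum_{i=1}^{n} t(x_i) is the
% maximum likelihood estimator of \theta.
```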
From (13) one can see that the maximum likelihood equation for $\theta$ based on a sample $(x_1,\ldots,x_n)$ from population (13) is
$$ \frac{1}{n}\sum_{i=1}^{n} t(x_i) = \theta, $$
and the maximum likelihood estimator of $\theta$ is asymptotically efficient.
We summarize the above as a theorem.
Theorem. Under the regularity-type conditions of the theory of maximum likelihood estimators, the jackknife extension estimators are asymptotically efficient if and only if they are arithmetic means based on samples from exponential families.
3 Some examples
The jackknife extension lacks innovation: a jackknife extension estimator based on $(x_1,\ldots,x_{n+1})$ differs from the estimator based on $(x_1,\ldots,x_n)$ only in the sample size. In a sense, it is an extensive vs. intensive use of the data, where the main factor is quantity rather than quality.
Nonparametric estimators of population characteristics, such as the empirical distribution function, the sample mean and variance, are jackknife extensions. Their main goal is to be universal rather than optimal for individual populations. An interesting statistic is the sample median constructed from a sample from a continuous population. Without loss in generality, one may assume all the sample elements distinct, since ties occur with probability zero.
Denote by $\nu_n$ the median of a sample of size $n$. For $n = 2k+1$, $\nu_n = x_{(k+1)}$. If a new observation $x_{2k+2}$ is added, one can easily see that the jackknife extension of $\nu_{2k+1}$ is
$$ \nu_{2k+2}^{J} = \frac{x_{(k+1)} + x_{(k+2)}}{2}, \tag{14} $$
where $x_{(k+1)}$ and $x_{(k+2)}$ are the $(k+1)$st and $(k+2)$nd elements of the ordered sample $x_{(1)} \le \cdots \le x_{(2k+2)}$. Thus, the median of a sample of an even size is a jackknife extension, though one should keep in mind that the definitions of the median in samples of even and odd sizes are different and it is not clear if the inequality (2) holds.
For an even $n = 2k$, the jackknife extension of $\nu_{2k}$ is not $\nu_{2k+1}$. Let us start with the simple cases of $k = 1$ and $k = 2$. In the first case,
$$ \nu_{3}^{J} = \frac{x_{(1)} + x_{(2)} + x_{(3)}}{3} $$
is a weighted average of $\nu_{3} = x_{(2)}$ and its nearest neighbors. The same holds in the second case, with $x_{(3)}$ instead of $x_{(2)}$ and different weights:
$$ \nu_{5}^{J} = \frac{3x_{(2)} + 4x_{(3)} + 3x_{(4)}}{10}. \tag{15} $$
It seems likely that the extrapolation to an arbitrary $k$ will result in
$$ \nu_{2k+1}^{J} = \frac{(k+1)\,x_{(k)} + 2k\,x_{(k+1)} + (k+1)\,x_{(k+2)}}{2(2k+1)}. \tag{16} $$
Though (16) is a reasonable estimator of the median, it is not clear how it behaves in small and large samples compared with the standard $\nu_{2k+1} = x_{(k+1)}$.
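The closed forms above for the jackknife extensions of the sample median can be checked numerically. The sketch below (a verification of mine, not from the paper) confirms that the extension of the odd-size median is the usual even-size median, and that the extension of the even-size median is the weighted average (16) of the three middle order statistics.

```python
import random

def median(s):
    t = sorted(s)
    k = len(t)
    return t[k // 2] if k % 2 else (t[k // 2 - 1] + t[k // 2]) / 2

def jackknife_extension(s):
    # average of the leave-one-out medians of s
    n = len(s)
    return sum(median(s[:i] + s[i + 1:]) for i in range(n)) / n

random.seed(1)

# even n: the extension of the median of size n-1 is the usual even-size median
for n in (4, 6, 8):
    s = [random.random() for _ in range(n)]
    assert abs(jackknife_extension(s) - median(s)) < 1e-12

# odd n = 2k+1: the extension of the even-size median matches (16)
for k in (1, 2, 3, 4):
    s = sorted(random.random() for _ in range(2 * k + 1))
    expected = ((k + 1) * s[k - 1] + 2 * k * s[k] + (k + 1) * s[k + 1]) \
        / (2 * (2 * k + 1))
    assert abs(jackknife_extension(s) - expected) < 1e-12
```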
Artstein, S., Ball, K. M., Barthe, F., Naor, A. (2004). Solution of Shannon's problem on the monotonicity of entropy. J. Amer. Math. Soc., 17, 975–982.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat., 19, 293–325.
Kagan, A. M., Yu, T., Barron, A., Madiman, M. (2011). Contribution to the theory of Pitman estimators. J. Math. Sci., 199, 2, 202–214.
Madiman, M., Barron, A. (2007). Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inform. Theory, 53, 2317–2329.