# Efficiency requires innovation

In estimating a parameter θ ∈ R from a sample (x_1,…,x_n) from a population P_θ, a simple way of incorporating a new observation x_{n+1} into an estimator θ̃_n = θ̃_n(x_1,…,x_n) is to transform θ̃_n into what we call the jackknife extension θ̃_{n+1}^(e) = θ̃_{n+1}^(e)(x_1,…,x_n,x_{n+1}), θ̃_{n+1}^(e) = {θ̃_n(x_1,…,x_n) + θ̃_n(x_{n+1},x_2,…,x_n) + … + θ̃_n(x_1,…,x_{n−1},x_{n+1})}/(n+1). Though θ̃_{n+1}^(e) lacks the innovation the statistician could expect from a larger data set, it is still better than θ̃_n: var(θ̃_{n+1}^(e)) ≤ n/(n+1) var(θ̃_n). However, an estimator obtained by jackknife extension for all n is asymptotically efficient only for samples from exponential families. For a general P_θ, asymptotically efficient estimators require innovation when a new observation is added to the data. Some examples illustrate the concept.


## 1 Introduction

Let $\tilde\theta_n = \tilde\theta_n(x_1,\ldots,x_n)$ be an estimator of a parameter $\theta$ based on a sample of size $n$ from a population $P_\theta$. If another observation $x_{n+1}$ is added to the data, a simple way of incorporating it in the existing estimator is by what we call the jackknife extension,

$$\tilde\theta^{(e)}_{n+1} = \tilde\theta^{(e)}_{n+1}(x_1,\ldots,x_n,x_{n+1}) = (\tilde\theta_{n,1} + \ldots + \tilde\theta_{n,n+1})/(n+1) \qquad (1)$$

where

$$\tilde\theta_{n,1} = \tilde\theta_n(x_2,\ldots,x_{n+1}), \quad \tilde\theta_{n,i} = \tilde\theta_n(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_{n+1}),\ i = 2,\ldots,n, \quad \tilde\theta_{n,n+1} = \tilde\theta_n(x_1,\ldots,x_n).$$

Plainly, $E_\theta(\tilde\theta^{(e)}_{n+1}) = E_\theta(\tilde\theta_n)$, and if $\tilde\theta_n$ is symmetric in its arguments (as is usually the case) the jackknife extension is symmetric in $x_1,\ldots,x_{n+1}$.
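As a concrete illustration of the jackknife extension (1), here is a minimal Python sketch; the helper names `jackknife_extension` and `mean` are ours, not the paper's. It averages the $n+1$ leave-one-out values of an estimator over the enlarged sample; for the sample mean, the extension reduces to the mean of all $n+1$ observations.

```python
def jackknife_extension(estimator, sample):
    """Jackknife extension (1): average of the n+1 leave-one-out values of
    an estimator for samples of size n, applied to the enlarged sample."""
    n1 = len(sample)  # n + 1, the new observation already appended
    return sum(estimator(sample[:i] + sample[i + 1:]) for i in range(n1)) / n1

mean = lambda xs: sum(xs) / len(xs)
data = [2.0, 4.0, 6.0, 8.0]   # x_1, ..., x_n, x_{n+1}
# for the sample mean, the extension is again the sample mean of all points
print(abs(jackknife_extension(mean, data) - mean(data)) < 1e-9)  # True
```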

If $\mathrm{var}_\theta(\tilde\theta_n) < \infty$, then not only $\mathrm{var}_\theta(\tilde\theta^{(e)}_{n+1}) \le \mathrm{var}_\theta(\tilde\theta_n)$ but a stronger inequality holds:

$$(n+1)\,\mathrm{var}_\theta(\tilde\theta^{(e)}_{n+1}) \le n\,\mathrm{var}_\theta(\tilde\theta_n). \qquad (2)$$

The inequality (2) is a direct corollary of a special case of the so-called variance drop lemma due to Artstein et al. (2004).

###### Lemma 1

Let $X_1,\ldots,X_{n+1}$ be independent identically distributed random variables and $\psi = \psi(x_1,\ldots,x_n)$ a function with $E\{\psi^2(X_1,\ldots,X_n)\} < \infty$. Set

$$\psi_1 = \psi(X_2,\ldots,X_{n+1}), \quad \psi_i = \psi(X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_{n+1}),\ i = 2,\ldots,n+1.$$

Then

$$\mathrm{var}\Big(\sum_{i=1}^{n+1}\psi_i\Big) \le n\sum_{i=1}^{n+1}\mathrm{var}(\psi_i). \qquad (3)$$

Note that with $n+1$ instead of $n$ on the right-hand side of (3), the inequality becomes a trivial corollary of

$$\Big(\sum_{i=1}^{n+1} a_i\Big)^2 \le (n+1)\sum_{i=1}^{n+1} a_i^2,$$

holding for any numbers $a_1,\ldots,a_{n+1}$.
For an extension of the variance drop lemma see Madiman and Barron (2007).
Suppose that, starting with $\tilde\theta_m$ and $(x_1,\ldots,x_m)$, the statistician constructs the jackknife extension of $\tilde\theta_m$, then the jackknife extension of the resulting estimator, and so on. One can easily see that for $n > m$ the estimator thus obtained is a classical $U$-statistic with the kernel $\tilde\theta_m$:

$$\tilde\theta^{(e)}_n(x_1,\ldots,x_n) = \binom{n}{m}^{-1}\sum_{1\le i_1 < \ldots < i_m \le n}\tilde\theta_m(x_{i_1},\ldots,x_{i_m}). \qquad (4)$$
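The claim that iterating the extension yields the $U$-statistic (4) can be checked numerically. The sketch below assumes a symmetric kernel of size $m = 2$ (the unbiased variance kernel); the names `extend`, `u_statistic`, `theta2`, and the toy data are our own illustrations.

```python
from itertools import combinations

def extend(est):
    """One jackknife-extension step (1): turn an estimator for samples of
    size k into one for samples of size k+1."""
    def est_plus(xs):
        return sum(est(xs[:i] + xs[i + 1:]) for i in range(len(xs))) / len(xs)
    return est_plus

def u_statistic(kernel, m, xs):
    """Classical U-statistic (4) with a kernel of m arguments."""
    subs = list(combinations(xs, m))
    return sum(kernel(list(s)) for s in subs) / len(subs)

theta2 = lambda xs: (xs[0] - xs[1]) ** 2 / 2   # symmetric kernel, m = 2
data = [1.0, 3.0, 0.0, 4.0, 2.0]               # n = 5

est = theta2
for _ in range(len(data) - 2):                 # extend from size 2 up to size 5
    est = extend(est)
print(abs(est(data) - u_statistic(theta2, 2, data)) < 1e-9)  # True
```

The agreement is exact because each $m$-subset of the enlarged sample appears equally often among the leave-one-out subsamples at every step.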

Hoeffding initiated the study of $U$-statistics back in 1948. The variance of $\tilde\theta^{(e)}_n$ can be explicitly expressed in terms of $\tilde\theta_m$. Set

$$\tilde\theta_{m|k}(x_1,\ldots,x_k) = E_\theta\{\tilde\theta_m(X_1,\ldots,X_m)\mid X_1 = x_1,\ldots,X_k = x_k\}.$$

The following formula due to Hoeffding (1948) expresses $\mathrm{var}_\theta(\tilde\theta^{(e)}_n)$ via $v_k(\theta) = \mathrm{var}_\theta(\tilde\theta_{m|k}(X_1,\ldots,X_k))$:

$$\mathrm{var}_\theta(\tilde\theta^{(e)}_n) = \binom{n}{m}^{-1}\sum_{k=1}^{m}\binom{m}{k}\binom{n-m}{m-k}\, v_k(\theta). \qquad (5)$$
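Formula (5) can be verified by exact enumeration over a tiny discrete population, avoiding Monte Carlo error. Everything below, the uniform population on {0, 1, 2}, the kernel $h(x_1,x_2) = x_1 x_2$, and the helper names, is an illustrative assumption of ours.

```python
from itertools import product, combinations
from math import comb

support, m, n = [0, 1, 2], 2, 4      # tiny population, kernel size, sample size
h = lambda x1, x2: x1 * x2           # illustrative symmetric kernel

def mean(vals):
    return sum(vals) / len(vals)

def v(k):
    """v_k = var of E[h | X_1..X_k], by exact enumeration."""
    cond = [mean([h(*(list(fixed) + list(rest)))
                  for rest in product(support, repeat=m - k)])
            for fixed in product(support, repeat=k)]
    return mean([c * c for c in cond]) - mean(cond) ** 2

def u(xs):
    """U-statistic (4) with kernel h."""
    return sum(h(*p) for p in combinations(xs, m)) / comb(n, m)

# exact variance of the U-statistic over all |support|^n equally likely samples
us = [u(xs) for xs in product(support, repeat=n)]
var_u = mean([x * x for x in us]) - mean(us) ** 2

hoeffding = sum(comb(m, k) * comb(n - m, m - k) * v(k)
                for k in range(1, m + 1)) / comb(n, m)
print(abs(var_u - hoeffding) < 1e-12)  # True
```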

## 2 Main result

Assume that the distributions $P_\theta$ are given by densities $p(x;\theta)$, differentiable in $\theta$ (with respect to a measure $\mu$), with the Fisher information

$$I(\theta) = \int\Big(\frac{\partial\log p(x;\theta)}{\partial\theta}\Big)^2 p(x;\theta)\,d\mu(x)$$

well defined and finite.
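As a quick numerical sanity check (our own, not from the paper), the Fisher information of a $N(\theta, 1)$ density can be approximated by a Riemann sum and compared with the known value $I(\theta) = 1$:

```python
from math import exp, pi, sqrt

theta = 0.7
p = lambda x: exp(-(x - theta) ** 2 / 2) / sqrt(2 * pi)  # N(theta, 1) density
score = lambda x: x - theta        # d/dtheta log p(x; theta)

# crude Riemann sum over [-10, 10]; the Gaussian tails beyond are negligible
dx = 1e-3
grid = [i * dx for i in range(-10_000, 10_001)]
I = sum(score(x) ** 2 * p(x) * dx for x in grid)
print(abs(I - 1.0) < 1e-4)  # True
```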
If $E_\theta(\tilde\theta_m) = \gamma(\theta)$, by the Cramér–Rao inequality

$$\mathrm{var}_\theta(\tilde\theta_m) \ge \frac{|\gamma'(\theta)|^2}{m\,I(\theta)}.$$

In particular, if $\tilde\theta_m$ is an unbiased estimator of $\theta$, $\mathrm{var}_\theta(\tilde\theta_m) \ge 1/(m\,I(\theta))$.
Furthermore, if $v_m(\theta) < \infty$, then $v_k(\theta) < \infty$ for all $k = 1,\ldots,m$ and the following lemma holds.

###### Lemma 2

(Hoeffding 1948). As $n \to \infty$, $\sqrt{n}\,(\tilde\theta^{(e)}_n - \gamma(\theta))$ is asymptotically normal $N(0, m^2 v_1(\theta))$.

Due to the Cramér–Rao inequality, for any unbiased estimator $\tilde\gamma_n$ of $\gamma(\theta)$ based on a sample of size $n$ from a population with Fisher information $I(\theta)$,

$$\mathrm{var}_\theta(\tilde\gamma_n) \ge \frac{|\gamma'(\theta)|^2}{n\,I(\theta)}. \qquad (6)$$

Combining (6) with Lemma 2 leads to a formula for the asymptotic efficiency of $\tilde\theta^{(e)}_n$:

$$\mathrm{aseff}(\tilde\theta^{(e)}_n) = \frac{|\gamma'(\theta)|^2/I(\theta)}{m^2 v_1(\theta)}. \qquad (7)$$
###### Lemma 3

Let $X \sim p(x;\theta)$ be a random element with finite Fisher information $I(\theta)$. If $h(x)$ is a (scalar-valued) function with $\mu(\theta) = E_\theta\{h(X)\}$ differentiable and $\sigma^2(\theta) = \mathrm{var}_\theta(h(X)) < \infty$, then

$$I(\theta) \ge \frac{|\mu'(\theta)|^2}{\sigma^2(\theta)}. \qquad (8)$$

Proof. Take the projection $\hat J(X;\theta)$ of the Fisher score $J(X;\theta) = \partial\log p(X;\theta)/\partial\theta$ into the subspace $\mathrm{span}\{1, h(X)\}$ of the Hilbert space of functions $\varphi(X)$ with $E_\theta\{\varphi^2(X)\} < \infty$:

$$\hat J(X;\theta) = \hat E_\theta\{J(X;\theta)\mid 1, h(X)\} = a(\theta)(h(X) - \mu(\theta)). \qquad (9)$$

Multiplying both sides by $h(X) - \mu(\theta)$ and taking expectations results in $a(\theta)\sigma^2(\theta) = \mu'(\theta)$, due to the properties $E_\theta\{J(X;\theta)\} = 0$ and $E_\theta\{J(X;\theta)h(X)\} = \mu'(\theta)$ of the Fisher score. Hence

$$I(\theta) = \mathrm{var}_\theta(J(X;\theta)) \ge \mathrm{var}_\theta(\hat J(X;\theta)) = \frac{|\mu'(\theta)|^2}{\sigma^2(\theta)},$$

which is exactly (8). The equality sign in (8) is attained if and only if with $P_\theta$-probability one the relation

$$\frac{p'(x;\theta)}{p(x;\theta)} = a(\theta)(h(x) - \mu(\theta)) \qquad (10)$$

holds for all $\theta$.
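The equality case can be illustrated on a concrete family. For a Poisson($\theta$) population with $h(x) = x$ one has $\mu(\theta) = \theta$, $\sigma^2(\theta) = \theta$, and $I(\theta) = 1/\theta$, so (8) holds with equality. The sketch below (names, parameter value, and truncation are our own choices) checks this numerically:

```python
from math import exp, factorial

theta = 2.5
pmf = lambda k: exp(-theta) * theta ** k / factorial(k)  # Poisson(theta)
score = lambda k: k / theta - 1      # d/dtheta log pmf(k; theta)

ks = range(100)                      # truncation; the Poisson tail is negligible
I = sum(score(k) ** 2 * pmf(k) for k in ks)
bound = 1.0 ** 2 / theta             # |mu'(theta)|^2 / sigma^2(theta) = 1/theta
print(abs(I - bound) < 1e-9)         # True: equality in (8)
```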
From (7) and (8) one gets

$$\mathrm{aseff}(\tilde\theta^{(e)}_n) \le 1/m^2. \qquad (11)$$

Thus, a necessary condition for the asymptotic efficiency of $\tilde\theta^{(e)}_n$ is $m = 1$, and by virtue of (4)

$$\tilde\theta^{(e)}_n(x_1,\ldots,x_n) = (h(x_1) + \ldots + h(x_n))/n \qquad (12)$$

for some $h$ with $E_\theta\{h(X)\} = \gamma(\theta)$.
By Lemma 3, the estimator (12) is an asymptotically efficient estimator of $\gamma(\theta)$ if and only if the relation (10) holds, implying that the family $\{p(x;\theta)\}$ is exponential,

$$p(x;\theta) = \exp\{A(\theta)h(x) + B(\theta) + g(x)\} \qquad (13)$$

where the functions in the exponent are such that $\int p(x;\theta)\,d\mu(x) = 1$.
From (13) one can see that the maximum likelihood equation for $\theta$ based on a sample from population (13) is

$$(h(x_1) + \ldots + h(x_n))/n = \gamma(\theta) \qquad (14)$$

and

$$\tilde\theta^{(e)}_n(x_1,\ldots,x_n) = (h(x_1) + \ldots + h(x_n))/n,$$

as the maximum likelihood estimator of $\gamma(\theta)$, is asymptotically efficient.
We summarize the above as a theorem.

###### Theorem 1

Under the regularity type conditions of the theory of maximum likelihood estimators, the jackknife extension estimators are asymptotically efficient if and only if they are arithmetic means based on samples from exponential families.
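The theorem can be illustrated on the Poisson family, where $h(x) = x$ and $\gamma(\theta) = \theta$, so the maximum likelihood equation (14) is solved by the sample mean. A small sketch (the data are arbitrary):

```python
# Poisson family: log-likelihood derivative is sum(x_i / theta - 1),
# so the ML equation (14) reads (x_1 + ... + x_n)/n = theta.
data = [3, 1, 4, 1, 5, 9, 2, 6]
theta_hat = sum(data) / len(data)

dloglik = sum(x / theta_hat - 1 for x in data)  # vanishes at theta_hat
print(theta_hat, abs(dloglik) < 1e-9)  # 3.875 True
```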

## 3 Some examples

The jackknife extension lacks innovation. A jackknife extension estimator based on $(x_1,\ldots,x_{n+1})$ differs from the estimator based on $(x_1,\ldots,x_n)$ only by the sample size. In a sense, it is an extensive vs. intensive use of the data, where the main factor is quantity vs. quality.

Nonparametric estimators of population characteristics, such as the empirical distribution function and the sample mean and variance, are jackknife extensions. Their main goal is to be universal rather than optimal for individual populations. An interesting statistic is the sample median $\tilde\mu_n$ constructed from a sample from a continuous population. Without loss of generality, one may assume

$$x_1 < \ldots < x_n.$$

For $n = 2m+1$, the sample median is $\tilde\mu_n = x_{m+1}$. One can easily see that

$$\tilde\mu^{(e)}_{n+1} = (x'_{m+1} + x'_{m+2})/2 \qquad (15)$$

where $x'_{m+1}$ and $x'_{m+2}$ are the $(m+1)$st and $(m+2)$nd elements of the ordered sample $(x'_1,\ldots,x'_{n+1})$. Thus, the median of a sample of an even size is a jackknife extension, though one should keep in mind that the definitions of the median in samples of even and odd size are different, and it is not clear if the inequality (2) holds.

For even $n$, the jackknife extension of $\tilde\mu_n$ is not the sample median $\tilde\mu_{n+1}$. Let us start with the simple cases $n = 4$ and $n = 6$. In the first case,

$$\tilde\mu^{(e)}_5 = \frac{1.5\,x'_2 + 2\,x'_3 + 1.5\,x'_4}{5}$$

is a weighted average of the sample median $x'_3$ and its nearest neighbors. The same holds in the second case, with $x'_4$ instead of $x'_3$ and different weights:

$$\tilde\mu^{(e)}_7 = \frac{2\,x'_3 + 3\,x'_4 + 2\,x'_5}{7}.$$

It seems likely that the extrapolation to an arbitrary even $n = 2m$ will result in

$$\tilde\mu^{(e)}_{n+1} = \frac{((m+1)/2)\,x'_m + m\,x'_{m+1} + ((m+1)/2)\,x'_{m+2}}{n+1}. \qquad (16)$$

Though (16) is a reasonable estimator of the median, it is not clear how it behaves in small and large samples compared to the standard sample median $\tilde\mu_{n+1}$.
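The weights in (16) can be checked by brute force, computing the jackknife extension of the even-sample median directly; a sketch with our own helper names and arbitrary ordered data:

```python
def median(xs):
    s = sorted(xs)
    k = len(s)
    return s[k // 2] if k % 2 else (s[k // 2 - 1] + s[k // 2]) / 2

def jackknife_extension(est, xs):
    """Average of est over the leave-one-out subsamples of xs."""
    return sum(est(xs[:i] + xs[i + 1:]) for i in range(len(xs))) / len(xs)

checks = []
for n in (4, 6, 8, 10):            # even sample sizes n = 2m
    m = n // 2
    xs = sorted(float(i * i + i) for i in range(1, n + 2))  # arbitrary ordered data
    lhs = jackknife_extension(median, xs)                   # extension of the n-median
    # weights from (16); xs[m - 1] is x'_m in the paper's 1-based notation
    rhs = ((m + 1) / 2 * xs[m - 1] + m * xs[m] + (m + 1) / 2 * xs[m + 1]) / (n + 1)
    checks.append(abs(lhs - rhs) < 1e-9)
print(all(checks))  # True
```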

## 4 References

Artstein, S., Ball, K. M., Barthe, F., Naor, A. (2004). Solution of Shannon's problem on the monotonicity of entropy. J. Amer. Math. Soc., 17, 975–982.

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat., 19, 293–325.

Kagan, A. M., Yu, T., Barron, A., Madiman, M. (2011). Contribution to the theory of Pitman estimators. J. Math. Sci., 199(2), 202–214.

Madiman, M., Barron, A. (2007). Generalized Entropy Power Inequalities and Monotonicity Properties of Information. IEEE Transactions on Information Theory, 53, 2317–2329.