Depth-based Weighted Jackknife Empirical Likelihood for Non-smooth U-structure Equations

In many applications, parameters of interest are estimated by solving non-smooth estimating equations with a U-statistic structure. The jackknife empirical likelihood (JEL) approach can solve this problem efficiently by reducing the computational complexity of the empirical likelihood (EL) method. However, like EL, JEL suffers from sensitivity to outliers. In this paper, we propose a weighted jackknife empirical likelihood (WJEL) to tackle this limitation of JEL. The proposed WJEL tilts the JEL function by assigning smaller weights to outliers. The asymptotic distribution of the WJEL ratio statistic is derived: it converges in distribution to a multiple of a chi-square random variable, where the multiplying constant depends on the weighting scheme. The self-normalized version of the WJEL ratio does not require knowledge of this constant and hence yields the standard chi-square distribution in the limit. The robustness of the proposed method is illustrated by simulation studies and one real data application.

1 Introduction

The empirical likelihood (EL) method was first introduced by Owen ([23], [24]) and has been widely used for constructing confidence regions. It combines the effectiveness of likelihood and the reliability of the nonparametric approach. On the computational side, it involves a maximization of the nonparametric likelihood supported on the data subject to some constraints. If these constraints are linear, the computation of the EL method is particularly easy. However, when applied to some more complicated statistics such as $U$-statistics, it runs into serious computational difficulties. Many methods have been proposed to overcome this computational difficulty; for example, Wood, Do and Broom ([40]) proposed a sequential linearization method by linearizing the nonlinear constraints. However, they did not provide the Wilks' theorem and stated that it was not easy to establish. Jing, Yuan and Zhou ([11]) proposed the jackknife empirical likelihood (JEL) approach. It transforms the maximization problem of the EL with nonlinear constraints into the simple case of EL on the mean of jackknife pseudo-values, which is very effective in handling one- and two-sample $U$-statistics. Since then, it has attracted strong interest in a wide range of fields due to its efficiency, and many papers are devoted to the investigation of the method, for example, Liu, Xia and Zhou ([20]), Peng ([26]), Feng and Peng ([6]), Wang and Zhao ([38]), Wang, Peng and Qi ([39]), Li, Xu and Zhou ([17]), Li, Peng and Qi ([16]), Sang, Dang and Zhao ([30]) and so on.

In many nonparametric and semiparametric approaches, such as the Gini correlation, quantile regression and rank regression, the parameters of interest are estimated by solving equations with a $U$-statistic structure instead of directly by $U$-statistics. Thus, the JEL of Jing, Yuan and Zhou ([11]) cannot be applied directly. Li, Xu and Zhou ([17]) extended the JEL to this more complicated but more general situation. The Wilks' theorems are established even for the situation in which nuisance parameters are involved.

As the EL method is sensitive to outliers and the EL confidence regions may be greatly lengthened in the directions of the outliers (Owen [25], Tsao and Zhou [36]), the JEL method with equation constraints is also sensitive to outliers. That is, the JEL method is not robust. For the EL approach, a number of methods have been proposed to achieve robustness; see Wu ([42]), Glenn and Zhao ([8]) and Jiang ([12]). Those robust empirical likelihood (REL) methods tilt the EL function by assigning smaller weights to outliers, which yields a more robust estimator and confidence region. Jiang ([12]) linked the depth-based weighted empirical likelihood (WEL) with general estimating equations and produced a robust estimation of parameters. They constructed weights based on a depth function, although it is not the spatial depth as they claimed. Data depth provides a centre-outward ordering of multi-dimensional data: points deep inside the data are assigned a high depth and those on the outskirts a lower depth. In the literature, depth functions have been extensively studied, for example, Mahalanobis depth ([28]), simplicial depth ([18]), half-space depth ([37]), spatial depth ([32]) and projection depth ([46]). In this paper, we propose a weighted JEL (WJEL) incorporating depth-based weights into the JEL approach to solve complicated problems with $U$-statistic structure estimating equations, in order to gain robustness for the JEL procedure. There is no smoothness assumption on the kernel function of the $U$-statistic structure with respect to the sample space; rather, smoothness with respect to the parameter is required. The asymptotic distribution of the WJEL ratio is established. It converges in distribution to a multiple of a chi-square random variable, with the multiplying constant depending on the weighting scheme. The self-normalized version of the WJEL ratio does not require knowledge of the constant and hence yields the standard chi-square distribution in the limit. The proof of the limiting distribution of the WJEL is technically involved since the procedure has to deal with the weak dependence of the jackknife pseudo-values and at the same time with uneven weights.

The remainder of the paper is organized as follows. In Section 2, we develop a weighted JEL (WJEL) method for estimating non-smooth $U$-statistic structure equations. In Section 3, simulation studies are conducted to compare our WJEL methods with the JEL methods. A real data analysis is presented in Section 4. Section 5 concludes the paper with a brief summary. All proofs are deferred to the Appendix.

2 Methodology

2.1 JEL with U-statistic structure equations

Suppose that the $X_i$'s ($i=1,\ldots,n$) are independently distributed from an unknown distribution $F$ with an $r$-dimensional parameter $\boldsymbol{\theta}$. $\boldsymbol{\theta}$ can be estimated by solving the $U$-structure equations (Li, Xu and Zhou [17])

$$W_{n,l}(\boldsymbol{\theta})=\frac{k!(n-k)!}{n!}\sum_{1\le i_1<\cdots<i_k\le n} h_l(X_{i_1},\ldots,X_{i_k};\boldsymbol{\theta})=0,\qquad l=1,\ldots,r,$$

where $h_l$ is symmetric in the $X_{i_j}$'s. For each fixed $\boldsymbol{\theta}$, $W_{n,l}(\boldsymbol{\theta})$ is a standard $U$-statistic with kernel $h_l$ for $l=1,\ldots,r$. Let $\boldsymbol{W}_n(\boldsymbol{\theta})=(W_{n,1}(\boldsymbol{\theta}),\ldots,W_{n,r}(\boldsymbol{\theta}))^T$ and let $\boldsymbol{W}_{n-1}^{(-i)}(\boldsymbol{\theta})$ denote the corresponding statistic with the $i$-th observation deleted. We define the jackknife pseudo-values by

$$\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})=n\boldsymbol{W}_n(\boldsymbol{\theta})-(n-1)\boldsymbol{W}_{n-1}^{(-i)}(\boldsymbol{\theta}), \qquad (1)$$

where $\boldsymbol{W}_{n-1}^{(-i)}(\boldsymbol{\theta})$ is calculated on the sample of $n-1$ data values obtained from the original data set after the $i$-th observation is deleted. It has been proved that these jackknife pseudo-values are asymptotically independent ([34]), and their average provides an unbiased and consistent estimator of $E\,\boldsymbol{W}_n(\boldsymbol{\theta})$. Therefore, the standard empirical likelihood can be established on the pseudo-values instead of the original observations as follows. The $U$-type empirical likelihood (UEL) at $\boldsymbol{\theta}$ is given by

$$L(\boldsymbol{\theta})=\max\Big\{\prod_{i=1}^n p_i:\ \sum_{i=1}^n p_i=1,\ \sum_{i=1}^n p_i\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})=\boldsymbol{0}\Big\}. \qquad (2)$$

Using Lagrange multipliers, when $\boldsymbol{0}$ is in the convex hull of $\{\hat{\boldsymbol{V}}_i(\boldsymbol{\theta}),\ i=1,\ldots,n\}$,

$$p_i=\frac{1}{n}\,\frac{1}{1+\boldsymbol{\lambda}^T(\boldsymbol{\theta})\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})},$$

where $\boldsymbol{\lambda}(\boldsymbol{\theta})$ is the vector of Lagrange multipliers satisfying

$$\frac{1}{n}\sum_{i=1}^n\frac{\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})}{1+\boldsymbol{\lambda}^T(\boldsymbol{\theta})\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})}=\boldsymbol{0}.$$

Under some mild conditions listed in Li, Xu and Zhou ([17]), the Wilks' theorem holds for the $U$-type empirical likelihood ratio; that is, as $n\to\infty$,

$$-2\log R(\boldsymbol{\theta}_0)=2\sum_{i=1}^n\log\{1+\boldsymbol{\lambda}^T(\boldsymbol{\theta}_0)\hat{\boldsymbol{V}}_i(\boldsymbol{\theta}_0)\}\ \xrightarrow{d}\ \chi^2_r,$$

where $\boldsymbol{\theta}_0$ is the true value of $\boldsymbol{\theta}$.
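To make the construction concrete, the following minimal numerical sketch (ours, not from the paper) computes the jackknife pseudo-values of equation (1) for a degree-2 $U$-statistic. A useful sanity check is that the average of the pseudo-values reproduces the $U$-statistic itself, which is why the EL can be built on the pseudo-values as if they were (nearly) i.i.d. observations.

```python
import numpy as np

def u_stat(x, kernel):
    """Degree-2 U-statistic: average of the kernel over all pairs i < j."""
    n = len(x)
    total = sum(kernel(x[i], x[j]) for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))

def jackknife_pseudo_values(x, kernel):
    """Pseudo-values of equation (1): V_i = n*U_n - (n-1)*U_{n-1}^{(-i)}."""
    n = len(x)
    u_full = u_stat(x, kernel)
    return np.array([n * u_full - (n - 1) * u_stat(np.delete(x, i), kernel)
                     for i in range(n)])

# Example: the Gini-mean-difference kernel h(x1, x2) = |x1 - x2|.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
kernel = lambda a, b: abs(a - b)
V = jackknife_pseudo_values(x, kernel)

# The average of the pseudo-values equals the U-statistic exactly.
print(V.mean() - u_stat(x, kernel))
```

The exact identity $\bar{V}=U_n$ follows from simple counting of how often each pair appears in the leave-one-out statistics.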

In many nonparametric and semiparametric approaches, the parameters of interest are estimated by solving equations with a $U$-statistic structure instead of directly by $U$-statistics.

Example 2.1 (Gini correlation)

Gini correlation, as an alternative measure of dependence, can be estimated by solving non-smooth, $U$-structured estimating functions (Sang, Dang and Zhao [30]). Specifically, suppose $X$ and $Y$ are two non-degenerate random variables with continuous marginal distribution functions $F$ and $G$, respectively, and a joint distribution function $H$. Then two Gini correlations are defined as (Blitz and Brittain [1], Schechtman and Yitzhaki [44])

$$\gamma_1:=\gamma(X,Y)=\frac{\mathrm{cov}(X,G(Y))}{\mathrm{cov}(X,F(X))}\quad\text{and}\quad\gamma_2:=\gamma(Y,X)=\frac{\mathrm{cov}(Y,F(X))}{\mathrm{cov}(Y,G(Y))}. \qquad (3)$$

Given an i.i.d. data set $\{(X_i,Y_i)\}_{i=1}^n$ from $H$, the two Gini correlations can be estimated by ratios of two $U$-statistics,

$$\hat{\gamma}_1=\frac{U_1}{U_2}=\frac{\frac{2}{n(n-1)}\sum_{1\le i<j\le n} h_1\big((X_i,Y_i),(X_j,Y_j)\big)}{\frac{2}{n(n-1)}\sum_{1\le i<j\le n} h_2\big((X_i,Y_i),(X_j,Y_j)\big)},$$

where $h_1\big((x_1,y_1),(x_2,y_2)\big)=\frac{1}{4}(x_1-x_2)\,\mathrm{sgn}(y_1-y_2)$ and $h_2\big((x_1,y_1),(x_2,y_2)\big)=\frac{1}{4}|x_1-x_2|$, and $\hat{\gamma}_2$ is defined analogously with the roles of the two coordinates interchanged. Let $\boldsymbol{Z}_i=(X_i,Y_i)^T$ and $\boldsymbol{\gamma}=(\gamma_1,\gamma_2)^T$ with

$$\boldsymbol{h}(\boldsymbol{z}_1,\boldsymbol{z}_2;\boldsymbol{\gamma})=\begin{pmatrix}\frac{1}{4}(x_1-x_2)\,\mathrm{sgn}(y_1-y_2)-\gamma_1\cdot\frac{1}{4}|x_1-x_2|\\ \frac{1}{4}(y_1-y_2)\,\mathrm{sgn}(x_1-x_2)-\gamma_2\cdot\frac{1}{4}|y_1-y_2|\end{pmatrix}.$$

Then from Sang, Dang and Zhao ([30]), we have the following $U$-structure equations,

$$\boldsymbol{W}_n(\boldsymbol{\gamma})=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}\boldsymbol{h}(\boldsymbol{Z}_i,\boldsymbol{Z}_j;\boldsymbol{\gamma})=\boldsymbol{0}. \qquad (4)$$

Note that $h_1$ and $h_2$ are non-smooth with respect to the sample space since they involve the indicator (sign) function in $h_1$ and the absolute value function in $h_2$. However, $\boldsymbol{h}$ is differentiable with respect to the parameter $\boldsymbol{\gamma}$.
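As an illustration only (assuming the kernels sketched above, in which the common factor $1/4$ cancels in each ratio), the point estimates of the two Gini correlations can be computed as:

```python
import numpy as np

def gini_correlation(x, y):
    """Point estimates (gamma1_hat, gamma2_hat) as ratios of pairwise sums.

    The common factor 1/4 in the kernels cancels, leaving sums of
    (x_i - x_j) * sgn(y_i - y_j) over |x_i - x_j| (and vice versa).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)                  # all pairs i < j
    dx, dy = x[i] - x[j], y[i] - y[j]
    g1 = np.sum(dx * np.sign(dy)) / np.sum(np.abs(dx))   # gamma(X, Y)
    g2 = np.sum(dy * np.sign(dx)) / np.sum(np.abs(dy))   # gamma(Y, X)
    return g1, g2

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + np.sqrt(0.75) * rng.normal(size=200)
print(gini_correlation(x, y))
```

For comonotone data ($Y$ a strictly increasing function of $X$) both estimates equal 1 exactly, since $\mathrm{sgn}(y_i-y_j)=\mathrm{sgn}(x_i-x_j)$ for every pair.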

Example 2.2 (Gini index)

The Gini index has been widely used in economics for assessing the distributional inequality of income or wealth ([9]). It can be estimated by solving non-smooth, $U$-structured estimating functions ([38]). Let $X$ and $X'$ be an independent pair of random variables from $F$ with $EX>0$. Then the Gini index of $F$ can be defined as follows,

$$\mathrm{GI}=\frac{E|X-X'|}{2EX}.$$

Given an i.i.d. data set $X_1,\ldots,X_n$, a natural estimator for the Gini index is a ratio of two $U$-statistics with the kernels $h_1(x_1,x_2)=|x_1-x_2|$ and $h_2(x_1,x_2)=x_1+x_2$,

$$\widehat{\mathrm{GI}}=\frac{U_1}{U_2}=\frac{\binom{n}{2}^{-1}\sum_{1\le i<j\le n}|X_i-X_j|}{\binom{n}{2}^{-1}\sum_{1\le i<j\le n}(X_i+X_j)}.$$

Let $h(x_1,x_2;\mathrm{GI})=|x_1-x_2|-\mathrm{GI}\,(x_1+x_2)$. The Gini index can be estimated by solving

$$W_n(\mathrm{GI})=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}h(X_i,X_j;\mathrm{GI})=0,$$

which is a $U$-structure equation. Clearly, $h$ is non-smooth with respect to $x_1$ or $x_2$, but it is smooth with respect to the parameter $\mathrm{GI}$.
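A minimal sketch of this estimator (ours, not the authors' code; it assumes the denominator kernel $x_1+x_2$ named above): the root of $W_n(\mathrm{GI})$ is the ratio of the two pairwise sums, and $W_n$ vanishes at that root.

```python
import numpy as np

def gini_index(x):
    """Solve W_n(GI) = 0: ratio of pairwise sums of |x_i - x_j| and (x_i + x_j)."""
    x = np.asarray(x, float)
    i, j = np.triu_indices(len(x), k=1)
    return np.sum(np.abs(x[i] - x[j])) / np.sum(x[i] + x[j])

def W_n(gi, x):
    """The U-structure estimating function with kernel |x1 - x2| - GI*(x1 + x2)."""
    x = np.asarray(x, float)
    i, j = np.triu_indices(len(x), k=1)
    h = np.abs(x[i] - x[j]) - gi * (x[i] + x[j])
    return 2.0 * h.sum() / (len(x) * (len(x) - 1))

x = np.array([1.0, 2.0, 3.5, 7.0, 11.0])
gi_hat = gini_index(x)
print(gi_hat, W_n(gi_hat, x))   # W_n vanishes at the estimate
```

Constant data give $\widehat{\mathrm{GI}}=0$ (perfect equality), while the two-point data set $\{0,1\}$ gives $\widehat{\mathrm{GI}}=1$.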

Remark 2.1

The parameters in Examples 2.1 and 2.2 are estimated by ratios of $U$-statistics, and those estimators are biased. Using the theorem on functions of $U$-statistics, the limiting normality of the estimators can be established, but the JEL approach avoids estimating their asymptotic variances. Secondly, the JEL approach performs better in finite samples, especially in small samples; this is demonstrated empirically in [30]. The weighted JEL is proposed in this paper with the goal of improving the robustness of the JEL.

2.2 Weighted JEL with U-statistic structure equations

In order to reduce the influence of outliers, we propose a robust JEL by defining a weighted JEL as follows.

Definition 2.1

Suppose that $X_1,\ldots,X_n$ are independently distributed from an unknown distribution $F$ with an $r$-dimensional parameter $\boldsymbol{\theta}$. Assume that $p_i$ is the probability mass placed on $X_i$. Given a weight vector $\boldsymbol{\omega}_n=(\omega_{n1},\ldots,\omega_{nn})^T$ with $\omega_{ni}\ge 0$ and $\sum_{i=1}^n\omega_{ni}=1$, the weighted jackknife empirical likelihood (WJEL) for the parameter $\boldsymbol{\theta}$ is then defined as

$$\mathrm{WJEL}(\boldsymbol{\theta})=\sup\Big\{\prod_{i=1}^n p_i^{\,n\omega_{ni}}:\ \sum_{i=1}^n p_i=1,\ \sum_{i=1}^n p_i\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})=\boldsymbol{0}\Big\}, \qquad (5)$$

where $\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})$ are the jackknife pseudo-values defined in (1).

Remark 2.2

For equal weights $\omega_{ni}=1/n$, the WJEL defined in (5) reduces to the classical JEL in (2).

Remark 2.3

The parameter $\boldsymbol{\theta}$ in (5) is not directly related to the weight vector, which is given or specified. However, since the WJEL involves both the parameters and the weights, maximizing the jackknife empirical log-likelihood ratio creates an indirect connection between the parameter and the weights.

We defer the choice of $\boldsymbol{\omega}_n$ for the robustness of the WJEL to the end of this section and focus first on the solution of (5) and its asymptotic properties. We want to maximize $\prod_{i=1}^n p_i^{\,n\omega_{ni}}$ subject to the restrictions

$$p_i\ge 0,\quad \sum_{i=1}^n p_i=1,\quad \sum_{i=1}^n p_i\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})=\boldsymbol{0}.$$

For any given $\boldsymbol{\theta}$, if $\boldsymbol{0}$ is in the convex hull of the points $\{\hat{\boldsymbol{V}}_i(\boldsymbol{\theta}),\ i=1,\ldots,n\}$, then a unique maximum exists and it can be found by Lagrange multipliers as follows,

$$p_i=\frac{\omega_{ni}}{1+\boldsymbol{\lambda}^T\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})}, \qquad (6)$$

where $\boldsymbol{\lambda}$ can be determined in terms of $\boldsymbol{\theta}$ by

$$\sum_{i=1}^n\frac{\omega_{ni}\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})}{1+\boldsymbol{\lambda}^T\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})}=\boldsymbol{0}. \qquad (7)$$

We can rewrite the WJEL function for $\boldsymbol{\theta}$ as

$$L(\boldsymbol{\theta})=\prod_{i=1}^n\left\{\frac{\omega_{ni}}{1+\boldsymbol{\lambda}^T\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})}\right\}^{n\omega_{ni}}.$$

Note that the unrestricted weighted empirical likelihood $\prod_{i=1}^n p_i^{\,n\omega_{ni}}$ is maximized at $p_i=\omega_{ni}$, because the Kullback-Leibler (KL) divergence $\sum_{i=1}^n\omega_{ni}\log(\omega_{ni}/p_i)\ge 0$, with equality if and only if $p_i=\omega_{ni}$ ([14]). Then the corresponding robust jackknife empirical likelihood ratio and robust jackknife empirical log-likelihood ratio, respectively, are

$$R(\boldsymbol{\theta})=\prod_{i=1}^n\left(\frac{p_i}{\omega_{ni}}\right)^{n\omega_{ni}}=\prod_{i=1}^n\left\{\frac{1}{1+\boldsymbol{\lambda}^T\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})}\right\}^{n\omega_{ni}}$$

and

$$l(\boldsymbol{\theta})=-2\log R(\boldsymbol{\theta})=2\sum_{i=1}^n n\omega_{ni}\log\{1+\boldsymbol{\lambda}^T\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})\}. \qquad (8)$$

From (8), it is easy to see that the weight $n\omega_{ni}$ is assigned not to the jackknife pseudo-value $\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})$ but to the empirical log-likelihood term $\log\{1+\boldsymbol{\lambda}^T\hat{\boldsymbol{V}}_i(\boldsymbol{\theta})\}$. We can minimize $l(\boldsymbol{\theta})$ in (8) to obtain an estimator of $\boldsymbol{\theta}$, which we call the WJEL estimator. We also have the following asymptotic result for the robust jackknife empirical log-likelihood ratio.

Theorem 2.1

Let $\boldsymbol{\theta}_0$ be the true value of $\boldsymbol{\theta}$ and suppose $\sum_{i=1}^n n\omega_{ni}^2\to c$ as $n\to\infty$. Under some mild regularity conditions stated in the Appendix, the Wilks theorem holds for the $U$-type WJEL ratio,

$$l(\boldsymbol{\theta}_0)\ \xrightarrow{d}\ c\,\chi^2_r\quad\text{as }n\to\infty, \qquad (9)$$
$$\frac{l(\boldsymbol{\theta}_0)}{\sum_{i=1}^n n\omega_{ni}^2}\ \xrightarrow{d}\ \chi^2_r\quad\text{as }n\to\infty. \qquad (10)$$

The self-normalized result in (10) is more applicable since it does not require knowledge of the value of $c$. Its proof follows immediately from an application of Slutsky's Theorem to (9), and hence we only provide a proof of (9) in the Appendix.
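For one-dimensional pseudo-values, the inner optimization can be carried out by a damped Newton iteration on (7); the sketch below (an illustration under our own implementation choices, not the authors' code) returns $l(\boldsymbol{\theta})$ of (8) together with the multiplier.

```python
import numpy as np

def wjel_loglik_ratio(V, w, iters=100):
    """Return (l(theta), lambda) for scalar pseudo-values V and weights w.

    lambda solves equation (7): sum_i w_i * V_i / (1 + lambda * V_i) = 0,
    found here by damped Newton; l(theta) is equation (8).
    """
    V, w = np.asarray(V, float), np.asarray(w, float)
    n = len(V)
    lam = 0.0
    for _ in range(iters):
        denom = 1.0 + lam * V
        f = np.sum(w * V / denom)            # left-hand side of (7)
        if abs(f) < 1e-12:
            break
        fp = -np.sum(w * V**2 / denom**2)    # df/dlambda (always negative)
        step = f / fp
        # halve the step so that every 1 + lambda*V_i stays positive
        while np.any(1.0 + (lam - step) * V <= 0.0):
            step /= 2.0
        lam -= step
    return 2.0 * np.sum(n * w * np.log(1.0 + lam * V)), lam

# Equal weights recover the ordinary JEL ratio; pseudo-values centered
# away from zero give a positive log-likelihood ratio.
rng = np.random.default_rng(4)
V = rng.normal(loc=0.3, scale=1.0, size=50)
w = np.full(50, 1.0 / 50)
l_val, lam = wjel_loglik_ratio(V, w)
print(l_val, lam)
```

When the weighted mean of the pseudo-values is already zero, the solution is $\lambda=0$ and $l(\boldsymbol{\theta})=0$, consistent with $p_i=\omega_{ni}$ maximizing the unconstrained problem.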

The above procedure can also be adapted to handle nuisance parameters by profiling the empirical likelihood. Write $\boldsymbol{\theta}=(\boldsymbol{\alpha}^T,\boldsymbol{\beta}^T)^T$, where $\boldsymbol{\alpha}$ is the $p$-dimensional parameter of interest and $\boldsymbol{\beta}$ is an unknown nuisance parameter. The profile WJEL ratio is defined as

$$l(\boldsymbol{\alpha})=l(\boldsymbol{\alpha},\tilde{\boldsymbol{\beta}})=\min_{\boldsymbol{\beta}}l(\boldsymbol{\alpha},\boldsymbol{\beta}). \qquad (11)$$

That is, we minimize the WJEL ratio over the nuisance parameters for each fixed $\boldsymbol{\alpha}$.

Theorem 2.2

Under some mild regularity conditions stated in the Appendix, the Wilks' theorem holds for the $U$-type profile WJEL ratio and its self-normalized version,

$$l(\boldsymbol{\alpha}_0)\ \xrightarrow{d}\ c\,\chi^2_p\quad\text{as }n\to\infty, \qquad (12)$$
$$\frac{l(\boldsymbol{\alpha}_0)}{\sum_{i=1}^n n\omega_{ni}^2}\ \xrightarrow{d}\ \chi^2_p\quad\text{as }n\to\infty, \qquad (13)$$

where $\boldsymbol{\alpha}_0$ is the true value of the parameter of interest and $c$ is the same as in Theorem 2.1.

A proof of (12) of Theorem 2.2 is given in the Appendix. The above results are obtained under a given weight vector $\boldsymbol{\omega}_n$. In order to achieve robustness with the JEL, the weight for an outlier should be small. For this purpose, a proper weighting scheme can be constructed from depth functions (Zuo and Serfling [47], and Dang, Serfling and Zhou [4]).

2.3 Depth-based weights

Depth functions play important roles in robust and nonparametric multivariate analysis and inference (Liu [19], Zuo and Serfling [47]). Let $F$ be a probability distribution on $\mathbb{R}^d$. An associated depth function $D(\boldsymbol{x};F)$ provides a center-outward ordering of a point $\boldsymbol{x}\in\mathbb{R}^d$, with higher values representing higher "centrality" of $\boldsymbol{x}$. For a data set with empirical distribution $F_n$, we denote the sample version by $D(\boldsymbol{x};F_n)$, which assigns points deep inside the data a high depth and those on the outskirts a lower depth.

Among popular types of depth functions, we use the spatial depth for its good balance between robustness and computational ease (Dang and Serfling [5]). The spatial depth function is defined as

$$D(\boldsymbol{x};F)=1-\|E_F\boldsymbol{S}(\boldsymbol{x}-\boldsymbol{X})\|, \qquad (14)$$

where $\boldsymbol{S}(\boldsymbol{x})$ is the multivariate sign function with $\boldsymbol{S}(\boldsymbol{x})=\boldsymbol{x}/\|\boldsymbol{x}\|$ if $\boldsymbol{x}\ne\boldsymbol{0}$ and $\boldsymbol{S}(\boldsymbol{0})=\boldsymbol{0}$. Accordingly, its sample counterpart is

$$D(\boldsymbol{x};F_n)=1-\left\|\frac{1}{n}\sum_{i=1}^n\frac{\boldsymbol{x}-\boldsymbol{X}_i}{\|\boldsymbol{x}-\boldsymbol{X}_i\|}\right\|.$$

It is easy to check that in the univariate case, the spatial depth is $D(x;F)=1-|2F(x)-1|$ for continuous $F$, and the sample spatial depth is $D(x;F_n)=1-|\frac{1}{n}\sum_{i=1}^n\mathrm{sgn}(x-X_i)|$, with the maximum spatial depth 1 attained at the median.

Now we are ready to assign a weight to $\boldsymbol{X}_i$ by

$$\omega_{ni}=\frac{D(\boldsymbol{X}_i;F_n)}{\sum_{j=1}^n D(\boldsymbol{X}_j;F_n)}. \qquad (15)$$
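The sample spatial depth and the weights (15) are straightforward to compute. The sketch below (our illustration, with an artificially planted outlier) shows that the weights sum to one, that a gross outlier receives a low weight, and that $\sum_i n\omega_{ni}^2\ge 1$, consistent with the constant $c$ discussed next.

```python
import numpy as np

def spatial_depth(x, data):
    """Sample spatial depth: 1 - || average unit vector from x to the X_i ||."""
    diffs = x - data                          # shape (n, d)
    norms = np.linalg.norm(diffs, axis=1)
    units = np.zeros_like(diffs)
    keep = norms > 0                          # the sign function sends 0 to 0
    units[keep] = diffs[keep] / norms[keep][:, None]
    return 1.0 - np.linalg.norm(units.mean(axis=0))

def depth_weights(data):
    """Weights omega_ni of equation (15): depths normalized to sum to one."""
    depths = np.array([spatial_depth(xi, data) for xi in data])
    return depths / depths.sum()

rng = np.random.default_rng(2)
data = rng.normal(size=(100, 2))
data[0] = [8.0, 8.0]                          # plant one gross outlier
w = depth_weights(data)
print(w[0], w.mean())                         # the outlier is down-weighted
```

The bound $\sum_i n\omega_{ni}^2\ge 1$ is just the Cauchy-Schwarz inequality, with equality exactly at the equal weights $\omega_{ni}=1/n$.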
Remark 2.4

When the data set is multimodal, it is suggested to use the kernelized spatial depth (KSD), which generalizes the spatial depth via a positive definite kernel to capture the local structure of the data cloud [2].

Note that Jiang et al. ([12]) used a different depth, although they called it the spatial depth function. The depth they used is not robust: it has an unbounded influence function and a zero breakdown point. Nevertheless, we can use Theorem 3.1 of Jiang et al. [12], from which the constant $c$ in Theorem 2.1 or Theorem 2.2 can be determined by equation (16).

Theorem 2.3

Suppose that $0<\int D(\boldsymbol{x};F)\,dF(\boldsymbol{x})$ and $\int D^2(\boldsymbol{x};F)\,dF(\boldsymbol{x})<\infty$. Then, as $n\to\infty$,

$$\sum_{i=1}^n n\omega_{ni}^2\ \xrightarrow{a.s.}\ c=\frac{\int D^2(\boldsymbol{x};F)\,dF(\boldsymbol{x})}{\left(\int D(\boldsymbol{x};F)\,dF(\boldsymbol{x})\right)^2}. \qquad (16)$$

A direct application of Jensen's inequality on a non-degenerate distribution shows that the spatial depth satisfies these conditions and that $c\ge 1$.

Remark 2.5

The spatial-depth-based weights provided in this section are data-driven for the purpose of robustness. This choice of weights may not satisfy the assumptions of Theorems 2.1 and 2.2, in which the weights are required to be given and deterministic. However, the WJEL based on the spatial-depth weights (15) works well in the simulation studies, and it certainly calls for theoretical development in future research.

3 Simulation

In the first part of this section, a small simulation study is conducted to compare WJEL and JEL methods for inference of Gini correlation.

• Data are generated from a normal distribution $F_1$ with a contaminating distribution $F_2$, where

$$F_1\sim N\big((0,0)^T,\boldsymbol{\Sigma}\big),\qquad F_2\sim N\big((0,0)^T,4\boldsymbol{\Sigma}\big),$$

with $\boldsymbol{\Sigma}$ having unit variances and $\rho$ being the correlation coefficient. Without loss of generality, we consider only $\gamma_1$, for several values of $\rho$. We take two contamination levels. For each level, we generate 1000 samples of two different sample sizes ($n$) from the mixture of $F_1$ and $F_2$. For each simulated data set, confidence intervals are calculated using the different methods. The coverage probability and interval length are computed from the 1000 samples. We then repeat this procedure 10 times. The average coverage probabilities and average lengths of the confidence intervals, as well as their standard deviations (in parentheses), are presented in Table 1.

From Table 1, the WJEL method is more robust than the JEL method. For the small sample size, the JEL performs worse than the WJEL even in the uncontaminated case: the WJEL maintains the coverage probability well, while the JEL suffers a slight under-coverage problem. For the larger sample size without outlier contamination, the JEL performs better than the WJEL, which produces slightly lower coverage probabilities. In the contaminated cases, the WJEL always has higher coverage probabilities than the JEL for all sample sizes. For the large sample size, the WJEL also yields shorter confidence intervals than the JEL.

• Data are generated from heavy-tailed distributions. To be specific, we generate 1000 samples of two different sample sizes ($n$) from the Kotz distribution with the same scatter matrix as before. The Kotz-type distribution is a bivariate generalization of the Laplace distribution, with tail fatness between that of the normal and $t$ distributions. The results based on 10 repetitions are presented in Table 2.

From Table 2, under the heavy-tailed distributions, the JEL method suffers an under-coverage problem, especially when the sample size is relatively small. Compared with the JEL approach, the WJEL approach has better coverage probabilities, which are very close to the nominal level, with slightly larger average lengths of confidence intervals for both small and large sample sizes. Overall, the WJEL approach performs better than the JEL method for heavy-tailed distributions. The WJEL overcomes the sensitivity of the JEL to outliers.

The second part of the simulation study compares the JEL and WJEL methods when they are applied to the Gini index. We simulate data from asymmetric Pareto distributions, Pareto($\theta,\beta$), where $\theta$ is the scale parameter and $\beta$ is the shape parameter. Using the results on the Gini mean difference in [45], we have the true values of the Gini index as follows,

$$\mathrm{GI}\big(\mathrm{Pareto}(\theta,\beta)\big)=\frac{1}{2\beta-1},\qquad\text{for all }\theta>0\text{ and }\beta>1.$$

The Gini index of the Pareto distribution is independent of the scale parameter $\theta$. The number of samples and the number of repetitions are the same as before. The simulation results are reported in Table 3.
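As a quick numerical check of this formula (our own illustration, not part of the paper's simulation design), one can draw from Pareto($\theta,\beta$) by inverse-CDF sampling and compare the $U$-statistic Gini estimate with $1/(2\beta-1)$; scale invariance also shows directly.

```python
import numpy as np

def gini_index(x):
    """U-statistic Gini estimate: sum |x_i - x_j| over sum (x_i + x_j), i < j."""
    x = np.asarray(x, float)
    i, j = np.triu_indices(len(x), k=1)
    return np.sum(np.abs(x[i] - x[j])) / np.sum(x[i] + x[j])

def pareto_sample(theta, beta, n, rng):
    """Inverse-CDF draw from Pareto(theta, beta): X = theta * U**(-1/beta)."""
    return theta * rng.random(n) ** (-1.0 / beta)

rng = np.random.default_rng(3)
beta = 3.0                       # true GI = 1 / (2*beta - 1) = 0.2
x = pareto_sample(1.0, beta, 2000, rng)
print(gini_index(x), 1.0 / (2.0 * beta - 1.0))
```

Multiplying the sample by any positive constant leaves the estimate unchanged, mirroring the fact that $\theta$ has no effect on the Gini index.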

From Table 3, both the JEL and the WJEL suffer an under-coverage problem, although the WJEL performs uniformly better than the JEL. The problem is more severe for the small sample size and small values of $\beta$. This is understandable since the smaller $\beta$ is, the heavier the tail of the Pareto distribution. The weighted JEL improves on the JEL, but it does not reach the nominal level for the small sample size. As expected, the scale parameter $\theta$ has no impact on the inference of the Gini index.

4 Real data analysis

For the purpose of illustration, we apply the proposed RJEL (i.e., the WJEL) method to the gilgai survey data (Jiang [12]). The data set consists of 365 samples, which were taken at depths 0-10, 30-40 and 80-90 cm below the surface. Three features, pH, electrical conductivity (ec) in mS/cm and chloride content (cc) in ppm, are measured on a 1:5 soil:water extract from each sample. Without loss of generality, we consider the Gini correlations between electrical conductivity and chloride content at different depths. We use e00 (0-10 cm), e30 (30-40 cm) and e80 (80-90 cm), and c00 (0-10 cm), c30 (30-40 cm) and c80 (80-90 cm) to denote ec and cc at the different depths, respectively.

The density curves and boxplots of ec and cc at the different depth levels are drawn in Figure 1. We observe that the distributions of each variable at different depths are quite different: the range and variation of each variable increase as the depth increases. At the same depth, the two features have similar distributions although their scales are different. Those distributions are positively skewed, indicating that there are quite a number of outliers in the two features at the 0-10 cm depth and a few outliers at the 30-40 cm depth, while e80 contains no outlier.

The point estimates and confidence intervals for the Gini correlations of the three pairs (e00, c00), (e30, c30) and (e80, c80) are calculated and reported in Table 4. We compare the RJEL with two other methods, namely the JEL and VJ. The VJ method is the inference method based on asymptotic normality with the asymptotic variance estimated by the jackknife method. The correlations of ec and cc are significantly different at the different depth levels. At the 30-40 cm depth, the correlation between electrical conductivity and chloride content is the highest, while at the 80-90 cm depth the correlation between them is the lowest, decreasing from 0.97 to 0.73.

For (e00, c00), the confidence intervals of $\gamma_1$ and $\gamma_2$ are disjoint for all three methods, indicating that e00 and c00 are not exchangeable up to a linear transformation (Yitzhaki and Schechtman [44]). With a large number of outliers present in e00 and c00, the performance of the JEL is largely degraded and is even worse than that of the VJ. The proposed RJEL overcomes the sensitivity of the JEL to outliers and performs the best, with the shortest confidence intervals. We keep more decimals in the point estimates to show that the upper limits of the RJEL confidence intervals differ slightly from those point estimates. Indeed, that is one of the appealing properties of the empirical likelihood approach: confidence intervals (regions) are entirely determined by the data.

For (e30, c30), the point estimates of $\gamma_1$ and $\gamma_2$ are very close and their confidence intervals largely overlap. Again, the JEL performs the worst since both e30 and c30 contain a few outliers.

For (e80, c80), from the boxplots of Figure 1 we know there are a couple of outliers in c80 but none in e80. Since $\gamma_1$ is defined through the covariance between e80 and the probability distribution function of c80, the outliers in c80 do not impact the JEL, and hence the JEL performs better than the RJEL. However, for inference on $\gamma_2$, the JEL is affected by those outliers and the RJEL is necessary. As a result, the RJEL provides a much shorter confidence interval than the other two methods.

5 Conclusion

In this paper, we have explored a robust JEL method for problems with $U$-statistic structure equations. The RJEL tilts the JEL function by assigning smaller weights to outliers, with the weights being proportional to their spatial depth values. Hence it is more robust than the regular JEL developed by Li, Xu and Zhou ([17]). Its robustness is demonstrated in the simulations and in the real data application on inference of the Gini correlation. On the other hand, the asymptotic results of the robust JEL are the same as those of the robust EL (Jiang [12]), in which only general equation constraints are considered. The proof of the asymptotic distribution of the RJEL is technically involved since one has to deal with the weak dependence of the jackknife pseudo-values and at the same time with unequal weights, especially when the procedure involves both parameters of interest and nuisance parameters.

Continuations of this work could take the following directions:

• In this paper, we use the spatial depth function to assign weights. How to assign weights can be explored further in future research.

• Reduce the computation of the $U$-type profile empirical likelihood. Li, Peng and Qi ([16]) and Peng ([26]) considered procedures based on a jackknife plug-in empirical likelihood to save computation time. We may develop similar procedures to deal with the $U$-structured empirical likelihood using the robust JEL.

6 Appendix

Compared with Li, Xu and Zhou ([17]), we need to deal with the uneven weights $\omega_{ni}$. For simplicity, we only prove Theorem 2.2 in a special case; the general case and Theorem 2.1 can be proved similarly. We adopt the notation of Li, Xu and Zhou ([17]) as below:

1. $\boldsymbol{\Sigma}(\boldsymbol{\beta}_0)$: the asymptotic variance-covariance matrix;

2. a sequence of non-negative real numbers converging to 0 as $n\to\infty$; throughout this paper, we use an arbitrary such sequence unless otherwise specified;

3. $\boldsymbol{g}(X_1,\boldsymbol{\beta})$ and $\boldsymbol{\xi}$,

where $\boldsymbol{\xi}$ is the limit of the multiplier determined by (7) and satisfies

$$E\left\{\frac{\boldsymbol{g}(X_1,\boldsymbol{\beta})}{1+\boldsymbol{\xi}^T\boldsymbol{g}(X_1,\boldsymbol{\beta})}\right\}=\boldsymbol{0}. \qquad (17)$$

Define the matrix

$$V=\begin{pmatrix}V_{11}&V_{12}\\ V_{12}^T&0\end{pmatrix}, \qquad (18)$$

where

$$V_{11}=\boldsymbol{\Sigma}(\boldsymbol{\beta}_0)_{r\times r},\qquad V_{12}=-\frac{\partial}{\partial\boldsymbol{\beta}^T}E\big(\boldsymbol{H}(X_1,X_2;\boldsymbol{\alpha}_0,\boldsymbol{\beta}_0)\big).$$

By the Hoeffding decomposition,

$$W_{n,l}(\boldsymbol{\beta})=\tau_l(\boldsymbol{\beta})+\frac{2}{n}\sum_{i=1}^n\phi_l(X_i,\boldsymbol{\beta})+\frac{2}{n(n-1)}\sum_{i<j}\psi_l(X_i,X_j,\boldsymbol{\beta}).$$

Simple calculations give

$$\hat{V}_{il}(\boldsymbol{\beta})=\tau_l(\boldsymbol{\beta})+2\phi_l(X_i,\boldsymbol{\beta})+\frac{2}{n-1}\sum_{j=1,j\ne i}^n\psi_l(X_i,X_j,\boldsymbol{\beta})-\frac{2}{(n-1)(n-2)}\sum_{i_1<i_2,\ i_1,i_2\ne i}\psi_l(X_{i_1},X_{i_2},\boldsymbol{\beta})=:\tau_l(\boldsymbol{\beta})+2\phi_l(X_i,\boldsymbol{\beta})+R_{ni,l}(\boldsymbol{\beta}).$$

Note that for each $l$,

$$E\big(R_{ni,l}^2(\boldsymbol{\beta})\big)\le Cn^{-1}E\big(\psi_l^2(X_1,X_2,\boldsymbol{\beta})\big)+Cn^{-2}E\big(\psi_l^2(X_1,X_2,\boldsymbol{\beta})\big)\to 0,$$

where $C$ is some generic constant. So $R_{ni,l}(\boldsymbol{\beta})\xrightarrow{p}0$, which implies that $\hat{V}_{il}(\boldsymbol{\beta})=\tau_l(\boldsymbol{\beta})+2\phi_l(X_i,\boldsymbol{\beta})+o_p(1)$.

We will need the following regularity conditions.

(C0) The true parameter is uniquely determined by