High Order Adjusted Block-wise Empirical Likelihood For Weakly Dependent Data

12/16/2019 ∙ by Guangxing Wang, et al.

The upper limit on the coverage probability of the empirical likelihood ratio confidence region severely hampers its application in statistical inferences. The root cause of this upper limit is the convex hull of the estimating functions that is used in the construction of the profile empirical likelihood. For i.i.d data, various methods have been proposed to solve this issue by modifying the convex hull, but it is not clear how well these methods perform when the data is no longer independent. In this paper, we consider weakly dependent multivariate data, and we combine the block-wise empirical likelihood with the adjusted empirical likelihood to tackle data dependency and the convex hull constraint simultaneously. We show that our method not only preserves the much celebrated asymptotic χ^2-distribution, but also improves the coverage probability by removing the upper limit. Further, we show that our method is also Bartlett correctable, thus is able to achieve high order asymptotic coverage accuracy.


1 Introduction

Empirical likelihood methods have been studied extensively over the past three decades as a reliable and flexible alternative to the parametric likelihood. Among their numerous attractive properties, the most celebrated are the asymptotic $\chi^2$ distribution of the empirical likelihood ratio and the ability to use a Bartlett correction to improve the coverage accuracy of the corresponding confidence region. However, despite these desirable properties, which are at least parallel to those of parametric likelihood methods, there is a serious drawback: the empirical likelihood confidence region has an under-coverage problem in small-sample or high-dimensional settings. This undesirable feature was noticed in early works, for example by Owen1988 and Tsao2004. For independent data, various methods have been proposed to address this issue. They can be divided into roughly two main areas. One is to improve the approximation to the limiting distribution of the log empirical likelihood ratio. For this approach, among others, Owen1988 proposed to use a bootstrap calibration, and DiCiccio1991 showed that by scaling the empirical likelihood ratio with a Bartlett factor, which can be estimated from the data, the coverage error can be improved from $O(n^{-1})$ to $O(n^{-2})$. The other approach is to tackle the convex hull constraint, which was first studied in Tsao2004. There are three major methods in this approach, namely the penalized empirical likelihood by Bartolucci2007, the adjusted empirical likelihood by Chen2008, and the extended empirical likelihood by Tsao2013. These three methods have since been extended and refined by subsequent research. To mention a few that are related to this paper, Zhang2016 extended the penalized empirical likelihood to a fixed-b block-wise method for weakly dependent data.
Emerson2009 proposed to modify the placement of the extra point in the adjusted empirical likelihood in order to remove the upper bound on the adjusted likelihood ratio statistic, which, if not removed, causes a confidence region to cover the whole space in some situations. Liu2010 showed that by choosing the tuning parameter in the adjusted empirical likelihood in a specific way, it is possible to achieve the Bartlett corrected coverage error rate. Chen2013 studied the finite sample properties of the adjusted empirical likelihood and discussed a generalized version of the method proposed in Emerson2009. It is worth pointing out that most of the existing work has focused on independent data, and the aforementioned Zhang2016 was the first paper to address the convex hull constraint for weakly dependent data with the penalized empirical likelihood under the block-wise framework, which was introduced to empirical likelihood by Kitamura1997. Recently, PiyadiGamage2017 studied the adjusted empirical likelihood for time series models under the frequency domain empirical likelihood framework, which was introduced by Nordman2006. In this paper, we extend the adjusted empirical likelihood to weakly dependent data under the block-wise empirical likelihood framework. Hereafter, we call it the adjusted block-wise empirical likelihood (ABEL). Compared to the non-standard pivotal asymptotic distribution obtained in Zhang2016, we show that the ABEL preserves the much celebrated asymptotic $\chi^2$ distribution. In addition, we show that the tuning parameter can be selected such that the ABEL achieves the Bartlett corrected coverage error rate with weakly dependent data.

This paper is organized as follows. Section 2 gives a brief introduction to the empirical likelihood method and its convex hull constraint problem. Basic notation used throughout the paper is also established in this section. Section 3 introduces the ABEL along with its asymptotic properties. In section 4, we show that the tuning parameter associated with the adjustment can be used to achieve the Bartlett corrected error rate for weakly dependent data. In section 5, we demonstrate the performance of the ABEL method through a simulation study and discuss possible ways to calculate the tuning parameter. Proofs of the theorems are presented in section 7.

2 Empirical Likelihood and the convex hull constraint

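In this section, we establish the notation used in this paper by presenting a brief review of the empirical likelihood method, the adjusted empirical likelihood and the block-wise empirical likelihood. For a comprehensive review of the empirical likelihood methodology, we refer to Owen2001. Let $X_1, \dots, X_n \in \mathbb{R}^d$ be i.i.d. random samples from an unknown distribution $F$, and let $\theta \in \Theta \subset \mathbb{R}^p$ be the parameter of interest. Let $g(x; \theta)$ be a $q$-dimensional estimating function such that $E[g(X_1; \theta_0)] = 0$, where $\theta_0$ is the true parameter. One of the advantages of the empirical likelihood is that more information about the parameter can be incorporated through the estimating equations. In other words, we can have $q \ge p$. The profile empirical likelihood for $\theta$ is defined as

$$\mathcal{L}(\theta) = \sup\left\{ \prod_{i=1}^{n} p_i : p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(X_i; \theta) = 0 \right\}. \qquad (1)$$

Then by a standard Lagrange argument, we have

$$\mathcal{L}(\theta) = \prod_{i=1}^{n} \frac{1}{n}\, \frac{1}{1 + \lambda(\theta)^\top g(X_i; \theta)},$$

where $\lambda(\theta)$ is the Lagrange multiplier that satisfies the equation

$$\sum_{i=1}^{n} \frac{g(X_i; \theta)}{1 + \lambda(\theta)^\top g(X_i; \theta)} = 0.$$

In the rest of this paper, we write $\lambda$ in place of $\lambda(\theta)$ unless the dependence on $\theta$ needs to be explicitly stressed. The profile empirical likelihood ratio is defined as

$$R(\theta) = \frac{\mathcal{L}(\theta)}{n^{-n}} = \prod_{i=1}^{n} \frac{1}{1 + \lambda^\top g(X_i; \theta)}.$$

Under regularity conditions, for example in Qin1994, it can be shown that

$$-2 \log R(\theta_0) \xrightarrow{d} \chi^2_q. \qquad (2)$$

Then an asymptotic $100(1-\alpha)\%$ empirical likelihood confidence region for $\theta$ can be found as

$$\left\{ \theta : -2 \log R(\theta) < \chi^2_{q, 1-\alpha} \right\}. \qquad (3)$$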

Properties (2) and (3) are the most celebrated features of the empirical likelihood, and they parallel those of its parametric counterpart. Despite these advantages, it was noted early on by Owen1988 that the empirical likelihood confidence region consistently undercovers. Tsao2004 studied the least upper bounds on the coverage probabilities. That work focused on the fact that the empirical likelihood ratio statistic is finite if and only if the origin is in the convex hull of the estimating functions evaluated at the sample, and then showed that the coverage probability of the empirical likelihood confidence region is bounded above by the probability that the origin is in this convex hull at the true parameter. Further, Tsao2004 demonstrated that this upper bound is affected by the sample size and the parameter dimension in such a way that if the parameter dimension is comparable to the sample size, then the upper bound goes to 0 as the sample size goes to infinity. This not only explains the root cause of the under-coverage issue, but also shows the severity of the upper bound problem when the sample size is small compared to the parameter dimension. Since then, various researchers have tried to address the convex hull constraint directly in order to improve the coverage probability. As mentioned in the introduction, three major approaches have been proposed, and in this paper we focus on the adjusted empirical likelihood by Chen2008.

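The idea of the adjusted empirical likelihood is most easily demonstrated and understood by considering the two-dimensional population mean. That is, we have $X_i \in \mathbb{R}^2$ and $\theta = \mu = E(X_1)$. The estimating function then becomes $g(X_i; \mu) = X_i - \mu$. In this setup, (1) simplifies to

$$\mathcal{L}(\mu) = \sup\left\{ \prod_{i=1}^{n} p_i : p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i (X_i - \mu) = 0 \right\}. \qquad (4)$$

Figure 1: Convex hull constructed by the original data (left) vs. by the adjusted data (right)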

Notice that the maximizer in (4) exists if and only if the hypothesised mean is in the convex hull of the sample points. If the hypothesised mean is not in the convex hull, then there is no solution to (4), and by convention, the likelihood is set to zero. As a result, even when the hypothesised value is the true population mean, it will not be included in the empirical likelihood confidence region, because minus twice the log empirical likelihood ratio is infinite and therefore exceeds the critical value at any level. The first plot in Figure 1 shows 15 sample points together with their population mean, which is represented as the red dot. For this sample, using the empirical likelihood defined in (4), the likelihood at the population mean is set to zero. Even though this point is the true population mean and it is very close to the convex hull, setting the likelihood to zero provides no information about its plausibility. In other words, one cannot compare two hypothesised values that both lie outside the convex hull using the non-adjusted empirical likelihood, because their likelihood ratios will both be zero, even though one may be much closer to the convex hull than the other.
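The convention described above can be illustrated numerically. The following is a minimal Python sketch for the scalar mean only, assuming the estimating function is the centred observation $X_i - \mu$. The function name `neg2_log_el` and the bisection solver for the Lagrange multiplier are illustrative choices of ours, not the implementation used in the paper. In one dimension the convex hull is simply the range of the data, so the statistic is infinite as soon as the hypothesised mean leaves that range.

```python
import math

def neg2_log_el(z, iters=200):
    """-2 log empirical likelihood ratio for scalar values z_i = X_i - mu.

    Hypothetical helper: returns math.inf when 0 is not strictly inside
    (min z, max z), i.e. when the convex hull constraint fails.
    """
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return math.inf
    # The multiplier must keep 1 + lam * z_i > 0 for every i.
    lo, hi = -1.0 / z_max, -1.0 / z_min
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        # The score sum_i z_i / (1 + lam * z_i) is strictly decreasing in lam.
        score = sum(zi / (1.0 + lam * zi) for zi in z)
        if score > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

x = [1.0, 2.0, 4.0, 7.0]
inside = neg2_log_el([xi - 3.0 for xi in x])    # 3.0 lies inside [1, 7]
outside = neg2_log_el([xi - 8.0 for xi in x])   # 8.0 lies outside [1, 7]
```

Here `inside` is a finite positive number, while `outside` is infinite no matter how close the hypothesised value is to the largest observation, which is exactly the loss of information discussed above.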

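To mitigate this problem, Chen2008 proposed to add an extra point $g_{n+1}(\theta) = -a_n \bar{g}_n(\theta)$, where $\bar{g}_n(\theta) = n^{-1} \sum_{i=1}^{n} g(X_i; \theta)$ and $a_n$ is a positive tuning parameter, to the original data, and then to use the $n + 1$ points to construct the empirical likelihood. They called this the adjusted empirical likelihood,

$$\mathcal{L}^{*}(\theta) = \sup\left\{ \prod_{i=1}^{n+1} p_i : p_i \ge 0,\ \sum_{i=1}^{n+1} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(X_i; \theta) + p_{n+1}\, g_{n+1}(\theta) = 0 \right\}.$$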

The intuition behind the adjustment can be seen from the plot on the right of Figure 1, where the adjusted convex hull always contains the origin by design; thus the situation of forcing the likelihood to zero is avoided. Moreover, it has been shown in Chen2008 that if the hypothesised parameter is close to or in the convex hull, then the adjustment alters the empirical likelihood by a negligible amount. Thus, at the true population mean, the asymptotic $\chi^2$ distribution still holds, and a confidence region can be constructed accordingly.
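As a rough illustration of the adjustment, the sketch below appends the pseudo-point "minus a times the sample average of the centred data" before solving for the Lagrange multiplier, again for the scalar mean only. The names `neg2_log_el` and `neg2_log_ael` are hypothetical, and the default tuning value `max(1, log(n)/2)` is an assumption on our part (a choice commonly associated with the adjusted empirical likelihood literature), not something taken from the text above.

```python
import math

def neg2_log_el(z, iters=200):
    """-2 log EL ratio for scalar z_i; math.inf if 0 is outside (min z, max z)."""
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return math.inf
    lo, hi = -1.0 / z_max, -1.0 / z_min   # keeps every 1 + lam * z_i positive
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

def neg2_log_ael(z, a=None):
    """Adjusted version: append the pseudo-point -a * mean(z) before solving."""
    n = len(z)
    if a is None:
        a = max(1.0, 0.5 * math.log(n))   # assumed default, not from the text
    z_bar = sum(z) / n
    return neg2_log_el(list(z) + [-a * z_bar])

x = [1.0, 2.0, 4.0, 7.0]
z_out = [xi - 8.0 for xi in x]            # hypothesised mean outside the data range
plain = neg2_log_el(z_out)                # infinite
adjusted = neg2_log_ael(z_out)            # finite, since the pseudo-point has opposite sign
```

Because the pseudo-point always lies on the opposite side of the origin from the sample average, the one-dimensional hull of the augmented data contains zero and the adjusted statistic stays finite.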

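To relax the independence assumption on the data, modifications need to be made to the empirical likelihood method in (1). There are roughly two major approaches: block-based methods in the time domain and periodogram-based methods in the frequency domain. For a review of these methods, we refer to Nordman2014 and the references therein. In this paper, we use the block-based method introduced by Kitamura1997 to work with weakly dependent data, where we assume that $X_1, \dots, X_n$ is a sample from a stationary stochastic process $\{X_t\}$ that satisfies the strong mixing condition,

$$\alpha_X(k) \to 0 \quad \text{as } k \to \infty, \qquad (5)$$

where $\alpha_X(k) = \sup\left\{ |P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_{-\infty}^{0},\ B \in \mathcal{F}_{k}^{\infty} \right\}$, and $\mathcal{F}_{a}^{b}$ denotes the $\sigma$-algebra generated by $\{X_t : a \le t \le b\}$. Further, assume

$$\sum_{k=1}^{\infty} \alpha_X(k)^{1 - 1/c} < \infty \qquad (6)$$

for some constant $c > 1$. The reason that the empirical likelihood in (1) is inadequate for weakly dependent data is also easily seen by considering the population mean as in (4). The asymptotic distribution of $-2 \log R(\mu_0)$ is derived from the approximation $-2 \log R(\mu_0) \approx n (\bar{X} - \mu_0)^\top S^{-1} (\bar{X} - \mu_0)$, where $S = n^{-1} \sum_{i=1}^{n} (X_i - \mu_0)(X_i - \mu_0)^\top$. For i.i.d. data, $S$ provides a proper scale for the score $\sqrt{n}(\bar{X} - \mu_0)$, so that the quadratic form is asymptotically $\chi^2$ distributed. However, for dependent data, $S$ is inadequate to scale the score because it does not take the autocorrelations among the data into account. As a remedy, Kitamura1997 proposed to use blocks of data in place of individual data points. To review this blocking method, let $M$, $L$, and $Q$ be the block length, the gap between block starting points, and the number of blocks respectively, where $Q = \lfloor (n - M)/L \rfloor + 1$. Define the block-wise estimating functions as

$$T_i(\theta) = \frac{1}{M} \sum_{k=1}^{M} g(X_{(i-1)L + k}; \theta), \qquad i = 1, \dots, Q.$$

Then the block-wise empirical likelihood is defined as

$$\mathcal{L}_B(\theta) = \sup\left\{ \prod_{i=1}^{Q} p_i : p_i \ge 0,\ \sum_{i=1}^{Q} p_i = 1,\ \sum_{i=1}^{Q} p_i\, T_i(\theta) = 0 \right\}. \qquad (7)$$

And the log block-wise empirical likelihood ratio is defined as

$$\log R_B(\theta) = -\sum_{i=1}^{Q} \log\left(1 + \lambda^\top T_i(\theta)\right), \qquad \text{where } \lambda \text{ solves } \sum_{i=1}^{Q} \frac{T_i(\theta)}{1 + \lambda^\top T_i(\theta)} = 0.$$

Under assumptions A.1-A.8 in section 3, it can be shown that

$$-2\, \frac{n}{QM}\, \log R_B(\theta_0) \xrightarrow{d} \chi^2_q. \qquad (8)$$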

The proof of the above result (8) can be found in Kitamura1997 and Owen2001. It should be noted that the choice of the block length is important to the performance of the block-wise empirical likelihood method. Various authors have studied and proposed ways to select the block length, each with its own advantages and limitations. For examples of block length selection, we refer to Nordman2014, Nordman2013, Kim2013, and Zhang2014. The study of the optimal block choice is, however, beyond the scope of this paper.
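The blocking step itself is simple to sketch. The snippet below assumes the usual convention that a block-wise estimating function is the average of the estimating function over a block of a given length, with block starting points separated by a fixed gap. The function name `block_means` and the argument names `M` (block length) and `L` (gap) are our own labels for illustration.

```python
def block_means(g, M, L):
    """Average g over blocks of length M whose starting points are L apart."""
    n = len(g)
    if not (1 <= M <= n) or L < 1:
        raise ValueError("need 1 <= M <= len(g) and L >= 1")
    Q = (n - M) // L + 1  # number of complete blocks
    return [sum(g[i * L : i * L + M]) / M for i in range(Q)]

g = [float(t) for t in range(1, 11)]
overlapping = block_means(g, M=4, L=2)   # blocks share two observations
disjoint = block_means(g, M=5, L=5)      # gap equals block length, no overlap
```

Setting the gap equal to the block length gives the non-overlapping scheme, while a gap of one gives the fully overlapping scheme.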

3 Adjusted block-wise empirical likelihood

It is apparent that the block-wise empirical likelihood method (7) also suffers from the convex hull constraint, which impedes proper coverage probability in finite samples. In this section, we propose to adjust the block-wise empirical likelihood and examine its effectiveness in improving the coverage probability for weakly dependent data. The theoretical appeal of the adjusted empirical likelihood for i.i.d. data is that it preserves the asymptotic $\chi^2$ distribution and at the same time breaks the convex hull constraint. Moreover, Liu2010 showed that for i.i.d. data the coverage probability error of the adjusted empirical likelihood confidence region can be reduced from $O(n^{-1})$ to $O(n^{-2})$. Furthermore, simulation studies in Chen2008, Emerson2009, and Liu2010 showed that the adjusted empirical likelihood provides significant improvements over the original empirical likelihood in terms of coverage probability. In the rest of this section, we show that all of the desirable properties of the adjusted empirical likelihood method mentioned above are preserved under the adjusted block-wise empirical likelihood for weakly dependent data.

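Since the convex hull used in the block-wise empirical likelihood is formed by the block-wise estimating functions $T_i(\theta)$, $i = 1, \dots, Q$, where $Q$ denotes the number of blocks, the extra estimating function used for the adjustment will naturally be constructed from the $T_i(\theta)$, in contrast to the individual data points used in the i.i.d. setting. With this we define the adjustment as

$$T_{Q+1}(\theta) = -a_n \bar{T}_Q(\theta), \qquad (9)$$

where $\bar{T}_Q(\theta) = Q^{-1} \sum_{i=1}^{Q} T_i(\theta)$ and $a_n$ is the tuning parameter. Now we construct the adjusted block-wise empirical likelihood with $T_{Q+1}(\theta)$ as follows:

$$\mathcal{L}_B^{*}(\theta) = \sup\left\{ \prod_{i=1}^{Q+1} p_i : p_i \ge 0,\ \sum_{i=1}^{Q+1} p_i = 1,\ \sum_{i=1}^{Q+1} p_i\, T_i(\theta) = 0 \right\}, \qquad (10)$$

and the log adjusted empirical likelihood ratio is then

$$\log R_B^{*}(\theta) = -\sum_{i=1}^{Q+1} \log\left(1 + \lambda_a^\top T_i(\theta)\right), \qquad (11)$$

where $\lambda_a$ is the vector of Lagrange multipliers that satisfies

$$\sum_{i=1}^{Q+1} \frac{T_i(\theta)}{1 + \lambda_a^\top T_i(\theta)} = 0.$$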

Before stating the asymptotic distribution of the log adjusted empirical likelihood ratio in (11), we first list the regularity conditions needed. A detailed explanation of these assumptions can be found in Kitamura1997. They are generalizations of the assumptions used in the i.i.d. setting, for example in Qin1994, to the weakly dependent setting. The main points of these assumptions concern the continuity and differentiability of the estimating function around the true parameter of interest, so that the remainder terms in the Taylor expansion of the log empirical likelihood ratio are controlled and the dominating term converges to a $\chi^2$ distribution.

  1. The parameter space is compact.

  2. is the unique root of .

  3. For sufficiently small , where is a small ball around , with radius .

  4. If a sequence converges to some converges to except on a null set, which may vary with .

  5. is an interior point of and is twice continuously differentiable at .

  6. .

  7. for defined in the strong mixing condition.
    . and , where is the jth component of .

  8. is of full rank.

With these assumptions, the following theorem shows that the adjusted empirical likelihood ratio has an asymptotic $\chi^2$ distribution.

Theorem 1.

Assume assumptions A.1-A.8 hold, under the strong mixing conditions (5) and (6), if , then

where is the true parameter.

Similar to the non-adjusted BEL in Kitamura1997, the factor is to account for the overlap between blocks. If the blocks do not overlap, then . The in theorem 1 shows that the size of the tuning parameter is controlled by the block length and sample size . This should be expected since is allowed to grow with and the asymptotic result is obtained under this growing block length setting. Moreover, in the block-wise empirical likelihood, we usually have , where is the number of blocks; therefore, . The intuition is that the adjustment is made on the blocked estimating equations, thus the size of the tuning parameter should be controlled by the number of blocks instead of only by the sample size. By theorem 1, a asymptotic confidence region based on the ABELR can be constructed as:

(12)

By the design of the extra point in (9), it is clear that the adjusted block-wise empirical likelihood ratio is well defined for any value of the parameter. As a consequence, there is no upper bound imposed by the convex hull on the coverage probability of (12).
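Putting the pieces together, the following sketch computes a block-wise statistic with and without the adjustment for a scalar mean. It is only an illustration under stated assumptions: the block-wise estimating functions are block averages, the extra point is minus the tuning parameter times the average of the block-wise estimating functions, and the rescaling factor is taken to be the sample size divided by the product of the number of blocks and the block length, which equals one when the blocks do not overlap. All function and argument names are hypothetical.

```python
import math

def neg2_log_el(z, iters=200):
    """-2 log EL ratio for scalar z_i; math.inf if 0 is outside (min z, max z)."""
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return math.inf
    lo, hi = -1.0 / z_max, -1.0 / z_min   # keeps every 1 + lam * z_i positive
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

def block_means(g, M, L):
    """Average g over blocks of length M whose starting points are L apart."""
    Q = (len(g) - M) // L + 1
    return [sum(g[i * L : i * L + M]) / M for i in range(Q)]

def blockwise_statistics(x, mu, M, L, a):
    """Return (adjusted, unadjusted) block-wise statistics for a scalar mean."""
    n = len(x)
    T = block_means([xi - mu for xi in x], M, L)
    Q = len(T)
    T_bar = sum(T) / Q
    T_adj = T + [-a * T_bar]       # extra block-level pseudo-point
    scale = n / (Q * M)            # assumed overlap correction factor
    return scale * neg2_log_el(T_adj), scale * neg2_log_el(T)

x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
abel, bel = blockwise_statistics(x, mu=0.0, M=2, L=2, a=1.0)
```

In this toy example all block averages are positive, so the unadjusted statistic `bel` is infinite while the adjusted statistic `abel` remains finite and can still be compared with a $\chi^2$ quantile.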

As with any method that involves a tuning parameter, the choice of the tuning parameter in practice is delicate, and it may depend on the statistical task that one wishes to tackle. In the i.i.d. setting, Liu2010 studied this choice through an Edgeworth expansion of the adjusted empirical likelihood ratio, and they found that if the tuning parameter is specified in relation to the Bartlett correction factor, then the adjusted empirical likelihood confidence region can achieve the Bartlett error rate. In the next section, we will show that Bartlett correction is also possible for the adjusted block-wise empirical likelihood with weakly dependent data.

4 Tuning Parameter for Bartlett Corrected Error Rate

Being Bartlett correctable is an important feature of the parametric likelihood ratio confidence region, where the coverage probability error can be decreased from $O(n^{-1})$ to $O(n^{-2})$. Like its parametric counterpart, DiCiccio1991 showed that the empirical likelihood for the smooth function model is also Bartlett correctable. Further, Chen2007 showed that this property also holds for the empirical likelihood with general estimating equations. For weakly dependent data, Kitamura1997 showed that the block-wise empirical likelihood for the smooth function model is Bartlett correctable, where the coverage probability error can be improved from $O(n^{-2/3})$ to $O(n^{-5/6})$. The errors are larger than those for i.i.d. data because of the data blocking method, which is used to deal with the weakly dependent data structure. In this section, we show that through an Edgeworth expansion of the adjusted block-wise empirical likelihood ratio, a tuning parameter can be found such that the coverage error of the adjusted empirical likelihood confidence region attains the Bartlett corrected order for general estimating equations. Here we assume the non-overlapping blocking scheme. In other words, the gap between block starting points equals the block length. In addition to the mixing conditions (5) and (6), we also impose further bounds on the quantities defined in (5) and (6), stated in terms of a positive constant. We also need to assume the validity of the Edgeworth expansion of sums of dependent data, which Gotze1983 has shown by assuming the existence of more moments, a conditional Cramér condition, and that the random processes are approximated by other strong mixing processes with exponentially decaying mixing coefficients that satisfy a Markov type condition. For more details on these assumptions, we refer to Kitamura1997, Bhattacharya1978, and Gotze1983.

To simplify the notations in deriving the tuning parameter, assume that

where

is the identity matrix, otherwise we can replace

by . Let denote the jth component of . For , define

(13)

Notice that is the Kronecker , where and . Further, we let

With the above notation, it can be shown by following the calculations in Liu2010 and DiCiccio1988 that

(14)

where, for

Here the summation convention over repeated indices is used. Equation (14) is the so-called signed-root decomposition of the block-wise empirical likelihood ratio statistic. Since we add an extra blocked estimating equation (9) in the adjusted block-wise empirical likelihood, the signed-root decomposition of the adjusted statistic is slightly affected by the adjustment. This is exactly where we can leverage the tuning parameter to achieve the Bartlett corrected coverage error rate. By the derivation shown in section 7, the signed-root decomposition of the adjusted block-wise empirical likelihood ratio statistic is

(15)

With defined above, in order to derive the tuning parameter in equation (15) that will yield the Bartlett error rate, we define the counterpart of (13) under dependency as the following: for a sequence of integers ,

Now if we define as follows, then the next theorem will show that the adjusted block-wise empirical likelihood confidence region (12) achieves the Bartlett corrected coverage error rate.

Let

(16)

where, for ,

(17)

with , and similarly for . The quantities are defined as

The are the same as correspondingly, except that the superscript are exchanged, for example .

Theorem 2.

Assume that conditions A.1-A.8 in section 3 hold with non-overlapping blocking scheme. And such that the bounds on , and the assumptions for the Edgeworth expansion for sums of dependent data mentioned in the beginning of this section hold, if is as in (16), then as ,

In practice, the unknown population quantity can be replaced by , where is the maximum block-wise empirical likelihood estimator of . The quantity is composed of various population moments, which can be replaced by their corresponding sample moments to obtain an estimator of . Moreover, the estimated may be positive or negative. If it is positive, then the convex hull constructed with the extra point will always contain the origin. However, if is negative, then is also negative. As a result, the convex hull with the new point will not contain the origin if the original convex hull does not. To avoid the second situation, if , then we add two extra points , such that . We can let and , such that will guarantee that the origin is in the new convex hull. Moreover, since , adding will have the same effect as adding with tuning parameter in terms of obtaining the Bartlett coverage probability.

5 Simulation

In this section, we examine the numerical properties of the adjusted block-wise empirical likelihood through a simulation study. We compare the confidence regions constructed by the adjusted block-wise empirical likelihood (10) with several tuning parameters to the one constructed by the non-adjusted block-wise empirical likelihood (7). The data are simulated from an AR(1) model

$$X_t = \Phi X_{t-1} + \varepsilon_t,$$

where the $\varepsilon_t$ are i.i.d. $d$-dimensional multivariate standard normal random variables and $\Phi$ is a diagonal matrix with $\rho$ on the diagonal. The parameter of interest is the population mean. In order to see how the data dependencies affect the performance of the methods, we simulate the data with a range of values of $\rho$. We also simulate for several dimensions $d$ to see how the parameter dimension affects the performance. We look at two sample sizes. For each scenario, we calculate the block-wise empirical likelihood ratio over a range of block lengths in order to examine the effects of block choices. In addition, we also use the progressive blocking method proposed by Kim2013, which does not require fixing a block length. For each scenario, we simulate repeated data sets and calculate the likelihood ratio for each data set at the true mean. The coverage probability is then calculated as the number of times the likelihood ratio is less than the theoretical $\chi^2$ quantile at levels 0.90, 0.95, and 0.99, divided by the number of simulated data sets. The likelihood ratios are calculated by the block-wise empirical likelihood without adjustment (BEL), by the adjusted block-wise empirical likelihood with four simple choices of the tuning parameter (ABEL_log, ABEL_0.5, ABEL_0.8, ABEL_1), and by the adjusted block-wise empirical likelihood with the tuning parameter given in (16) (ABEL_bart). The Bartlett tuning parameter (16) is estimated by the plug-in estimator, which is then bias corrected by a block-wise bootstrap. The full simulation results are shown in Table 2 in the appendix. Table 1 shows a snapshot of Table 2 for the AR(1) coefficients $\rho = -0.2, 0.2, 0.5, 0.8$. The block length reported is the one that gives the best coverage rates for each particular method, where "pro" indicates that the progressive blocking method gives the best result. It can be seen that for negative $\rho$, BEL performed well and at least one of the adjusted BEL variants matched or surpassed the BEL performance. As $\rho$ becomes positive, the BEL starts to show its vulnerability to under-coverage, and this becomes worse as the dimension increases. In contrast, the adjusted BEL still provides adequate coverage. This phenomenon, where the BEL does not suffer as severe under-coverage for negative $\rho$ as it does for positive $\rho$, exemplifies the fact that the coverage probability is bounded above by the probability of the convex hull containing the origin. When $\rho$ is negative, consecutive points are likely to be on opposite sides of the origin; therefore the resulting convex hull is likely to contain the origin and does not impose an upper bound on the coverage probability. For positive $\rho$, especially when it is close to 1, consecutive points are likely to be close to each other; thus, the probability that the resulting convex hull contains the origin is small.
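A much reduced version of such a coverage experiment can be sketched as follows. This is a toy, scalar analogue and not the simulation design of the paper: the sample size, block length, number of replications, tuning parameter, and seed below are hypothetical values chosen only to keep the example fast, non-overlapping blocks are used so that no overlap correction is needed, and only the 0.95 level is examined. The Lagrange multiplier is again found by bisection.

```python
import math
import random

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 d.f.

def neg2_log_el(z, iters=100):
    """-2 log EL ratio for scalar z_i; math.inf if 0 is outside (min z, max z)."""
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return math.inf
    lo, hi = -1.0 / z_max, -1.0 / z_min
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

def ar1(n, rho, rng, burn_in=100):
    """Scalar AR(1) path with standard normal innovations and true mean 0."""
    x, path = 0.0, []
    for t in range(n + burn_in):
        x = rho * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(x)
    return path

def coverage(rho, n=100, M=5, a=1.0, reps=200, seed=0):
    """Empirical coverage at the true mean for (unadjusted, adjusted) statistics."""
    rng = random.Random(seed)
    hits_bel = hits_abel = 0
    for _ in range(reps):
        x = ar1(n, rho, rng)
        # Non-overlapping block averages of X_t - 0.
        T = [sum(x[i:i + M]) / M for i in range(0, n - M + 1, M)]
        T_bar = sum(T) / len(T)
        hits_bel += neg2_log_el(T) < CHI2_1_95
        hits_abel += neg2_log_el(T + [-a * T_bar]) < CHI2_1_95
    return hits_bel / reps, hits_abel / reps

bel_cov, abel_cov = coverage(rho=0.5)
```

Calling `coverage` with different values of `rho` gives a quick way to see how the gap between the two coverage estimates changes with the strength and sign of the dependence.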

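| AR(1) coef. | Dim. | Method | Block (1st n) | 0.90 | 0.95 | 0.99 | Block (2nd n) | 0.90 | 0.95 | 0.99 |
|---|---|---|---|---|---|---|---|---|---|---|
| -0.2 | 3 | BEL | 3 | 0.90 | 0.94 | 0.98 | 6 | 0.89 | 0.94 | 0.99 |
| -0.2 | 3 | ABEL_log | 3 | 0.94 | 0.97 | 0.99 | pro | 0.90 | 0.96 | 1.00 |
| -0.2 | 3 | ABEL_0.8 | 3 | 0.91 | 0.95 | 0.98 | 7 | 0.90 | 0.94 | 0.99 |
| -0.2 | 3 | ABEL_1 | 14 | 0.90 | 0.95 | 0.99 | 7 | 0.90 | 0.95 | 0.99 |
| -0.2 | 3 | ABEL_bart | 14 | 0.91 | 0.94 | 0.97 | pro | 0.90 | 0.95 | 0.99 |
| 0.2 | 3 | BEL | 3 | 0.82 | 0.89 | 0.95 | 9 | 0.88 | 0.93 | 0.98 |
| 0.2 | 3 | ABEL_log | 4 | 0.89 | 0.95 | 1.00 | 6 | 0.90 | 0.95 | 0.99 |
| 0.2 | 3 | ABEL_0.8 | 3 | 0.83 | 0.90 | 0.96 | 9 | 0.88 | 0.94 | 0.98 |
| 0.2 | 3 | ABEL_1 | 14 | 0.88 | 0.95 | 0.99 | 9 | 0.88 | 0.94 | 0.99 |
| 0.2 | 3 | ABEL_bart | 12 | 0.93 | 0.96 | 0.98 | 8 | 0.90 | 0.95 | 1.00 |
| 0.5 | 3 | BEL | 5 | 0.68 | 0.77 | 0.89 | 10 | 0.82 | 0.87 | 0.95 |
| 0.5 | 3 | ABEL_log | 5 | 0.89 | 0.97 | 1.00 | 13 | 0.91 | 0.96 | 1.00 |
| 0.5 | 3 | ABEL_0.8 | 16 | 0.74 | 0.88 | 0.97 | 10 | 0.83 | 0.89 | 0.96 |
| 0.5 | 3 | ABEL_1 | 14 | 0.87 | 0.95 | 0.99 | 10 | 0.83 | 0.89 | 0.96 |
| 0.5 | 3 | ABEL_bart | 14 | 0.92 | 0.95 | 0.97 | pro | 0.90 | 0.96 | 0.99 |
| 0.5 | 4 | BEL | 4 | 0.64 | 0.72 | 0.85 | 9 | 0.77 | 0.85 | 0.95 |
| 0.5 | 4 | ABEL_log | 16 | 0.92 | 0.94 | 0.97 | 11 | 0.88 | 0.95 | 1.00 |
| 0.5 | 4 | ABEL_0.8 | 14 | 0.72 | 0.86 | 0.95 | 9 | 0.79 | 0.87 | 0.96 |
| 0.5 | 4 | ABEL_1 | 14 | 0.86 | 0.92 | 0.97 | 9 | 0.79 | 0.87 | 0.96 |
| 0.5 | 4 | ABEL_bart | 13 | 0.91 | 0.94 | 0.96 | pro | 0.91 | 0.97 | 0.99 |
| 0.8 | 2 | BEL | 9 | 0.58 | 0.67 | 0.76 | 16 | 0.77 | 0.85 | 0.94 |
| 0.8 | 2 | ABEL_log | 7 | 0.87 | 0.98 | 1.00 | 16 | 0.88 | 0.95 | 1.00 |
| 0.8 | 2 | ABEL_0.8 | 16 | 0.72 | 0.86 | 0.98 | 16 | 0.80 | 0.86 | 0.94 |
| 0.8 | 2 | ABEL_1 | 16 | 0.85 | 0.95 | 0.99 | 16 | 0.80 | 0.87 | 0.95 |
| 0.8 | 2 | ABEL_bart | 4 | 0.91 | 0.94 | 0.97 | 13 | 0.90 | 0.96 | 1.00 |

Table 1: Comparison of coverage probabilities at nominal levels 0.90, 0.95, and 0.99 for the two sample sizes (first and second column groups). "Block" is the block length; "pro" means the progressive blocking method is used.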

6 Conclusion

Originally proposed to improve the coverage probability of the empirical likelihood confidence region for i.i.d. data, the adjusted empirical likelihood is shown in this paper to be effective in improving the coverage probability when combined with the blocking method for weakly dependent data. In particular, we have shown that the ABEL possesses asymptotic properties similar to those of its non-adjusted counterpart. Moreover, we have shown that the adjustment tuning parameter can be used to achieve the asymptotic Bartlett corrected coverage error rate. The tuning parameter that gives the Bartlett corrected rate involves higher moments that need to be estimated in practice. How best to estimate this tuning parameter needs to be studied further. In the simulation study, we used a block-wise bootstrap to correct the bias in estimating the tuning parameter by plugging in the sample moments. The results show that the adjusted BEL performs comparably to the non-adjusted BEL when the non-adjusted BEL performs well, and it outperforms the non-adjusted BEL when the non-adjusted BEL suffers from the under-coverage issue. Our bootstrap bias corrected tuning parameter performs well most of the time, but sometimes it is outperformed by other choices of the tuning parameter. As mentioned above, the optimal way to estimate the tuning parameter will be addressed in future studies.

7 Proofs

Proof of Theorem 1.

The first step in proving Theorem 1 is to bound the order of the Lagrange multiplier, where we use a subscript to emphasize that this is the Lagrange multiplier for the adjusted empirical likelihood. First, we note that this multiplier solves the following equation

(18)

Now, define . Multiply on both sides of equation (18), and recall that . Then we have

(19)

where

. By law of large numbers and the central limit theorem, and the argument in

Owen1990 and Kitamura1997, it can be shown that a.s. It has also been shown in Kitamura1997 that and . Then, we can deduce from (19) that

where

 is the smallest eigenvalue of 

S. Then

By the assumption that , we have

Therefore, , which in particular means that .

The next step is to express in terms of . Notice that equation (18) can be written as the sum of two parts

(20)

where the first part on the right hand side can be written as

The last part in (20) is

Now, we have the relationship

(21)

The final step is done through Taylor expansion of the adjusted block-wise empirical likelihood ratio . The adjusted block-wise empirical likelihood ratio can be written in two parts as