 # Jackknife Empirical Likelihood Approach for K-sample Tests

The categorical Gini correlation is an alternative measure of dependence between a categorical variable and a numerical variable, and it characterizes the independence of the two variables. A nonparametric test for the equality of K distributions is developed based on the categorical Gini correlation. By applying the jackknife empirical likelihood approach, the standard limiting chi-square distribution with K-1 degrees of freedom is established and is used to determine the critical value and p-value of the test. Simulation studies show that the proposed method is competitive with existing methods in terms of power in most cases. The proposed method is illustrated in an application to a real data set.

## 1 Introduction

Testing the equality of $K$ distributions from independent random samples is a classical statistical problem encountered in almost every field. Due to its fundamental importance and wide applications, research on the $K$-sample problem has been active since the 1940s. Various tests have been proposed and new tests continue to emerge.

Often an omnibus test is based on a discrepancy measure among distributions. For example, widely used and well-studied tests such as the Cramér-von Mises test (), the Anderson-Darling test ([9, 32]) and their variations utilize different norms on the difference of empirical distribution functions, while some ([2, 23]) are based on the comparison of density estimators if the underlying distributions are continuous. Other tests ([35, 12]) are based on characteristic function difference measures. One such measure is the energy distance ([36, 37]). It is the weighted distance between characteristic functions and is defined as follows.

###### Definition 1.1 (Energy distance)

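Suppose that $(\boldsymbol{X},\boldsymbol{X}')$ and $(\boldsymbol{Y},\boldsymbol{Y}')$ are independent pairs drawn independently from $d$-variate distributions $F$ and $G$, respectively. Then the energy distance between $\boldsymbol{X}$ and $\boldsymbol{Y}$ is

$$\mathcal{E}(\boldsymbol{X},\boldsymbol{Y})=2E\|\boldsymbol{X}-\boldsymbol{Y}\|-E\|\boldsymbol{X}-\boldsymbol{X}'\|-E\|\boldsymbol{Y}-\boldsymbol{Y}'\|. \tag{1}$$

Let the characteristic functions of $\boldsymbol{X}$ and $\boldsymbol{Y}$ be $\psi_x(\boldsymbol{t})$ and $\psi_y(\boldsymbol{t})$, respectively. It has been proved that

$$\mathcal{E}(\boldsymbol{X},\boldsymbol{Y})=c_d\int_{\mathbb{R}^d}\frac{\|\psi_x(\boldsymbol{t})-\psi_y(\boldsymbol{t})\|^2}{\|\boldsymbol{t}\|^{d+1}}\,d\boldsymbol{t},$$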

where $c_d$ is a constant depending on $d$. Clearly, $\mathcal{E}(\boldsymbol{X},\boldsymbol{Y})=0$ if and only if $F=G$. A natural estimator of (1), a linear combination of three $U$-statistics, is called the energy statistic. The hypothesis $F=G$ is rejected if the energy statistic is sufficiently large. To extend to the $K$-sample problem, Rizzo and Székely () proposed a new method called distance components (DISCO) by partitioning the total distance dispersion of the pooled samples into the within distance and between distance components, analogous to the variance components in ANOVA. The test statistic is the ratio of the between variation and the within variation, where the between variation is the weighted sum of all two-sample energy distances. Equivalently, Dang et al. () conducted a test based on the ratio of the between variation and the total variation, in which the ratio defines a dependence measure. Although those tests are consistent against any departure from the null hypothesis and their test statistics are easy to compute, the tests have to rely on a permutation procedure to determine the critical values since the null distribution depends on the unknown underlying distributions.
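As a rough illustration of the two-sample case described above, the following Python sketch estimates (1) from data and calibrates it by permutation. The function names, the default of 199 permutations, and the choice to average the cross term over all pairs are assumptions made for this example and are not taken from the paper or from the R package "energy".

```python
import numpy as np

def mean_cross_distance(a, b):
    """Average Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).mean()

def gini_mean_distance(a):
    """U-statistic estimate of E||X - X'||: average over distinct pairs of rows."""
    n = a.shape[0]
    diff = a[:, None, :] - a[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist.sum() / (n * (n - 1))  # diagonal entries are zero

def energy_statistic(x, y):
    """Plug-in estimate of 2E||X-Y|| - E||X-X'|| - E||Y-Y'||."""
    return (2.0 * mean_cross_distance(x, y)
            - gini_mean_distance(x) - gini_mean_distance(y))

def permutation_p_value(x, y, n_perm=199, seed=0):
    """Permutation p-value: reshuffle the pooled rows and recompute the statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = energy_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = energy_statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])
        if stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Each permutation recomputes all pairwise distances, which is the computational cost that motivates avoiding the permutation step.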

Empirical likelihood (EL) tests ([4, 10, 42]) successfully avoid the time-consuming permutation procedure. As a nonparametric approach, EL ([24, 25]) also enjoys the effectiveness of the likelihood method and hence has been widely used; see [27, 28, 40] and the references therein. We refer to [6, 5, 7, 11] for updates on EL in high dimensions. When the constraints are nonlinear, however, EL loses this efficiency. To overcome this computational difficulty, Wood () proposed a sequential linearization method that linearizes the nonlinear constraints. However, they did not provide a Wilks' theorem and stated that it was not easy to establish. Jing et al. () proposed the jackknife empirical likelihood (JEL) approach. The JEL method transforms the maximization problem of the EL with nonlinear constraints to the simple case of EL on the mean of jackknife pseudo-values, which is very effective in handling one- and two-sample $U$-statistics. This approach has attracted strong interest from statisticians in a wide range of fields due to its efficiency, and many papers are devoted to the investigation of the method.

Recently several JEL tests ([21, 19, 20]) based on characteristic functions have been developed for the two-sample problem. Wan, Liu and Deng () proposed a JEL test using the energy distance, which is a function of three $U$-statistics. To avoid the degeneracy problem of $U$-statistics, a nuisance parameter is introduced and the resulting JEL method involves three constraints. The limiting distribution of the log-likelihood is a weighted chi-squared distribution. Directly generalizing their JEL test to the $K$-sample problem may not work since the number of constraints increases quickly with $K$, which not only makes computation difficult but also brings challenges in theoretical development.

We propose a JEL test for the $K$-sample problem with only $K+1$ constraints. We treat the $K$-sample testing problem as a dependence test between a numerical variable and a categorical variable indicating samples from different populations. We apply JEL with the Gini correlation that mutually characterizes the dependence (). The limiting distribution of the proposed JEL ratio is a standard chi-squared distribution. To the best of our knowledge, our approach is the first consistent JEL test for univariate and multivariate $K$-sample problems in the literature. The idea of viewing the $K$-sample test as an independence test between a numerical and a categorical variable is not new. Jiang, Ye and Liu () proposed a nonparametric test based on mutual information. The numerical variable is discretized so that the mutual information can be easily evaluated. However, their method only applies to univariate populations. Heller, Heller and Gorfine ([13, 14]) proposed a dependence test based on rank distances, but their test requires a permutation procedure.

The remainder of the paper is organized as follows. In Section 2, we develop the JEL method for the $K$-sample test. Simulation studies are conducted in Section 3. A real data analysis is illustrated in Section 4. Section 5 concludes the paper with a brief summary. All proofs are deferred to the Appendix.

## 2 JEL test for K-sample based on a categorical Gini correlation

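Let $\mathcal{X}_k=\{\boldsymbol{X}_{k1},\dots,\boldsymbol{X}_{kn_k}\}$ be a sample from a $d$-variate distribution $F_k$, $k=1,\dots,K$, respectively. The pooled sample is denoted as $\mathcal{X}=\mathcal{X}_1\cup\cdots\cup\mathcal{X}_K=\{\boldsymbol{X}_1,\dots,\boldsymbol{X}_n\}$ of sample size $n=n_1+\cdots+n_K$. The objective is to test the equality of the $K$ distributions, that is,

$$H_0: F_1=\cdots=F_K \quad \text{vs.} \quad H_a: F_j\neq F_k \ \text{ for some } 1\le j<k\le K. \tag{2}$$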

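Let $Z$ be the categorical variable taking values $L_1,\dots,L_K$, and let $\boldsymbol{X}$ be a $d$-variate random vector with the conditional distribution of $\boldsymbol{X}$ given $Z=L_k$ being $F_k$. Assume $P(Z=L_k)=\alpha_k>0$. Then the distribution of $\boldsymbol{X}$ is the mixture distribution $F$ defined as

$$F=\sum_{k=1}^{K}\alpha_k F_k.$$

Treating $\hat{\alpha}_k=n_k/n$ as an unbiased and consistent estimator of $\alpha_k$, we can view the pooled sample $\mathcal{X}$ as a sample from $F$.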

By introducing the two variables $\boldsymbol{X}$ and $Z$, testing (2) is equivalent to testing the independence between $\boldsymbol{X}$ and $Z$. We will adopt the recently proposed categorical Gini correlation (), which characterizes the independence of the continuous and categorical variables.

### 2.1 Categorical Gini correlation

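Let $\boldsymbol{X}_1$ and $\boldsymbol{X}_2$ be i.i.d. copies from $F$, and let $\boldsymbol{X}_{k1}$ and $\boldsymbol{X}_{k2}$ be i.i.d. copies from $F_k$, $k=1,\dots,K$. Let

$$\Delta=E\|\boldsymbol{X}_1-\boldsymbol{X}_2\|,\qquad \Delta_k=E\|\boldsymbol{X}_{k1}-\boldsymbol{X}_{k2}\|,\quad k=1,\dots,K, \tag{3}$$

be the Gini distance of $F$ and $F_k$, respectively. Then the Gini correlation () between a continuous random variable and a categorical variable is defined as follows.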

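###### Definition 2.1 (Dang et al.)

For a non-degenerate random vector $\boldsymbol{X}$ in $\mathbb{R}^d$ and a categorical variable $Z$, if $E\|\boldsymbol{X}\|<\infty$, the Gini correlation of $\boldsymbol{X}$ and $Z$ is defined as

$$\rho_g(\boldsymbol{X},Z)=\frac{\Delta-\sum_{k=1}^{K}\alpha_k\Delta_k}{\Delta}. \tag{4}$$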

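The Gini correlation characterizes the dependence. That is, $\rho_g(\boldsymbol{X},Z)=0$ if and only if $\boldsymbol{X}$ and $Z$ are independent. This is because

$$S:=\Delta-\sum_{k=1}^{K}\alpha_k\Delta_k=\sum_{k=1}^{K}\alpha_k\,\mathcal{E}(\boldsymbol{X}_k,\boldsymbol{X})=c_d\sum_{k=1}^{K}\alpha_k\int\frac{\|\psi_k(\boldsymbol{t})-\psi(\boldsymbol{t})\|^2}{\|\boldsymbol{t}\|^{d+1}}\,d\boldsymbol{t}\ \ge 0,$$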

where $c_d$ is a constant depending on $d$, and $\psi_k$ and $\psi$ are the characteristic functions of $\boldsymbol{X}_k$ and $\boldsymbol{X}$, respectively. Hence we have the following result.

###### Lemma 2.1 (Dang et al.)

For $\alpha_k>0$, $k=1,\dots,K$, $S=0$ if and only if $F_1=\cdots=F_K$.
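The identity $\Delta-\sum_k\alpha_k\Delta_k=\sum_k\alpha_k\,\mathcal{E}(\boldsymbol{X}_k,\boldsymbol{X})$ used above can be checked in a few lines. The verification below is a sketch that assumes $\boldsymbol{X}_k\sim F_k$ is independent of $\boldsymbol{X}\sim F$ and uses only the mixture representation $F=\sum_k\alpha_kF_k$ and definition (1):

$$
\begin{aligned}
\sum_{k=1}^{K}\alpha_k\,\mathcal{E}(\boldsymbol{X}_k,\boldsymbol{X})
&=\sum_{k=1}^{K}\alpha_k\bigl(2E\|\boldsymbol{X}_k-\boldsymbol{X}\|-\Delta_k-\Delta\bigr)\\
&=2\sum_{k=1}^{K}\alpha_kE\|\boldsymbol{X}_k-\boldsymbol{X}\|-\sum_{k=1}^{K}\alpha_k\Delta_k-\Delta\\
&=2\Delta-\sum_{k=1}^{K}\alpha_k\Delta_k-\Delta
=\Delta-\sum_{k=1}^{K}\alpha_k\Delta_k .
\end{aligned}
$$

The third line uses $\sum_k\alpha_k=1$ and the fact that conditioning one of two i.i.d. draws from the mixture $F$ on its class gives $\Delta=E\|\boldsymbol{X}_1-\boldsymbol{X}_2\|=\sum_k\alpha_kE\|\boldsymbol{X}_k-\boldsymbol{X}\|$.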

Therefore, testing (2) is equivalent to testing whether $S=0$. We can rewrite $S$ as

$$S=\sum_{k=1}^{K}\alpha_k\bigl(E\|\boldsymbol{X}_1-\boldsymbol{X}_2\|-E\|\boldsymbol{X}_{k1}-\boldsymbol{X}_{k2}\|\bigr),$$

which can be estimated unbiasedly by

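$$U_{n_1,\dots,n_K}=\sum_{k=1}^{K}\hat{\alpha}_k\,(U_n-U_{n_k}), \tag{5}$$

where $\hat{\alpha}_k=n_k/n$,

$$U_n=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}\|\boldsymbol{X}_i-\boldsymbol{X}_j\|$$

and

$$U_{n_k}=\binom{n_k}{2}^{-1}\sum_{1\le l<m\le n_k}\|\boldsymbol{X}_{kl}-\boldsymbol{X}_{km}\|.$$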

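Clearly, $U_n$ and $U_{n_k}$ are $U$-statistics of degree 2 with the kernel being $h(\boldsymbol{x}_1,\boldsymbol{x}_2)=\|\boldsymbol{x}_1-\boldsymbol{x}_2\|$. $U_n$ and $U_{n_k}$ are unbiased estimators of $\Delta$ and $\Delta_k$, respectively.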
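To make estimator (5) concrete, here is a minimal Python sketch that computes $U_n$, $U_{n_k}$ and $U_{n_1,\dots,n_K}$ from a list of samples. The function names and the convention that each sample is a two-dimensional array of shape (sample size, dimension) are assumptions of this example.

```python
import numpy as np

def u_statistic(x):
    """Mean Euclidean distance over distinct pairs of rows (kernel ||x1 - x2||)."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist.sum() / (n * (n - 1))  # each pair is counted twice in dist.sum()

def gini_between_estimate(samples):
    """Estimate S by sum_k (n_k / n) * (U_n - U_{n_k}), as in equation (5)."""
    pooled = np.vstack(samples)
    n = pooled.shape[0]
    u_n = u_statistic(pooled)
    return sum(s.shape[0] / n * (u_n - u_statistic(s)) for s in samples)

def sample_gini_correlation(samples):
    """Plug-in version of (4): the estimate of S divided by U_n."""
    return gini_between_estimate(samples) / u_statistic(np.vstack(samples))
```

For the two univariate samples {0, 1} and {10, 11}, the within-sample statistics are both 1 and the pooled statistic is 42/6 = 7, so the estimate of $S$ is 6.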

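Under $H_0$, we have $EU_n=EU_{n_1}=\cdots=EU_{n_K}$. Conversely, if $EU_n=EU_{n_k}$ for all $k$, then $\Delta=\Delta_k$ for all $k$, so $S=0$ and hence $F_1=\cdots=F_K$ by Lemma 2.1. Therefore, testing $H_0$ is equivalent to testing

$$H_0': EU_n=EU_{n_1}=\cdots=EU_{n_K}. \tag{6}$$

JEL has been proven to be very effective in dealing with $U$-statistics, and therefore we will utilize the JEL approach to test (6).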

### 2.2 JEL test for K-sample

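In order to apply JEL, we define the corresponding jackknife pseudo-values for $U_n$ and $U_{n_k}$ as

$$\hat{V}_i=nU_n-(n-1)U^{(-i)}_{n-1},\quad i=1,\dots,n,\qquad \hat{V}_{kl}=n_kU_{n_k}-(n_k-1)U^{(-l)}_{n_k},\quad l=1,\dots,n_k,$$

where

$$U^{(-i)}_{n-1}=\binom{n-1}{2}^{-1}\sum_{1\le j<m\le n;\ j,m\neq i}\|\boldsymbol{X}_j-\boldsymbol{X}_m\|$$

and

$$U^{(-l)}_{n_k}=\binom{n_k-1}{2}^{-1}\sum_{1\le j<m\le n_k;\ j,m\neq l}\|\boldsymbol{X}_{kj}-\boldsymbol{X}_{km}\|.$$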

It is easy to see that

$$U_n=\frac{1}{n}\sum_{i=1}^{n}\hat{V}_i,\qquad U_{n_k}=\frac{1}{n_k}\sum_{l=1}^{n_k}\hat{V}_{kl},\quad \text{for } k=1,\dots,K.$$
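The pseudo-values can be computed without re-evaluating the $U$-statistic $n$ times, because deleting observation $i$ simply removes the $i$-th row sum of the distance matrix from the total. The sketch below illustrates this; the function name is hypothetical and the input is assumed to be an array of shape (sample size, dimension) with at least three rows.

```python
import numpy as np

def jackknife_pseudo_values(x):
    """Jackknife pseudo-values n*U_n - (n-1)*U_{n-1}^{(-i)} for the kernel ||x1 - x2||."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_total = dist.sum() / 2.0          # sum of distances over pairs i < j
    row_sums = dist.sum(axis=1)            # distances involving observation i
    u_n = pair_total / (n * (n - 1) / 2.0)
    # Leave-one-out U-statistics: drop every pair that contains observation i.
    u_leave_one_out = (pair_total - row_sums) / ((n - 1) * (n - 2) / 2.0)
    return n * u_n - (n - 1) * u_leave_one_out
```

For the univariate sample {0, 1, 3}, $U_n=2$ and the leave-one-out statistics are 2, 3 and 1, giving pseudo-values 2, 0 and 4, whose mean is again $U_n=2$.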

Under $H_0$, we have

$$E\hat{V}_i=E\hat{V}_{kl}=\theta_0,\quad i=1,\dots,n;\ l=1,\dots,n_k;\ k=1,\dots,K,$$

where $\theta_0=E\|\boldsymbol{X}_1-\boldsymbol{X}_2\|$, with the expectations taken under $H_0$.

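Next, we apply the JEL to the above jackknife pseudo-values. Let $\boldsymbol{p}_k=(p_{k1},\dots,p_{kn_k})$ be the empirical probability vector assigned to the elements of $\{\hat{V}_{kl}\}_{l=1}^{n_k}$, $k=1,\dots,K$, and let $\boldsymbol{p}=(p_1,\dots,p_n)$ be the probability vector for $\{\hat{V}_i\}_{i=1}^{n}$. We have the following optimization problem.

$$R=\max_{\boldsymbol{p}_k,\,\boldsymbol{p},\,\theta}\left\{\left(\prod_{k=1}^{K}\prod_{l=1}^{n_k}n_kp_{kl}\right)\left(\prod_{i=1}^{n}np_i\right)\right\}, \tag{7}$$

subject to the following constraints

$$
\begin{aligned}
&p_{kl}\ge 0,\ l=1,\dots,n_k,\quad \sum_{l=1}^{n_k}p_{kl}=1,\quad 1\le k\le K;\\
&p_i\ge 0,\ i=1,\dots,n,\quad \sum_{i=1}^{n}p_i=1;\\
&\sum_{i=1}^{n}p_i(\hat{V}_i-\theta)=0;\\
&\sum_{l=1}^{n_k}p_{kl}(\hat{V}_{kl}-\theta)=0,\quad k=1,\dots,K.
\end{aligned} \tag{8}
$$

###### Remark 2.1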

in equation (7) maximizes the squared standard jackknife empirical likelihood ratio (JELR). This is because is the marginal probability and is the conditional probability and then we have . The maximization in is the same maximization solution of the regular JELR.

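Applying the Lagrange multiplier method, one has

$$p_{kl}=\frac{1}{n_k}\,\frac{1}{1+\lambda_k(\hat{V}_{kl}-\theta)},\quad l=1,\dots,n_k,\ k=1,\dots,K,\qquad p_i=\frac{1}{n}\,\frac{1}{1+\lambda(\hat{V}_i-\theta)},\quad i=1,\dots,n,$$

where $(\lambda,\lambda_1,\dots,\lambda_K,\theta)$ satisfy the following equations:

$$
\begin{aligned}
&\sum_{i=1}^{n}\frac{\hat{V}_i-\theta}{1+\lambda(\hat{V}_i-\theta)}=0,\\
&\sum_{l=1}^{n_k}\frac{\hat{V}_{kl}-\theta}{1+\lambda_k(\hat{V}_{kl}-\theta)}=0,\quad k=1,\dots,K,\\
&\lambda\sum_{i=1}^{n}\frac{-1}{1+\lambda(\hat{V}_i-\theta)}+\sum_{k=1}^{K}\lambda_k\sum_{l=1}^{n_k}\frac{-1}{1+\lambda_k(\hat{V}_{kl}-\theta)}=0.
\end{aligned} \tag{9}
$$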

Lemma 6.3 in the Appendix establishes the existence of a solution to the above equations. Evaluating $p_{kl}$ and $p_i$ at the solution of (9), we have the jackknife empirical log-likelihood ratio

$$-2\log R=-2\sum_{k=1}^{K}\sum_{l=1}^{n_k}\log(n_kp_{kl})-2\sum_{i=1}^{n}\log(np_i). \tag{10}$$
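One possible way to evaluate (10) numerically is to profile out the multipliers: for a fixed $\theta$, each multiplier solves a one-dimensional monotone equation from (9), and the remaining search over $\theta$ is one-dimensional. The Python sketch below follows that idea with SciPy's `brentq` and `minimize_scalar`. It is not the solver used in the paper (which mentions the R package "dfoptim"); the function names, the tolerance constants and the assumption that the profile is well behaved on the feasible interval are choices made for this illustration. The input is a list of one-dimensional arrays holding the pooled and the per-sample pseudo-values.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

def _multiplier(v, theta, eps=1e-10):
    """Solve sum (v - theta) / (1 + lam * (v - theta)) = 0 with all weights positive."""
    z = v - theta
    # Positivity of 1 + lam * z for all z restricts lam to (-1/max(z), -1/min(z)).
    lo = (-1.0 + eps) / z.max()
    hi = (-1.0 + eps) / z.min()
    return brentq(lambda lam: np.sum(z / (1.0 + lam * z)), lo, hi)

def neg2_log_jel_ratio(pseudo_groups):
    """Minimize 2 * sum_g sum_j log(1 + lam_g * (v_gj - theta)) over a common theta."""
    lo = max(v.min() for v in pseudo_groups)
    hi = min(v.max() for v in pseudo_groups)
    if lo >= hi:
        return np.inf  # no common theta lies inside every convex hull
    margin = 1e-4 * (hi - lo)  # stay strictly inside the feasible interval

    def profile(theta):
        total = 0.0
        for v in pseudo_groups:
            lam = _multiplier(v, theta)
            total += 2.0 * np.sum(np.log1p(lam * (v - theta)))
        return total

    result = minimize_scalar(profile, bounds=(lo + margin, hi - margin),
                             method="bounded")
    return result.fun
```

At a stationary point of the profile in $\theta$, the derivative condition coincides with the third equation of (9), which is why a one-dimensional search suffices in this sketch.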

Define and assume

• C1. ;

• C2. and

Note that C1 implies . We have the following Wilks’ theorem.

###### Theorem 2.1

Under $H_0$ and the conditions C1 and C2, we have

$$-2\log R\xrightarrow{d}\chi^2_{K-1},\quad \text{as } n\to\infty.$$

Proof. See the Appendix.

As a special case of the $K$-sample test, the following result holds for $K=2$.

###### Corollary 2.1

For the two-sample problem, under the conditions C1-C2 and $H_0$, we have

$$-2\log R\xrightarrow{d}\chi^2_{1},\quad \text{as } n\to\infty.$$

###### Remark 2.2

Compared with the result of , the limiting distribution of the proposed empirical log-likelihood ratio is a standard chi-squared distribution. The empirical log-likelihood has no need for multiplying a factor to adjust unbalanced sample sizes.

###### Remark 2.3

Our JEL approach considers the energy distance between $\boldsymbol{X}_k$ and $\boldsymbol{X}$, while the JEL method in Wan et al. () utilizes the energy distance between classes. For the $K$-sample problem, they need to deal with a number of constraints much larger than the $K+1$ of ours in (8).

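With Theorem 2.1, we reject $H_0$ if the observed jackknife empirical log-likelihood ratio $-2\log\hat{R}$ is greater than $\chi^2_{K-1}(1-\alpha)$, where $\chi^2_{K-1}(1-\alpha)$ is the $100(1-\alpha)\%$ quantile of the $\chi^2$ distribution with $K-1$ degrees of freedom. The $p$-value of the test can be calculated by

$$p\text{-value}=P_{H_0}\bigl(\chi^2_{K-1}>-2\log\hat{R}\bigr),$$

and the power of the test is

$$\text{power}=P_{H_a}\bigl(-2\log R>\chi^2_{K-1}(1-\alpha)\bigr).$$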
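Given an observed value of $-2\log\hat{R}$, the decision rule and $p$-value above reduce to two chi-square evaluations. A minimal Python sketch using `scipy.stats.chi2` follows; the function names and the default level 0.05 are assumptions of this example.

```python
from scipy.stats import chi2

def jel_p_value(neg2_log_r, n_groups):
    """Upper-tail chi-square probability with K - 1 degrees of freedom."""
    return chi2.sf(neg2_log_r, df=n_groups - 1)

def jel_reject(neg2_log_r, n_groups, alpha=0.05):
    """Reject H0 when the statistic exceeds the (1 - alpha) chi-square quantile."""
    return neg2_log_r > chi2.ppf(1.0 - alpha, df=n_groups - 1)
```

For $K=3$ the reference distribution has 2 degrees of freedom, so the upper-tail probability is $\exp(-x/2)$ and the 0.05-level critical value is $2\log 20\approx 5.99$.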

In the next theorem, we establish the consistency of the proposed test, which states that its power tends to 1 as the sample size goes to infinity.

###### Theorem 2.2

Under the conditions C1 and C2, the proposed JEL test for the $K$-sample problem is consistent for any fixed alternative. That is,

$$P_{H_a}\bigl(-2\log R>\chi^2_{K-1}(1-\alpha)\bigr)\to 1,\quad \text{as } n\to\infty.$$

Proof. See the Appendix.

## 3 Simulation Study

In order to assess the proposed JEL method for the homogeneity testing, we conduct extensive simulation studies in this section. We compare the following methods.

JEL-S:

our proposed JEL method. R package “dfoptim” is used for solving the equation system (9).

JEL-W:

the JEL approach proposed in . It is applied only for $K=2$.

ET:

the DISCO test of . Its null limiting distribution of the test statistic depends on the underlying distribution and hence the test is implemented by the permutation procedure. Function “eqdist.etest” with the default number of replicates in R package “energy” is used .

AD:

the Anderson-Darling test of [1, 32]. The procedure “ad.test” in R package “kSamples” is used.

KW:

the Kruskal-Wallis test of [18, 26] implemented in R package “kSamples”.

HHG:

the HHG test of [13, 14]. The test is performed by a permutation procedure that is implemented in R package “HHG” .

Type I error rates and powers for each method at significance levels and are based on 10,000 replications. The results at significance level are similar to the results at the 0.05 level and hence are not presented. We only consider one case of $K=2$ to demonstrate the similarity of our JEL-S and JEL-W. The remaining cases are for $K=3$ without loss of generality. We generate univariate ($d=1$) and multivariate (, ) random samples from normal, heavy-tailed $t$ and asymmetric exponential distributions. For each distribution, samples of balanced and unbalanced sample sizes are generated.

### 3.1 Normal distributions

We first compare our JEL-S with JEL-W, which is also a JEL approach based on energy statistics but designed for the two-sample problem. We generate two independent samples with either equal () or unequal sample sizes () from the $d$-dimensional normal distributions and , respectively, where is the $d$-dimensional zero vector, is the identity matrix in $d$ dimensions and is a positive number specifying the difference of scales. The results are displayed in Table 1.

As expected, JEL-W and our approach perform similarly because both are JEL approaches based on the energy distance to compare two samples. Advantages of the JEL approach over the others in testing scale differences are the same for $K=3$, which is demonstrated in the following simulation.

Three random samples , and are simulated from normal distributions of , and respectively, where and are positive numbers. The simulation result is shown in Table 2.

In Table 2, the sizes of the tests are given in the rows of and the powers in the other rows. We can see that every method maintains the nominal level well. As expected, KW performs badly for scale differences because KW is a nonparametric one-way ANOVA on ranks and it is inconsistent for the scale-difference problem. Although ET and AD are consistent, they are less powerful than the JEL method and HHG. The JEL method always has the highest power among all the considered tests.
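The size and power figures discussed here are Monte Carlo rejection rates. The Python sketch below shows the general recipe with the Kruskal-Wallis test from `scipy.stats.kruskal` standing in as the test under study. The sample size of 40 per group, the scale values and the 500 replications are placeholders chosen for illustration and are not the settings used in the paper.

```python
import numpy as np
from scipy.stats import kruskal

def rejection_rate(sampler, p_value_fn, n_rep=500, alpha=0.05, seed=0):
    """Fraction of simulated data sets whose p-value falls below alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        samples = sampler(rng)
        if p_value_fn(samples) < alpha:
            rejections += 1
    return rejections / n_rep

def normal_scale_sampler(scales, n_per_group=40):
    """Return a function drawing one centered normal sample per scale value."""
    def draw(rng):
        return [rng.normal(0.0, s, size=n_per_group) for s in scales]
    return draw

def kw_p_value(samples):
    """Kruskal-Wallis p-value for a list of univariate samples."""
    return kruskal(*samples).pvalue

# Equal scales estimate the size; unequal scales estimate the power.
size_kw = rejection_rate(normal_scale_sampler([1.0, 1.0, 1.0]), kw_p_value)
power_kw = rejection_rate(normal_scale_sampler([1.0, 2.0, 3.0]), kw_p_value)
```

Any function mapping a list of samples to a $p$-value can be passed as `p_value_fn`, so the same harness could compare several tests on identical simulated data.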

Next, we consider the location difference case. Three random samples , and are simulated from normal distributions of , and , respectively. Here $\boldsymbol{1}$ is the $d$-vector with all elements being 1. The sizes of the tests are reported in the rows of in Table 3 and the other rows provide the powers of the tests.

The Type I error rates of all tests are close to the nominal level. JEL-S performs the worst, with the lowest power in this case, although it is consistent against any alternative. An intuitive interpretation is that the JEL assigns more weight to the sample points lying between classes and loses power to differentiate classes. The phenomenon of lower power in the location-difference problem is also common for the density approach, as mentioned in . For the location difference problem, we suggest using nonparametric tests based on distribution function approaches. For example, the AD and KW tests are recommended.

Although our JEL-S has low power for testing location differences, it is sensitive to location-scale changes. Three random samples , and are simulated from normal distributions , and , respectively. Here measure the difference of locations and scales. The simulation results are reported in Table 4.

From Table 4, we make the following observations. For , KW is the least powerful. ET and KW perform similarly but worse than HHG and JEL-S. JEL-S has the highest power. For example, JEL-S is about 20%-30% more powerful than the second best method, HHG, in the case of . For and , ET performs the worst and JEL-S is the most competitive method.

### 3.2 Heavy-tailed distribution: t(5)

We compare the performance of JEL-S with the others for heavy-tailed distributions. Three random samples , and are simulated from multivariate $t$ distributions with 5 degrees of freedom with the same locations and different scales and , respectively. The results are reported in Table 5.

Compared with the results of the normal distribution case in Table 2, the power of every method in Table 5 has been affected by heavy-tailed outliers, although the impact in high dimensions is smaller than in one dimension. JEL-S has a slight over-size problem. Its size is 2-3% higher than the nominal level, while its power is uniformly the highest among all methods. For the small difference case with , JEL-S is 10% more powerful than the second best method, HHG.

### 3.3 Non-symmetric distribution: Exponential distribution

Lastly, we consider the performance of JEL-S for asymmetric distributions. We generate random samples , and from multivariate exponential distributions with independent components. The components of each sample are simulated from exp(1), exp and exp, respectively. Type I error rates and powers are presented in Table 6.

From Table 6, we observe that JEL-S suffers slightly from the over-size problem, although the problem becomes less of an issue in higher dimensions. JEL-S performs the best when the differences are small. HHG is inferior to the others. Asymmetric exponential distributions with different scales also have different mean values, and hence KW performs fairly well.

### 3.4 Summary of the simulation study

Some conclusions can be drawn across Tables 1-6. HHG is affected by unbalanced sizes the most among all methods. For example, in Table 4, the power of HHG drops by 13% and 17% for and , respectively, from the equal size to the unequal size case, compared with a 3-5% decrease in the other methods.

For the same total size, the power for balanced samples is higher than for unequal size samples for all tests. All methods share the same pattern of power changes when the dimension changes. For the normal scale difference cases, powers in are lower than those in and . For $t$ and exponential distributions, however, powers increase with $d$.

Overall, JEL-S is competitive with the current approaches for comparing $K$ samples. In particular, JEL-S is very powerful for scale difference problems and is very sensitive in detecting subtle differences among distributions.

## 4 Real data analysis

For illustration, we apply the proposed JEL approach to a multiple two-sample test example. We apply the JEL method to the banknote authentication data, which is available in the UCI Machine Learning Repository (). The data set consists of 1372 samples, with 762 of them from the Genuine class, denoted as Gdata, and 610 from the Forgery class, denoted as Fdata. Four features are recorded for each sample: variance of wavelet transformed image (VW), skewness of wavelet transformed image (SW), kurtosis of wavelet transformed image (KW) and entropy of image (EI). One can refer to Lohweg () and Sang, Dang and Zhao () for more descriptions and information about the data.

The densities of each of the variables for each class are drawn in Figure 1. We observe that the distributions of each variable in different classes are quite different, especially for variables VW and SW. The locations of VW in the two classes are clearly different. The distribution of SW shows some multimodal trends in both classes. The distribution of KW in the Forgery class is more right-skewed than it is in the Genuine class. EI has similar left-skewed distributions in the two classes. Here we shall compare the multivariate distributions of the two classes and also conduct univariate two-class tests on each of the four variables.

From Table 7, all tests reject the equality of the multivariate distributions of Gdata and Fdata with $p$-values close to 0. Also, the $p$-values for testing separately the individual distributions of VW, SW and KW are small for all methods, and thus we conclude that the underlying distributions of those variables are quite different in the two classes. For the EI variable, however, we do not have significant evidence to reject the equality of the underlying distributions. This result agrees well with the impression from the last graph (d) in Figure 1. In these tests, the $p$-values calculated from the JEL approaches are much higher than those calculated from ET, AD, KW and HHG. As expected, our method performs very similarly to the JEL-W approach for the two-sample problem.

## 5 Conclusion

In this paper, we have extended the JEL method to the $K$-sample test via the categorical Gini correlation. Standard limiting chi-square distributions with $K-1$ degrees of freedom are established and are used to conduct hypothesis tests without a permutation procedure. Numerical studies confirm the advantages of the proposed method under a variety of situations. One of the important contributions of this paper is the development of a powerful nonparametric method for the multivariate $K$-sample problem.

Although the proposed $K$-sample JEL test is much more sensitive to shape differences among distributions, it is less sensitive to variation in location when the differences are subtle. This disadvantage probably stems from finding the solution of $\theta$ in equations (9). That is, the within Gini distances and the overall Gini distance are restricted to be the same. This forces the JEL approach to put more weight on the observations that are closer to other distributions. As a result, the JEL approach loses some power to detect differences among the locations. This is a common problem for tests based on density functions. For the location difference problem, distribution function approaches such as AD and KW are preferred.

Furthermore, the proposed JEL approach is developed based on Euclidean distance, and hence is only invariant under translation and homogeneous changes. Dang () suggested an affine Gini correlation, and we will continue this work by proposing an affine JEL test.

## 6 Appendix

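Define $\boldsymbol{\lambda}=(\lambda,\lambda_1,\dots,\lambda_K)$,

$$
\begin{aligned}
W_{0n}(\theta,\boldsymbol{\lambda})&=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{V}_i-\theta}{1+\lambda(\hat{V}_i-\theta)},\\
W_{kn}(\theta,\boldsymbol{\lambda})&=\frac{1}{n}\sum_{l=1}^{n_k}\frac{\hat{V}_{kl}-\theta}{1+\lambda_k(\hat{V}_{kl}-\theta)},\quad k=1,\dots,K,\\
W_{(K+1)n}(\theta,\boldsymbol{\lambda})&=\frac{1}{n}\sum_{k=1}^{K}\lambda_k\sum_{l=1}^{n_k}\frac{-1}{1+\lambda_k(\hat{V}_{kl}-\theta)}+\frac{\lambda}{n}\sum_{i=1}^{n}\frac{-1}{1+\lambda(\hat{V}_i-\theta)}.
\end{aligned}
$$

###### Lemma 6.1 (Hoeffding, 1948)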

Under condition C1,

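$$\frac{\sqrt{n_k}\,(U_{n_k}-\theta_0)}{2\sigma_{gk}}\xrightarrow{d}N(0,1)\quad \text{as } n_k\to\infty,$$

$$\frac{\sqrt{n}\,(U_n-\theta_0)}{2\sigma_g}\xrightarrow{d}N(0,1)\quad \text{as } n\to\infty.$$

###### Lemma 6.2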

Let and Under the conditions of Lemma 6.1,

• (i) , as

• (ii) , as .

###### Lemma 6.3 (Liu, Liu and Zhou, 2018)

Under conditions C1 and C2 and , with probability tending to one as , there exists a root of

$$W_{kn}(\theta,\boldsymbol{\lambda})=0,\quad k=0,1,\dots,K+1,$$

such that