Testing for homogeneity, that is, testing for the equality of several probability distributions, is an old and important problem in statistics. When the number of distributions is greater than two, it is called the $k$-sample problem and has been tackled in the literature through different approaches. For instance, the traditional Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling tests, initially introduced for the case of two distributions only, have been extended to the $k$-sample problem. Procedures based on the likelihood ratio, which lead to more powerful tests than the previous ones, were also introduced by Zhang and Wu. Nevertheless, all these methods only allow testing the equality of distributions defined on , where is the Borel $\sigma$-field associated to , and cannot be used for distributions defined on more complex spaces. The interest of kernel-based methods, that is, methods based on reproducing kernel Hilbert space embeddings, lies in the fact that they can handle high-dimensional and structured data, which the aforementioned traditional methods cannot. In this vein, Harchaoui et al. and, more recently, Gretton et al. proposed kernel-based methods for the two-sample problem. The former introduced a method based on the maximum Fisher discriminant ratio, while the latter used the maximum mean discrepancy. Extending their procedures to the case of more than two distributions is of great interest since, to the best of our knowledge, it has never been done.
In this paper, we deal with the $k$-sample problem by extending the kernel-based approach of Harchaoui et al. The rest of the paper is organized as follows. In Section 2, we recall some basic facts about reproducing kernel Hilbert space embeddings. In Section 3, after specifying the testing problem that we deal with, we introduce a test statistic and derive its asymptotic distribution under the null hypothesis. We also address computational aspects and show how to compute this test statistic in practice. Section 4 is devoted to simulations carried out to evaluate the performance of our proposal and to compare it with known methods. All the proofs are deferred to Section 5.
2 Preliminary notions
In this section, we recall the notion of reproducing kernel Hilbert space (RKHS) and define only the related elements that are useful in this paper. For more details on RKHS and its use in probability and statistics, one may refer to .
Letting be a measurable space, where is a metric space and is the corresponding Borel -field, we consider a Hilbert space of functions from to , endowed with an inner product . This space is said to be a RKHS if there exists a kernel, that is a symmetric positive semi-definite function , such that for any and any , one has and . When is a RKHS with kernel , the map characterizes since one has
for any . It is called the feature map and it is an important tool when dealing with kernel methods for statistical problems. Throughout this paper, we consider a RKHS with kernel satisfying the following assumptions:
the RKHS associated to the kernel is dense in where is a probability measure on .
Let be a random variable taking values in and with probability distribution . If , the mean element associated with is defined for all functions as the unique element in satisfying,
Furthermore, if , we can define the covariance operator associated to as the unique operator from to itself such that, for any pair , one has
It is very important to note that if is satisfied, then the mean element and the covariance operator are well-defined. They can also be expressed as
where is the tensor product such that, for any pair , is the linear map from to itself satisfying for all . The empirical counterparts of and , obtained from an i.i.d. sample of , are then given by:
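Although the empirical mean element and the empirical covariance operator live in the RKHS, every inner product involving them reduces to arithmetic on Gram matrices. The following Python sketch illustrates this under stated assumptions: the Gaussian kernel is parametrized as exp(-||x - y||^2 / (2 sigma^2)), and the function names, the bandwidth and the simulated data are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np

def gaussian_gram(x, y, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2)).

    x and y are arrays of shape (n, d) and (m, d).
    The bandwidth parametrization is an assumption of this sketch.
    """
    sq = (x**2).sum(axis=1)[:, None] + (y**2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mean_element_inner(x, y, sigma=1.0):
    """Inner product <m_x, m_y> of two empirical mean elements,
    where m_x = (1/n) sum_i K(x_i, .): the average of the cross Gram matrix."""
    return gaussian_gram(x, y, sigma).mean()

def centered_gram(x, sigma=1.0):
    """Gram matrix of the centered features K(x_i, .) - m_x, that is H K H
    with H = I - (1/n) 1 1^T. The empirical covariance operator acts
    through this matrix in kernel computations."""
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ gaussian_gram(x, x, sigma) @ h

# Illustrative data (hypothetical): two samples in R^2 with different means.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = rng.normal(loc=1.0, size=(60, 2))

# Squared RKHS distance between the two empirical mean elements.
dist2 = (mean_element_inner(x, x) + mean_element_inner(y, y)
         - 2.0 * mean_element_inner(x, y))
```

Working with `centered_gram` rather than with explicit features is what makes such quantities computable when the RKHS is infinite-dimensional, as is the case for the Gaussian kernel.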
3 The $k$-sample problem
In this section, we specify the $k$-sample problem that we deal with by stating the hypotheses to be tested. Then, a test statistic is proposed and its asymptotic distribution under the null hypothesis is derived. Finally, we deal with computational aspects and show how the introduced test statistic can be computed in practice.
For such that , we consider probability distributions on . For , we denote by and by the mean element and the covariance operator, respectively, associated to . The $k$-sample problem that we deal with is the test for the hypothesis : against the alternative given by : , .
3.1 Test statistic
For , let be an i.i.d. sample in with common distribution . We consider the statistics
from which we define
where . Let be a sequence of strictly positive numbers such that . Then, we consider
where denotes the identity operator of , and we take as test statistic for the $k$-sample problem the statistic:
3.2 Asymptotic distribution under the null hypothesis
We consider the following assumptions:
For , one has , where is a real belonging to .
the eigenvalues satisfy for ;
there are infinitely many strictly positive eigenvalues of for .
Then, we have:
Assume () to () and that , then under , converges in distribution, as , to .
3.3 Computation of the test statistic
For computing this test statistic in practice, the kernel trick can be used, as was already done in  for the two-group case. For , we consider the operator from to represented in matrix form as
Then put , and consider
and the Gram matrix
and the vector
where . Clearly,
and, as in , . Therefore
and . Using the matrix inversion lemma, as in , we obtain
where . Hence
and using the property , we finally obtain
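The role of the matrix inversion lemma in this kind of computation can be checked numerically. The sketch below is not the test statistic of this paper: it considers a single pooled sample with explicit finite-dimensional features (a linear kernel, chosen as an assumption so that both sides can be evaluated), and verifies that a quadratic form involving the inverse of a regularized empirical covariance operator can be obtained from the Gram matrix alone. All names, the weight vector `c` and the value of `gamma` are hypothetical.

```python
import numpy as np

def quad_form_feature_space(phi, c, gamma):
    """Direct evaluation of delta^T (Sigma + gamma I)^{-1} delta in feature space.

    phi   : (p, n) matrix whose columns are the feature vectors,
    c     : (n,) weights defining delta = phi @ c,
    Sigma : empirical covariance (1/n) A A^T with A = phi @ H (centered features).
    """
    p, n = phi.shape
    h = np.eye(n) - np.ones((n, n)) / n
    a = phi @ h
    sigma = a @ a.T / n
    delta = phi @ c
    return float(delta @ np.linalg.solve(sigma + gamma * np.eye(p), delta))

def quad_form_gram(k, c, gamma):
    """Same quantity computed from the Gram matrix K = phi^T phi only.

    By the matrix inversion lemma,
    (gamma I + A A^T / n)^{-1} = (1/gamma) [I - A (n gamma I + A^T A)^{-1} A^T],
    with A^T A = H K H and A^T delta = H K c.
    """
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    kc = k @ c
    hkc = h @ kc
    inner = np.linalg.solve(n * gamma * np.eye(n) + h @ k @ h, hkc)
    return float((c @ kc - hkc @ inner) / gamma)

# Illustrative setting (hypothetical): 20 points with 3-dimensional features,
# delta = difference between the mean features of two groups of sizes 8 and 12.
rng = np.random.default_rng(1)
phi = rng.normal(size=(3, 20))
c = np.concatenate([np.full(8, 1 / 8), np.full(12, -1 / 12)])
gamma = 0.1

q_feature = quad_form_feature_space(phi, c, gamma)
q_gram = quad_form_gram(phi.T @ phi, c, gamma)
```

The two values agree up to rounding error. Only `quad_form_gram` remains usable when the feature space is infinite-dimensional, since it requires solving an n x n linear system rather than inverting an operator on the RKHS.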
4 Power comparison by Monte Carlo simulation
In this section, the empirical power of the proposed test is computed through Monte Carlo simulations and compared to that of the tests introduced by Zhang and Wu, which are based on statistics denoted by , and obtained from the likelihood-ratio test statistic, and which were shown to be more powerful than the classical Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling $k$-sample tests. We estimate the powers of our test and the three aforementioned tests in the following cases ():
For all tests we take the significance level and the empirical power is computed over independent replications. For our test, we used the Gaussian kernel , and computed the test statistic as indicated in Section 3.3 by taking
The results are given in Figures 1 to 4 that plot the empirical power versus the total sample size . They show that our test outperforms the three tests of Zhang and Wu  in all cases.
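The Monte Carlo procedure used to estimate empirical power can be sketched as follows. This is a minimal illustration under loud assumptions: the test inside the loop is a stand-in permutation test built on the RKHS distances between group mean elements and the pooled mean element, not the test statistic of Section 3, and the distributions, group sizes, significance level and numbers of replications and permutations are hypothetical values chosen to keep the example fast.

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gaussian Gram matrix of a sample x of shape (n, d) (assumed parametrization)."""
    sq = (x**2).sum(axis=1)[:, None] + (x**2).sum(axis=1)[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def between_group_stat(k, labels, n_groups):
    """sum_j n_j ||m_j||^2, which equals sum_j n_j ||m_j - m||^2 up to a term
    that does not depend on how labels are assigned (m = pooled mean element)."""
    total = 0.0
    for j in range(n_groups):
        idx = np.flatnonzero(labels == j)
        total += k[np.ix_(idx, idx)].sum() / len(idx)
    return total

def permutation_pvalue(x, labels, n_groups, n_perm, rng):
    """Stand-in k-sample test: permutation p-value of the statistic above."""
    k = gaussian_gram(x)
    observed = between_group_stat(k, labels, n_groups)
    count = 0
    for _ in range(n_perm):
        count += between_group_stat(k, rng.permutation(labels), n_groups) >= observed
    return (1 + count) / (1 + n_perm)

def empirical_power(sampler, sizes, alpha, n_rep, n_perm, rng):
    """Proportion of rejections at level alpha over n_rep independent replications."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    rejections = 0
    for _ in range(n_rep):
        x = sampler(sizes, rng)
        rejections += permutation_pvalue(x, labels, len(sizes), n_perm, rng) <= alpha
    return rejections / n_rep

def make_sampler(shift):
    """Group j is drawn from N(shift * j, 1); shift = 0 gives the null hypothesis."""
    def sampler(sizes, rng):
        return np.concatenate(
            [rng.normal(loc=shift * j, size=(n, 1)) for j, n in enumerate(sizes)]
        )
    return sampler

rng = np.random.default_rng(2)
sizes = (20, 20, 20)
power_null = empirical_power(make_sampler(0.0), sizes, 0.05, 100, 99, rng)
power_alt = empirical_power(make_sampler(1.0), sizes, 0.05, 100, 99, rng)
```

Under the null hypothesis the rejection rate should stay close to the nominal level, while under the alternative it estimates the power. Replacing `permutation_pvalue` by any other $k$-sample test that returns a p-value leaves the estimation loop unchanged, which is how several tests can be compared on the same simulated designs.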
5.1 Preliminary results
In this section, we give some results that are necessary for proving Theorem 3.1.
Assume (), () and () . Then, putting , we have .
Proof. Let be an orthonormal basis of consisting of eigenvectors of such that is associated to the -th eigenvalue . Using Lemma 21 in  and the equality , we obtain
which proves that .
The following lemma gives an asymptotic approximation of the test statistic.
Assume (), () and (). If , then
Using the central limit theorem, we have for all , and since it follows from (7) that . The fact that allows us to deduce from (6) that Moreover,
Next, using the upper bound and Lemma 5.1,