1 Introduction
Mean vector testing in high dimensions has received considerable attention in recent years, owing to the increasing availability of data sets in which the number of variables exceeds the sample size. The traditional Hotelling’s test and more recently developed test statistics assume that the observations are independently and identically distributed. Comparing mean vectors when the samples are dependent is a relatively understudied problem. The one-sample mean vector test under dependence is defined as follows. The observations are assumed to be -dimensional random vectors with mean and a covariance matrix that may not be diagonal. The problem of interest is then to test
(1) |
In the two-sample case, and are two independent groups of -dimensional observations with mean vectors , and covariance matrices , respectively. The hypothesis of interest is that the two population means are equal, viz.
(2) |
In Ayyala et al. (2017), we developed a hypothesis test for (1) when the samples follow an -dependent stationary Gaussian process with mean and autocovariance structure given by . The rate of increase with respect to was assumed to be linear for () and sublinear for (). These assumptions ensure that the number of variables does not increase faster than the sample size and that there are sufficiently many observations for estimating the autocovariance at all lags. The dependence structure is made sparse by assuming
for any where and the matrix is the covariance of multiplied by the sample size, i.e., . Summarizing the model, the assumptions are
(3) |
The test statistic is based on the Euclidean norm of the sample average, . Define where
is an unbiased estimator for . Under , this quantity has expected value . Using this property, the test statistic is constructed as
(4) |
where is a ratio-consistent estimator for , i.e., . For expressions of and , refer to Ayyala et al. (2017). Under the null hypothesis, the test statistic is shown to be asymptotically normal. When we define
if , the power function at significance level under a local alternative is derived as
(5) |
as where is the cumulative distribution function of the standard normal distribution. The mean in the local alternative satisfies the condition for all .
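The role of the centering term can be illustrated with a standard identity: for a mean-zero process, the expected squared Euclidean norm of the sample average equals the trace of the covariance of the sample average, which is nonzero even under the null. The following is a minimal sketch, assuming a toy vector MA(1) process (a simple 1-dependent stationary Gaussian process) with identity innovation covariance. The process, the function names, and the parameter values are illustrative assumptions and are not taken from Ayyala et al. (2017).

```python
import numpy as np

def simulate_ma1(n, p, theta, rng):
    """Draw X_1..X_n from X_t = Z_t + theta * Z_{t-1}, Z_t iid N(0, I_p).

    Assumed toy model: a 1-dependent stationary Gaussian process with mean zero.
    """
    z = rng.standard_normal((n + 1, p))
    return z[1:] + theta * z[:-1]

def expected_sq_norm_of_mean(n, p, theta):
    """Exact E||X_bar||^2 = tr Cov(X_bar) for the toy MA(1) process above."""
    gamma0 = 1.0 + theta ** 2   # per-coordinate variance
    gamma1 = theta              # per-coordinate lag-1 autocovariance
    return p * (n * gamma0 + 2 * (n - 1) * gamma1) / n ** 2

rng = np.random.default_rng(0)
n, p, theta, reps = 20, 50, 0.5, 5000

# Monte Carlo estimate of E||X_bar||^2 under the null (mean zero).
sq_norms = np.empty(reps)
for r in range(reps):
    xbar = simulate_ma1(n, p, theta, rng).mean(axis=0)
    sq_norms[r] = xbar @ xbar

mc_mean = sq_norms.mean()
exact = expected_sq_norm_of_mean(n, p, theta)
print(mc_mean, exact)
```

In this toy setting the lag-1 autocovariance contributes to the expectation, which is why a statistic built on the squared norm has to be centered using autocovariance estimates at every lag up to the dependence range rather than the lag-0 covariance alone.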
In the proof of asymptotic normality of , we used a two-dimensional array argument. The main idea was to divide the array into blocks that were at least indices apart. The proof was established by showing that (a) these blocks dominate the remainder of the terms and (b) the blocks are independent, so that a central limit theorem applies once the Lyapunov condition is verified. However, upon further inspection, we see that the blocks are uncorrelated but not independent. We have also identified convergence issues for the asymptotic power under the rate of increase of assumed in (3). The main result and numerical studies of Ayyala et al. (2017) remain valid and justify the use of the test statistic. However, some of the theoretical details of the paper need to be revised.
In this article, we address these issues and provide corrected proofs for the results in Ayyala et al. (2017), as well as further discussion of the power of the test. The remainder of the paper is organized as follows. In Section 2, a corrected proof of asymptotic normality for the one-sample test statistic is provided. Using the corrected proof, the result for the two-sample case is verified in Section 3. The asymptotic power of the test statistic for the one-sample case is presented in Section 4. The notation used in the remainder of the article is consistent with Ayyala et al. (2017). For definitions and additional details on the variables defined, refer to Ayyala et al. (2017).
2 Proof of Theorem 3.1 of Ayyala et al. (2017)
In the proof of Theorem 3.1 of Ayyala et al. (2017), the authors claim that the variables defined for all (the formal definition of is given in Section 2.2) are mutually independent since the process is Gaussian with -dependent stationarity. Using the independence of the ’s, the authors verified the Lyapunov condition, a sufficient condition for the central limit theorem to hold for the leading term of the proposed statistic, and thus showed that the test statistic is asymptotically normal. Upon further inspection, we find that the ’s are not mutually independent, and hence a central limit theorem for dependent variables is needed to show the asymptotic normality of the proposed statistic.
Here, we present a new proof for the asymptotic normality of the leading term . Following the proof of Theorem 1 of Chen and Qin (2010), we use the martingale CLT (Hall and Heyde, 1980, Corollary 3.1). Before the proof, we restate some assumptions and notation of Ayyala et al. (2017).
2.1 Assumptions and notations
The authors assume the following:
- follow an -dependent strictly stationary Gaussian process with mean and autocovariance structure given by . That is, for
- and .
- .
- where is the spectral matrix evaluated at the zero frequency.
- where the rate of decay is uniform for all .
The naive sample estimator of the autocovariance matrix at lag is denoted by
with . To construct an unbiased estimator of , let
Then, the expected value of is where is a coefficient matrix. Since where with and for ,
is an unbiased estimator of where . Therefore, we have
(6) |
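The need for a bias correction can be seen numerically. The sketch below assumes the usual mean-centered form of the naive lag-h sample autocovariance with divisor n (the exact expression used in the paper is not reproduced here) and the same illustrative MA(1) toy process with identity innovation covariance. For lag 0, the expectation of this naive estimator falls short of the true variance by exactly the variance of the sample mean, which itself involves autocovariances at other lags.

```python
import numpy as np

def simulate_ma1(n, p, theta, rng):
    """Toy 1-dependent Gaussian process: X_t = Z_t + theta * Z_{t-1}, Z_t iid N(0, I_p)."""
    z = rng.standard_normal((n + 1, p))
    return z[1:] + theta * z[:-1]

def naive_autocov(x, h):
    """Naive lag-h sample autocovariance (assumed form):
    (1/n) * sum_t (x_{t+h} - xbar)(x_t - xbar)^T.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    return xc[h:].T @ xc[: n - h] / n

rng = np.random.default_rng(0)
n, p, theta, reps = 10, 5, 0.5, 20000

# Average the per-coordinate lag-0 estimate over many replications.
avg_var = 0.0
for _ in range(reps):
    x = simulate_ma1(n, p, theta, rng)
    avg_var += np.trace(naive_autocov(x, 0)) / p
avg_var /= reps

gamma0 = 1 + theta ** 2                                      # true per-coordinate variance
var_of_mean = (n * gamma0 + 2 * (n - 1) * theta) / n ** 2    # Var of each coordinate of xbar
print(avg_var, gamma0, gamma0 - var_of_mean)
```

Because the bias of each naive estimator is a linear combination of the autocovariances at several lags, removing it amounts to solving a small linear system, which is consistent with the coefficient-matrix description above.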
According to Ayyala et al. (2017), the limiting expression of the variance of is
(7) |
2.2 Theorem 3.1 of Ayyala et al. (2017)
The main theorem of Ayyala et al. (2017) is stated as follows:
Theorem 2.1.
To show the asymptotic normality of , Ayyala et al. (2017) decompose as
(9) |
where
(10) |
and
(11) |
and show that and . To establish the asymptotic normality of , Ayyala et al. (2017) define the following quantities: an matrix with th element
where ; constants and such that , and represent where ; for
, they define the random variables
With these, Ayyala et al. (2017) further decompose as
(12) |
and show that and . While convergence of can be established easily, the asymptotic normality of is not straightforward.
2.3 Revised proof
In Ayyala et al. (2017), the authors assumed that the blocks ’s are independently distributed with unequal covariances. However, if , the blocks are not independent even though they have zero covariance. The dependence structure of the blocks should therefore be taken into account when establishing the limiting distribution. We provide a new proof of the asymptotic normality of that accounts for the dependence between the blocks.
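The distinction between zero covariance and independence for products of Gaussian variables can be checked on a toy example. The sketch below is not the block construction of the paper: it assumes three independent standard normal variables and forms two products that share one factor. The products are uncorrelated, yet their squares have covariance 2, so they cannot be independent.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Three independent standard normal samples (illustrative, not the paper's blocks).
z1, z2, z3 = rng.standard_normal((3, n))

a = z1 * z2   # first product
b = z2 * z3   # second product, sharing the factor z2 with the first

# Cov(a, b) = E[z1 * z2^2 * z3] = 0, so a and b are uncorrelated.
cov_ab = np.mean(a * b) - a.mean() * b.mean()

# Cov(a^2, b^2) = E[z1^2] E[z2^4] E[z3^2] - 1 = 3 - 1 = 2, so a and b are dependent.
cov_sq = np.mean(a ** 2 * b ** 2) - np.mean(a ** 2) * np.mean(b ** 2)

print(cov_ab, cov_sq)
```

A Lyapunov-type central limit theorem for independent summands does not cover this situation, which motivates an argument that handles dependence explicitly.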
Proposition 2.2.
Proof.
For , let
Then, are independent and identically distributed Gaussian variables with mean vector 0 and covariance matrix where . We note that for ,
Following the proof of Theorem 1 of Chen and Qin (2010), define
(14) |
and let for be the -algebra generated by . Then we have
To show the asymptotic normality of , we need some lemmas.
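For reference, the martingale central limit theorem of Hall and Heyde (1980, Corollary 3.1) can be stated in generic notation roughly as follows. The symbols below are placeholders chosen for this statement and are not the notation of the present article. The three lemmas that follow correspond to the martingale property, the conditional variance condition, and the conditional Lindeberg condition, respectively.

```latex
Let $\{S_{n,i}, \mathcal{F}_{n,i},\ 1 \le i \le k_n,\ n \ge 1\}$ be a zero-mean,
square-integrable martingale array with differences $X_{n,i}$ and nested
$\sigma$-fields $\mathcal{F}_{n,i} \subseteq \mathcal{F}_{n+1,i}$, and let
$\eta^2$ be an almost surely finite random variable. Suppose that, for every
$\varepsilon > 0$,
\begin{align*}
  \sum_{i=1}^{k_n} \mathrm{E}\!\left[ X_{n,i}^2 \,
      \mathbf{1}\{ |X_{n,i}| > \varepsilon \} \,\middle|\, \mathcal{F}_{n,i-1} \right]
      &\xrightarrow{\;p\;} 0, \\
  \sum_{i=1}^{k_n} \mathrm{E}\!\left[ X_{n,i}^2 \,\middle|\, \mathcal{F}_{n,i-1} \right]
      &\xrightarrow{\;p\;} \eta^2 .
\end{align*}
Then $S_{n,k_n} \xrightarrow{\;d\;} Z$, where $Z$ has characteristic function
$\mathrm{E}\exp\!\left(-\tfrac{1}{2}\eta^2 t^2\right)$. In particular, if
$\eta^2 = \sigma^2$ is a constant, then $Z \sim N(0, \sigma^2)$.
```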
Lemma 2.3 (square integrable martingale with zero mean).
For each , is a zero-mean, square-integrable martingale sequence.
Proof of lemma 2.3.
Since ’s are i.i.d. Gaussian random variables with zero mean, has zero mean and is square integrable for any . We observe
Therefore, is a zero-mean, square-integrable martingale sequence. ∎
Lemma 2.4 (analogous condition on the conditional variance).
Let . Then,
Proof of lemma 2.4.
We note that
Define
, then its first moment is
The second moment of is
and thus the variance of is
By assumption (A5), it holds that . Thus, we have
This completes the proof. ∎
Lemma 2.5 (conditional Lindeberg condition).
For any ,
Proof of lemma 2.5.
We note that
It suffices to show
We have that
This completes the proof. ∎
We have thus established the sufficient conditions for the martingale central limit theorem. By Corollary 3.1 of Hall and Heyde (1980) together with Lemmas 2.3, 2.4 and 2.5,
We define
(15) |
if for and . Note that since and
(16) |
(17) |
where in (16) is due to from Lemma C.1 in the appendix of Ayyala et al. (2017) and the equality in (17) is obtained by taking . Also, by the classical central limit theorem, it holds that . Since
we have
Therefore, we get the desired result
∎
3 Two-Sample Test
By construction of the blocks, we have from with satisfying for where is defined in (15). As in the one-sample problem, if we define
where , then the two-sample test statistic is
where
(18) |
and for is defined as in the one-sample case.
We change the condition (14) in Ayyala et al. (2017) to the following:
(19) |
As in the one-sample case, we can show