Note on Mean Vector Testing for High-Dimensional Dependent Observations

04/19/2019 · by Seonghun Cho, et al. · Augusta University

For the mean vector test in high dimensions, Ayyala et al. (2017, Journal of Multivariate Analysis, 153:136–155) proposed new test statistics when the observation vectors are $M$-dependent. Under certain conditions, the test statistics for the one-sample and two-sample cases were shown to be asymptotically normal. While the test statistics and the asymptotic results are valid, some parts of the proof of asymptotic normality need to be corrected. In this work, we provide corrections to the proofs of their main theorems. We also note a few minor discrepancies in calculations in the publication.


1 Introduction

Mean vector testing in high dimensions has gained considerable attention in the recent past with the increasing availability of data sets where the number of variables is greater than the sample size. The traditional Hotelling's $T^2$ test and the more recently developed test statistics assume that the observations are independently and identically distributed. Comparison of mean vectors for distributions where the samples are dependent is a relatively understudied problem. The one-sample mean vector test under dependence is defined as follows. The observations $X_1, \ldots, X_n$ are assumed to be $p$-dimensional random vectors with mean $\mu$ and a covariance matrix that may not be diagonal. The problem of interest is then to test

\[ H_0 : \mu = 0 \quad \text{versus} \quad H_A : \mu \neq 0. \tag{1} \]

In the two-sample case, $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ are two independent groups of $p$-dimensional observations with mean vectors $\mu_1$ and $\mu_2$ and covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. The hypothesis of interest is that the two population means are equal, viz.

\[ H_0 : \mu_1 = \mu_2 \quad \text{versus} \quad H_A : \mu_1 \neq \mu_2. \tag{2} \]

In Ayyala et al. (2017), we developed a hypothesis test for (1) when the samples follow an $M$-dependent stationary Gaussian process with mean $\mu$ and autocovariance structure given by $\Gamma(0), \Gamma(1), \ldots, \Gamma(M)$. The rate of increase with respect to the sample size $n$ was assumed to be linear for the dimension $p$ and sublinear for the dependence order $M$. These assumptions ensure that the number of variables does not increase faster than the sample size and that there are sufficiently many observations for estimating the autocovariance at all lags. The dependence structure is made sparse by assuming a trace condition on products of the autocovariance matrices, where the matrix $\Omega$ is the covariance of the sample average $\bar{X} = n^{-1}\sum_{t=1}^{n} X_t$ multiplied by the sample size, i.e., $\Omega = n \operatorname{Var}(\bar{X})$. Summarizing the model, the assumptions are

(3)
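To make the model concrete, the following sketch (our illustration, not code from Ayyala et al., 2017) generates an $M$-dependent strictly stationary Gaussian sequence as a vector moving average of order $M$; the function name and the coefficient choice are hypothetical.

```python
import numpy as np

# A vector MA(M) process X_t = mu + sum_{k=0}^{M} A_k eps_{t-k} with i.i.d.
# Gaussian innovations is M-dependent, strictly stationary, and Gaussian:
# Cov(X_t, X_{t+h}) = sum_k A_{k+h} A_k^T vanishes for |h| > M.

def simulate_m_dependent(n, p, M, mu=None, seed=None):
    rng = np.random.default_rng(seed)
    mu = np.zeros(p) if mu is None else mu
    # Illustrative MA coefficient matrices; any fixed choice works here.
    A = [rng.standard_normal((p, p)) / (p * (k + 1)) for k in range(M + 1)]
    eps = rng.standard_normal((n + M, p))          # i.i.d. innovations
    X = np.empty((n, p))
    for t in range(n):
        X[t] = mu + sum(A[k] @ eps[t + M - k] for k in range(M + 1))
    return X

X = simulate_m_dependent(n=200, p=50, M=2)
```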

The test statistic is based on the squared Euclidean norm of the sample average, $\bar{X}^\top \bar{X}$. Define $M_n = \bar{X}^\top \bar{X} - n^{-1}\operatorname{tr}(\widehat{\Omega})$, where $\operatorname{tr}(\widehat{\Omega})$ is an unbiased estimator for $\operatorname{tr}(\Omega)$, so that $M_n$ is an unbiased estimator for $\mu^\top \mu$. Under $H_0$, this quantity has expected value $0$. Using this property, the test statistic is constructed as

\[ T_n = \frac{M_n}{\widehat{\sigma}}, \tag{4} \]

where $\widehat{\sigma}^2$ is a ratio-consistent estimator for $\sigma^2 = \operatorname{Var}(M_n)$, i.e., $\widehat{\sigma}^2/\sigma^2 \xrightarrow{p} 1$. For the expressions of $\operatorname{tr}(\widehat{\Omega})$ and $\widehat{\sigma}^2$, refer to Ayyala et al. (2017).
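For intuition, here is a minimal sketch of the centered statistic using naive (uncorrected) autocovariance estimates; the exact unbiased estimator of $\operatorname{tr}(\Omega)$ and the variance estimator $\widehat{\sigma}^2$ are given in Ayyala et al. (2017), and the helper names below are ours.

```python
import numpy as np

def gamma_hat(X, h):
    """Naive lag-h autocovariance estimate: (1/n) sum_t (X_t - Xbar)(X_{t+h} - Xbar)^T."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return Xc[: n - h].T @ Xc[h:] / n

def naive_statistic(X, M):
    """Centered squared norm of the sample mean, a rough analogue of M_n."""
    n = X.shape[0]
    Xbar = X.mean(axis=0)
    # Plug-in estimate of Omega = n * Var(Xbar) for an M-dependent sequence.
    Omega_hat = gamma_hat(X, 0) + sum(gamma_hat(X, h) + gamma_hat(X, h).T
                                      for h in range(1, M + 1))
    # E[Xbar' Xbar] = mu' mu + tr(Omega)/n, so subtract the trace term.
    return Xbar @ Xbar - np.trace(Omega_hat) / n
```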

Under the null hypothesis, the test statistic is shown to be asymptotically normal. Writing $z_\alpha = \Phi^{-1}(1-\alpha)$ for the upper $\alpha$ critical value, the power function at significance level $\alpha$ under a local alternative is derived as

\[ \beta(\mu) \to \Phi\!\left( -z_\alpha + \frac{\mu^\top \mu}{\sigma} \right) \tag{5} \]

as $n, p \to \infty$, where $\Phi$ is the cumulative distribution function of the standard normal distribution. The mean $\mu$ in the local alternative satisfies a negligibility condition for all lags, which keeps the contribution of $\mu$ to the variance asymptotically negligible.
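As a quick illustration of the shape of the power function in (5) (our snippet; `signal` stands in for the standardized signal $\mu^\top\mu/\sigma$):

```python
import numpy as np
from scipy.stats import norm

# Power curve of the form Phi(-z_alpha + signal), as in (5).
alpha = 0.05
z_alpha = norm.ppf(1 - alpha)
signal = np.linspace(0.0, 4.0, 9)   # stands in for mu'mu / sigma
power = norm.cdf(-z_alpha + signal)
print(np.round(power, 3))           # increases from alpha toward 1
```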

In the proof of asymptotic normality of $T_n$, we used a two-dimensional array argument. The main idea was to divide the array into blocks that were at least $M$ indices apart. The proof was established by showing that (a) these blocks dominate the remaining terms and (b) the blocks are independent, so that the central limit theorem can be applied by verifying the Lyapunov condition. However, upon further inspection, we see that the blocks are not independent, albeit uncorrelated. We have also identified convergence issues for the asymptotic power under the rate of increase of $M$ assumed in (3). The main result and numerical studies of Ayyala et al. (2017) remain valid and justify the use of the test statistic. However, some of the theoretical details of the paper need to be revised.

In this article, we address these issues and provide corrected proofs for the results in Ayyala et al. (2017), as well as further discussion of the power of the test. The remainder of the paper is organized as follows. In Section 2, a corrected proof of asymptotic normality for the one-sample test statistic is provided. Using the corrected proof, the result for the two-sample case is verified in Section 3. The asymptotic power of the test statistic for the one-sample case is presented in Section 4. The notation used in the remainder of the article is consistent with Ayyala et al. (2017); for definitions and additional details on the variables used, refer to Ayyala et al. (2017).

2 Proof of Theorem 3.1 of Ayyala et al. (2017)

In the proof of Theorem 3.1 of Ayyala et al. (2017), the authors claim that the block variables defined in the decomposition of the leading term (their formal definition is given in Section 2.2) are mutually independent, since the process is Gaussian and $M$-dependent stationary. Using this claimed independence, the authors established the Lyapunov condition, a sufficient condition for the central limit theorem to hold for the leading term of the proposed statistic, and the test statistic was thus shown to be asymptotically normal. Upon further inspection, we find that these block variables are not mutually independent, and hence a dependent central limit theorem is needed to show the asymptotic normality of the proposed statistic.

Here, we present a new proof of the asymptotic normality of the leading term of $M_n$. Following the proof of Theorem 1 of Chen and Qin (2010), we exploit the martingale CLT (Hall and Heyde, 1980, Corollary 3.1). Before the proof, we restate some assumptions and notation of Ayyala et al. (2017).

2.1 Assumptions and notation

The authors assume the following:

  1. $X_1, \ldots, X_n$ follow an $M$-dependent strictly stationary Gaussian process with mean $\mu$ and autocovariance structure given by $\Gamma(h)$. That is, $\operatorname{Cov}(X_t, X_{t+h}) = \Gamma(h)$ for $0 \le h \le M$, and $\Gamma(h) = 0$ for $h > M$.

  2. The dimension $p$ increases linearly in the sample size $n$, while the dependence order $M$ increases sublinearly in $n$.

  3. A sparsity condition on traces of products of the autocovariance matrices $\Gamma(0), \ldots, \Gamma(M)$.

  4. A non-degeneracy condition on $\Omega$, where $\Omega$ is the spectral matrix evaluated at the zero frequency.

  5. A decay condition, where the rate of decay is uniform for all lags.

The naive sample estimator of the autocovariance matrix at lag $h$ is denoted by $\widehat{\Gamma}(h) = n^{-1} \sum_{t=1}^{n-h} (X_t - \bar{X})(X_{t+h} - \bar{X})^\top$, with $\bar{X} = n^{-1}\sum_{t=1}^{n} X_t$. To construct unbiased estimators, stack the naive estimators over the lags $h = 0, 1, \ldots, M$. Because of the centering by $\bar{X}$, the expected value of each $\widehat{\Gamma}(h)$ is a linear combination of $\Gamma(0), \ldots, \Gamma(M)$, so the expected value of the stacked estimator is a linear transformation of the stacked true autocovariances, with a coefficient matrix determined by $n$ and $M$. Since the coefficient matrix is invertible, premultiplying the stacked naive estimators by its inverse yields unbiased estimators $\widetilde{\Gamma}(0), \ldots, \widetilde{\Gamma}(M)$. Therefore, we have

\[ E\{\widetilde{\Gamma}(h)\} = \Gamma(h), \qquad h = 0, 1, \ldots, M. \tag{6} \]
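In matrix form, writing $C = (c_{hg})$ for the coefficient matrix (schematic notation on our part; the exact entries are given in Ayyala et al., 2017), the construction reads

\[ E\{\widehat{\Gamma}(h)\} = \sum_{g=0}^{M} c_{hg}\, \Gamma(g), \qquad \widetilde{\Gamma}(h) = \sum_{g=0}^{M} \big(C^{-1}\big)_{hg}\, \widehat{\Gamma}(g), \]

so that $E\{\widetilde{\Gamma}(h)\} = \sum_{g} \big(C^{-1}\big)_{hg} \sum_{g'} c_{g g'}\, \Gamma(g') = \Gamma(h)$, which is (6).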

According to Ayyala et al. (2017), the limiting expression of the variance of $M_n$ is expressed as

\[ \sigma^2 = \frac{2}{n^2}\operatorname{tr}(\Omega^2)\,\{1 + o(1)\}. \tag{7} \]

2.2 Theorem 3.1 of Ayyala et al. (2017)

The main theorem of Ayyala et al. (2017) is stated as follows:

Theorem 2.1.

(Ayyala et al., 2017, Theorem 3.1) Suppose that Assumptions 1–5 and the null hypothesis hold. Then, as $n \to \infty$,

\[ T_n = \frac{M_n}{\widehat{\sigma}} \xrightarrow{d} N(0, 1). \tag{8} \]

To show the asymptotic normality of $M_n$, Ayyala et al. (2017) decompose $M_n$ as

(9)

where the leading term is given in

(10)

and the remainder term in

(11)

and show that the remainder is asymptotically negligible while the leading term determines the limit. To establish the asymptotic normality of the leading term, Ayyala et al. (2017) define the following quantities: an $n \times n$ array whose $(s,t)$th element involves the cross product of the $s$th and $t$th observations; block lengths chosen so that consecutive blocks are separated by at least $M$ indices; and, for each block, a random variable collecting the terms of the array that fall within that block. With these, Ayyala et al. (2017) further decompose the leading term as

(12)

the sum of the block variables plus boundary terms, and show that the boundary terms are negligible. While convergence of the boundary terms can be established easily, asymptotic normality of the sum of the block variables is not straightforward.

2.3 Revised proof

In Ayyala et al. (2017), the authors assumed that the block variables are independently distributed with unequal covariances. However, they are not independent, even though distinct blocks are uncorrelated. The dependence structure of the blocks should be taken into account in establishing the limiting distribution. We provide a new proof of the asymptotic normality of the leading term which takes the dependence between the blocks into account.

Proposition 2.2.

Under the assumptions of Theorem 2.1, it holds that

(13)

as $n \to \infty$.

Proof.

For each block index $i$, let $Y_i$ denote the corresponding block of observations. Then, the $Y_i$'s are independent and identically distributed Gaussian variables with mean vector $0$ and a common covariance matrix determined by the autocovariances $\Gamma(0), \ldots, \Gamma(M)$. Following the proof of Theorem 1 of Chen and Qin (2010), define

(14)

the normalized martingale differences $D_{n,j}$ built from the $Y_i$'s, and let $\mathcal{F}_{n,j}$ be the $\sigma$-algebra generated by $Y_1, \ldots, Y_j$. Then the leading term coincides, up to negligible terms, with $\sum_j D_{n,j}$. To show the asymptotic normality of this sum, we need some lemmas.
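For reference, the martingale CLT we invoke can be paraphrased as follows (Hall and Heyde, 1980, Corollary 3.1, stated here for a constant limiting variance): if $\{D_{n,j}\}$ are square-integrable martingale differences with respect to filtrations $\{\mathcal{F}_{n,j}\}$ and

\[ \sum_{j} E\big[D_{n,j}^2 \mid \mathcal{F}_{n,j-1}\big] \xrightarrow{p} 1, \qquad \sum_{j} E\big[D_{n,j}^2\, \mathbf{1}\{|D_{n,j}| > \epsilon\} \mid \mathcal{F}_{n,j-1}\big] \xrightarrow{p} 0 \ \text{ for all } \epsilon > 0, \]

then $\sum_{j} D_{n,j} \xrightarrow{d} N(0, 1)$. Lemmas 2.3, 2.4, and 2.5 below verify exactly these conditions.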

Lemma 2.3 (square-integrable martingale with zero mean).

For each $n$, the sequence of partial sums $S_{n,j} = \sum_{i \le j} D_{n,i}$ is a zero-mean, square-integrable martingale with respect to $\{\mathcal{F}_{n,j}\}$.

Proof of Lemma 2.3.

Since the $Y_i$'s are i.i.d. Gaussian random variables with zero mean, $D_{n,j}$ has zero mean and is square integrable for any $j$. We observe that $E[D_{n,j} \mid \mathcal{F}_{n,j-1}] = 0$, and hence $E[S_{n,j} \mid \mathcal{F}_{n,j-1}] = S_{n,j-1}$. Therefore, $\{S_{n,j}, \mathcal{F}_{n,j}\}$ is a zero-mean, square-integrable martingale sequence. ∎

Lemma 2.4 (analogous condition on the conditional variance).

Let $V_n = \sum_j E\big[D_{n,j}^2 \mid \mathcal{F}_{n,j-1}\big]$. Then, $V_n \xrightarrow{p} 1$.

Proof of Lemma 2.4.

We note that $V_n$ admits an explicit quadratic expression in the $Y_i$'s. Define this quadratic form as $Q_n$; then its first moment is $E[Q_n] = E[V_n] \to 1$ by the normalization of the $D_{n,j}$'s. The second moment of $Q_n$ follows from the Gaussian fourth-moment identity, and thus the variance of $Q_n$ is controlled by traces of products of the autocovariance matrices. By assumption (A5), it holds that $\operatorname{Var}(Q_n) \to 0$. Thus, we have $V_n = Q_n \xrightarrow{p} 1$. This completes the proof. ∎

Lemma 2.5 (conditional Lindeberg condition).

For any $\epsilon > 0$,

\[ \sum_j E\big[ D_{n,j}^2\, \mathbf{1}\{|D_{n,j}| > \epsilon\} \,\big|\, \mathcal{F}_{n,j-1} \big] \xrightarrow{p} 0. \]

Proof of Lemma 2.5.

We note that the expectation of the left-hand side is bounded by $\epsilon^{-2} \sum_j E\big[D_{n,j}^4\big]$, since $D_{n,j}^2\, \mathbf{1}\{|D_{n,j}| > \epsilon\} \le D_{n,j}^4/\epsilon^2$. It suffices to show that $\sum_j E\big[D_{n,j}^4\big] \to 0$. We have that the fourth moments of the Gaussian block variables are controlled by traces of products of the autocovariance matrices, which are asymptotically negligible under the assumptions. This completes the proof. ∎

We have thus established the sufficient conditions for the martingale central limit theorem. By Corollary 3.1 of Hall and Heyde (1980), together with Lemmas 2.3, 2.4, and 2.5, the sum $\sum_j D_{n,j}$ converges in distribution to $N(0, 1)$.

We define

(15)

the standardized remainder quantities appearing in the limit decomposition. Note that their contribution is controlled, since

(16)
(17)

where the bound in (16) is due to Lemma C.1 in the appendix of Ayyala et al. (2017) and the equality in (17) is obtained by an appropriate choice of the constants. Also, by the classical central limit theorem applied to the i.i.d. blocks, the remaining term is asymptotically normal. Since the martingale part and the remaining term together recover the leading term, we obtain its asymptotic normality. Therefore, we get the desired result (13). ∎
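As an informal numerical companion to this result (our own check, reusing `naive_statistic` from the sketch in the introduction), one can verify that the standardized statistic has approximately standard normal tails under $H_0$:

```python
import numpy as np

# Monte Carlo sanity check (ours): simulate a fixed MA(M) Gaussian process under
# H_0 (mu = 0), compute the naive statistic for many replicates, standardize by
# the Monte Carlo moments, and compare tail frequencies with N(0,1).
rng = np.random.default_rng(0)
n, p, M, reps = 100, 50, 2, 500
A = [rng.standard_normal((p, p)) / (p * (k + 1)) for k in range(M + 1)]  # fixed coefficients

def one_replicate():
    eps = rng.standard_normal((n + M, p))
    X = np.stack([sum(A[k] @ eps[t + M - k] for k in range(M + 1)) for t in range(n)])
    return naive_statistic(X, M)   # from the earlier sketch

vals = np.array([one_replicate() for _ in range(reps)])
z = (vals - vals.mean()) / vals.std()
for q in (1.28, 1.64, 1.96):
    print(q, (z > q).mean())       # should be near 1 - Phi(q) = 0.10, 0.05, 0.025
```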

3 Two-Sample Test

By construction of the blocks, the quantities defined in (15) carry over to the two-sample setting, with each group satisfying the conditions required in the one-sample case. As was done in the one-sample problem, if we define the analogous block variables separately for each group, then the two-sample test statistic is the centered squared norm of the difference of the sample averages divided by a ratio-consistent estimator of its standard deviation, where

(18)

and each group's estimator is defined as in the one-sample case.
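A minimal sketch of the two-sample analogue (ours; the exact unbiased trace and variance estimators are in Ayyala et al., 2017):

```python
import numpy as np

def omega_hat(X, M):
    """Plug-in estimate of Omega = n * Var(Xbar) from naive autocovariances."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    G = [Xc[: n - h].T @ Xc[h:] / n for h in range(M + 1)]
    return G[0] + sum(g + g.T for g in G[1:])

def naive_two_sample_statistic(X, Y, M):
    """Centered squared norm of the mean difference for two independent groups."""
    n1, n2 = X.shape[0], Y.shape[0]
    d = X.mean(axis=0) - Y.mean(axis=0)
    # E||Xbar - Ybar||^2 = ||mu1 - mu2||^2 + tr(Omega_1)/n1 + tr(Omega_2)/n2.
    return d @ d - np.trace(omega_hat(X, M)) / n1 - np.trace(omega_hat(Y, M)) / n2
```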

We change the condition (14) in Ayyala et al. (2017) to the following:

(19)

As in the one-sample case, we can then show that the two-sample test statistic is asymptotically standard normal under the null hypothesis.