# Note on Mean Vector Testing for High-Dimensional Dependent Observations

For the mean vector test in high dimensions, Ayyala et al. (2017, 153:136-155) proposed new test statistics for observation vectors that are $M$-dependent. Under certain conditions, the test statistics for the one-sample and two-sample cases were shown to be asymptotically normal. While the test statistics and the asymptotic results are valid, some parts of the proof of asymptotic normality need to be corrected. In this work, we provide corrections to the proofs of their main theorems. We also note a few minor discrepancies in the calculations in the publication.



## 1 Introduction

Mean vector testing in high dimensions has gained great attention in the recent past with the increasing availability of data sets in which the number of variables is greater than the sample size. The traditional Hotelling's $T^2$ test and the more recently developed test statistics assume that the observations are independently and identically distributed. Comparison of mean vectors for distributions where the samples are dependent is a relatively understudied problem. The one-sample mean vector test under dependence is defined as follows. The observations $X_1, \ldots, X_n$ are assumed to be $p$-dimensional random vectors with mean $\mu$ and a covariance matrix that may not be diagonal. The problem of interest is then to test

$$H_0: \mu = 0 \quad \text{versus} \quad H_a: \mu \neq 0. \qquad (1)$$

In the two-sample case, $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ are two independent groups of $p$-dimensional observations with mean vectors $\mu_1$, $\mu_2$ and autocovariance structures $\Gamma^{(1)}(\cdot)$, $\Gamma^{(2)}(\cdot)$, respectively. The hypothesis of interest is that the two population means are equal, viz.

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_a: \mu_1 \neq \mu_2. \qquad (2)$$

In Ayyala et al. (2017), we developed a hypothesis test for (1) when the samples follow an $M$-dependent stationary Gaussian process with mean $\mu$ and autocovariance structure given by $\{\Gamma(h)\}_{h=0}^{M}$. The rate of increase with respect to $n$ was assumed to be linear for $p$ ($p = O(n)$) and sublinear for $M$ ($M = O(n^{1/8})$). These assumptions ensure that the number of variables is not increasing faster than the sample size and that there are sufficiently many observations for estimating the autocovariance at all lags. The dependence structure is made sparse by assuming

$$\operatorname{tr}\{\Gamma(a)\Gamma(b)\Gamma(c)\Gamma(d)\} = o\{\operatorname{tr}^2(\Omega_n^2)\}$$

for any $a, b, c, d \in \{0, \ldots, M\}$, where the matrix $\Omega_n$ is the covariance of $\bar{X}_n$ multiplied by the sample size, i.e., $\Omega_n = n\operatorname{Cov}(\bar{X}_n)$. Summarizing the model, the assumptions are

$$p = O(n), \qquad M = O(n^{1/8}), \qquad \operatorname{tr}\{\Gamma(a)\Gamma(b)\Gamma(c)\Gamma(d)\} = o\{\operatorname{tr}^2(\Omega_n^2)\} \quad \forall\, a, b, c, d \in \{0, \ldots, M\}. \qquad (3)$$

The test statistic is based on the Euclidean norm of the sample average, $\bar{X}_n = n^{-1}\sum_{t=1}^{n}X_t$. Define $M_n = \bar{X}_n^\top\bar{X}_n - n^{-1}\widehat{\operatorname{tr}}(\Omega_n)$, where $\widehat{\operatorname{tr}}(\Omega_n)$ is an unbiased estimator for $\operatorname{tr}(\Omega_n)$. Under $H_0$, this quantity has expected value $0$. Using this property, the test statistic is constructed as

$$T_n = \frac{M_n}{\sqrt{\widehat{\operatorname{var}}(M_n)}}, \qquad (4)$$

where $\widehat{\operatorname{var}}(M_n)$ is a ratio-consistent estimator for $\operatorname{var}(M_n)$, i.e., $\widehat{\operatorname{var}}(M_n)/\operatorname{var}(M_n) \overset{p}{\to} 1$. For the expressions of $\widehat{\operatorname{tr}}(\Omega_n)$ and $\widehat{\operatorname{var}}(M_n)$, refer to Ayyala et al. (2017). Under the null hypothesis, the test statistic is shown to be asymptotically normal. With $z_\alpha$ denoting the upper $\alpha$ quantile of the standard normal distribution, the power function at significance level $\alpha$ under a local alternative is derived as

$$\beta_n(\mu) \simeq \Phi\left(-z_\alpha + \frac{n\,\mu^\top\mu}{\sqrt{2\operatorname{tr}(\Omega_n^2)}}\right) \qquad (5)$$

as $n \to \infty$, where $\Phi$ is the cumulative distribution function of the standard normal distribution. The mean under the local alternative satisfies the condition $\mu^\top\Gamma(h)\mu = o\{n^{-1}\operatorname{tr}(\Omega_n^2)\}$ for all $|h| \le M$.
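As a concrete illustration, the construction of $M_n$ can be sketched in a few lines of code. The sketch below uses a simple plug-in estimate of $\operatorname{tr}(\Omega_n)$ built from traces of the naive sample autocovariances, rather than the exact unbiased estimator $\widehat{\operatorname{tr}}(\Omega_n)$ of Ayyala et al. (2017); all function names are illustrative.

```python
import numpy as np

def autocov_traces(X, M):
    """Traces tr(Gamma_hat(h)) of the naive lag-h sample autocovariances,
    for h = 0, ..., M (each normalized by n)."""
    n, _ = X.shape
    Xc = X - X.mean(axis=0)
    return np.array([np.sum(Xc[:n - h] * Xc[h:]) / n for h in range(M + 1)])

def one_sample_Mn(X, M):
    """M_n = ||xbar||^2 - tr(Omega_n)/n, with the plug-in trace estimate
    tr(Omega_n) ~ tr(Gamma(0)) + 2 * sum_h (1 - h/n) tr(Gamma(h))."""
    n, _ = X.shape
    xbar = X.mean(axis=0)
    g = autocov_traces(X, M)
    tr_omega = g[0] + 2 * np.sum((1 - np.arange(1, M + 1) / n) * g[1:])
    return xbar @ xbar - tr_omega / n
```

The statistic $T_n$ then divides $M_n$ by a ratio-consistent standard-error estimate, e.g. a plug-in version of $\sqrt{2\operatorname{tr}(\Omega_n^2)}/n$.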

In the proof of asymptotic normality of $T_n$, we used a two-dimensional array argument. The main idea was to divide the array into blocks that were at least $M$ indices apart. The proof was established by showing that (a) these blocks dominate the remainder of the terms and (b) the blocks are independent, so that a central limit theorem can be applied after verifying the Lyapunov conditions. However, upon further inspection, we see that the blocks are not independent, albeit uncorrelated. We have also identified convergence issues for the asymptotic power under the rate of increase of $p$ assumed in (3). The main result and numerical studies of Ayyala et al. (2017) remain valid and justify the use of the test statistic. However, some of the theoretical details of the paper need to be revised.

In this article, we address these issues and provide corrected proofs for the results in Ayyala et al. (2017), as well as further discussion on the power of the test. The remainder of the paper is organized as follows. In Section 2, a corrected proof of asymptotic normality for the one-sample test statistic is provided. Using the corrected proof, the result for the two-sample case is verified in Section 3. The asymptotic power of the test statistic for the one-sample case is presented in Section 4. The notation used in the remainder of the article is consistent with Ayyala et al. (2017). For definitions and additional details on the variables used, refer to Ayyala et al. (2017).

## 2 Proof of Theorem 3.1 of Ayyala et al. (2017)

In the proof of Theorem 3.1 of Ayyala et al. (2017), the authors claim that the variables $B_{ij}$ defined for all $1 \le i, j \le k_n$ (the formal definition of $B_{ij}$ is given in Section 2.2) are mutually independent, since the process is Gaussian with $M$-dependent stationarity. Using the independence of the $B_{ij}$'s, the authors established the Lyapunov condition, a sufficient condition for the central limit theorem, for the leading term of the proposed statistic, and thus the test statistic was shown to be asymptotically normal. Upon further inspection, we find that the $B_{ij}$'s are not mutually independent, and hence a dependent central limit theorem is needed to show the asymptotic normality of the proposed statistic.

Here, we present a new proof of the asymptotic normality of the leading term $\Delta_{11}$. Following the proof of Theorem 1 of Chen and Qin (2010), we exploit the martingale central limit theorem (Hall and Heyde, 1980, Corollary 3.1). Before the proof, we restate some assumptions and notations of Ayyala et al. (2017).

### 2.1 Assumptions and notations

The authors assume the following:

1. $X_1, \ldots, X_n$ follow an $M$-dependent strictly stationary Gaussian process with mean $\mu$ and autocovariance structure given by $\{\Gamma(h)\}_{h=0}^{M}$. That is, for $h = 0, \ldots, M$,

$$\operatorname{Cov}(X_t, X_{t+h}) = \Gamma(h).$$

2. $p = O(n)$ and $M = O(n^{1/8})$.

3. $\mu^\top\Gamma(h)\mu = o\{n^{-1}\operatorname{tr}(\Omega_n^2)\}$ for all $|h| \le M$.

4. $\Omega_n \to 2\pi\mathfrak{f}(0)$ as $n \to \infty$, where $\mathfrak{f}(0)$ is the spectral matrix evaluated at the zero frequency.

5. $\operatorname{tr}\{\Gamma(a)\Gamma(b)\Gamma(c)\Gamma(d)\} = o\{\operatorname{tr}^2(\Omega_n^2)\}$ for all $a, b, c, d \in \{0, \ldots, M\}$, where the rate of decay is uniform in $a, b, c, d$.

The naive sample estimator of the autocovariance matrix at lag $h$ is denoted by

$$\hat{\Gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(X_t - \bar{X}_n)(X_{t+h} - \bar{X}_n)^\top,$$

with $\bar{X}_n = n^{-1}\sum_{t=1}^{n}X_t$. To construct an unbiased estimator of $\operatorname{tr}(\Omega_n)$, let

$$\gamma = \begin{pmatrix}\operatorname{tr}(\Gamma(0))\\ \vdots\\ \operatorname{tr}(\Gamma(M))\end{pmatrix} \in \mathbb{R}^{M+1}, \quad\text{and}\quad \hat{\gamma}_n = \begin{pmatrix}\operatorname{tr}(\hat{\Gamma}(0))\\ \vdots\\ \operatorname{tr}(\hat{\Gamma}(M))\end{pmatrix} \in \mathbb{R}^{M+1}.$$

Then, the expected value of $\hat{\gamma}_n$ is $E(\hat{\gamma}_n) = C_n\gamma$, where $C_n$ is a known coefficient matrix. Since $\operatorname{tr}(\Omega_n) = \delta^\top\gamma$, where $\delta \in \mathbb{R}^{M+1}$ with $\delta_0 = 1$ and $\delta_h = 2(1 - h/n)$ for $h \ge 1$, setting $\beta_n = (C_n^\top)^{-1}\delta$ ensures that

$$\widehat{\operatorname{tr}}(\Omega_n) = \beta_n^\top\hat{\gamma}_n$$

is an unbiased estimator of $\operatorname{tr}(\Omega_n)$. Therefore, we have

$$E(M_n) = E\left(\bar{X}_n^\top\bar{X}_n - \frac{1}{n}\widehat{\operatorname{tr}}(\Omega_n)\right) = \mu^\top\mu. \qquad (6)$$
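Assuming $E(\hat{\gamma}_n) = C_n\gamma$ for a known coefficient matrix $C_n$ (as in Ayyala et al., 2017) and writing $\operatorname{tr}(\Omega_n) = \delta^\top\gamma$ for a fixed weight vector $\delta$, the choice $\beta_n = (C_n^\top)^{-1}\delta$ makes $\widehat{\operatorname{tr}}(\Omega_n) = \beta_n^\top\hat{\gamma}_n$ unbiased in one line:

$$E\left(\widehat{\operatorname{tr}}(\Omega_n)\right) = \beta_n^\top E(\hat{\gamma}_n) = \delta^\top C_n^{-1}C_n\gamma = \delta^\top\gamma = \operatorname{tr}(\Omega_n).$$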

According to Ayyala et al. (2017), the limiting expression of the variance of $M_n$ is

$$\operatorname{var}(M_n) = \frac{2}{n^2}\operatorname{tr}(\Omega_n^2) + o\left(\frac{1}{n^2}\operatorname{tr}(\Omega_n^2)\right). \qquad (7)$$

### 2.2 Theorem 3.1 of Ayyala et al. (2017)

The main theorem of Ayyala et al. (2017) is stated as follows:

###### Theorem 2.1.

(Ayyala et al., 2017, Theorem 3.1) Suppose that Assumptions 1–5 and the null hypothesis hold. Then, as $n \to \infty$,

$$T_n = \frac{M_n}{\sqrt{\operatorname{var}(M_n)}} \overset{d}{\to} N(0, 1). \qquad (8)$$

To show the asymptotic normality of $T_n$, Ayyala et al. (2017) decompose $T_n$ as

$$T_n = \Delta_1 - \Delta_2, \qquad (9)$$

where

$$\Delta_1 = \frac{\bar{X}_n^\top\bar{X}_n - \frac{1}{n}\operatorname{tr}(\Omega_n)}{\sqrt{\operatorname{var}(M_n)}} \qquad (10)$$

and

$$\Delta_2 = \frac{\widehat{\operatorname{tr}}(\Omega_n) - \operatorname{tr}(\Omega_n)}{n\sqrt{\operatorname{var}(M_n)}}, \qquad (11)$$

and show that $\Delta_1 \overset{d}{\to} N(0,1)$ and $\Delta_2 \overset{p}{\to} 0$. To establish the asymptotic normality of $\Delta_1$, Ayyala et al. (2017) define the following quantities: an $n \times n$ matrix $A$ with $(i,j)$th element

$$A_{ij} = \frac{1}{n^2}\left[X_i^\top X_j - \operatorname{tr}(\Gamma(i-j))\right],$$

where $\Gamma(h) = 0$ for $|h| > M$; constants $w_n$ and $k_n$ such that $w_n, k_n \to \infty$ and $n = w_nk_n(1+o(1))$, where $w_n$ represents the block length and $k_n$ the number of blocks; and, for $1 \le i, j \le k_n$, the random variables

$$\begin{aligned}
B_{ij} &= \sum_{k=(i-1)w_n+1}^{iw_n-M}\;\sum_{l=(j-1)w_n+1}^{jw_n-M} A_{kl},\\
D_{ij} &= \sum_{k=(i-1)w_n+1}^{iw_n}\;\sum_{l=(j-1)w_n+1}^{jw_n} A_{kl} - B_{ij},\\
F &= \sum_{(k,l)\,\in\,\{1,\ldots,n\}^2 - \{1,\ldots,w_nk_n\}^2} A_{kl}.
\end{aligned}$$
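The block decomposition can be checked numerically. The sketch below (illustrative names; 0-based indices) splits an arbitrary $n \times n$ array $\{A_{kl}\}$ into the trimmed blocks $B_{ij}$, the remainders $D_{ij}$, and the edge term $F$; by construction the three pieces always add back up to $\sum_{k,l}A_{kl}$.

```python
import numpy as np

def block_decompose(A, wn, kn, M):
    """Split {A_kl} into trimmed blocks B, remainders D and edge term F.
    Block i covers rows i*wn, ..., (i+1)*wn - 1; B drops the last M
    rows/columns of each block, so distinct blocks are > M indices apart."""
    B = np.zeros((kn, kn))
    D = np.zeros((kn, kn))
    for i in range(kn):
        for j in range(kn):
            full = A[i * wn:(i + 1) * wn, j * wn:(j + 1) * wn]
            trimmed = A[i * wn:(i + 1) * wn - M, j * wn:(j + 1) * wn - M]
            B[i, j] = trimmed.sum()
            D[i, j] = full.sum() - B[i, j]
    F = A.sum() - A[:wn * kn, :wn * kn].sum()  # terms with an index > wn*kn
    return B, D, F
```

Summing $B + D$ over all blocks and adding $F$ recovers the total sum of $A$, which is the identity behind the decomposition of $\Delta_1$ in (12).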

With these, Ayyala et al. (2017) further decompose $\Delta_1$ as

$$\Delta_1 = \frac{\bar{X}_n^\top\bar{X}_n - \frac{1}{n}\operatorname{tr}(\Omega_n)}{\sqrt{\operatorname{var}(M_n)}} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}A_{ij}}{\sqrt{\operatorname{var}(M_n)}} = \frac{\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}B_{ij}}{\sqrt{\operatorname{var}(M_n)}} + \frac{\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}D_{ij} + F}{\sqrt{\operatorname{var}(M_n)}} = \Delta_{11} + \Delta_{12}, \qquad (12)$$

and show that $\Delta_{11} \overset{d}{\to} N(0,1)$ and $\Delta_{12} \overset{p}{\to} 0$. While the convergence of $\Delta_{12}$ can be established easily, the asymptotic normality of $\Delta_{11}$ is not straightforward.

### 2.3 Revised proof

In Ayyala et al. (2017), the authors assumed that the blocks $B_{ij}$ are independently distributed with unequal covariances. However, they are not independent, even though $\operatorname{Cov}(B_{ij}, B_{kl}) = 0$ whenever $(i,j) \neq (k,l)$. The dependence structure of the blocks should be taken into account in establishing the limiting distribution. We provide a new proof of the asymptotic normality of $\Delta_{11}$ which takes the dependence between the blocks into account.

###### Proposition 2.2.

Under the assumptions of Theorem 2.1, it holds that

$$\Delta_{11} = \frac{\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}B_{ij}}{\sqrt{\operatorname{Var}(M_n)}} \overset{d}{\to} N(0,1) \qquad (13)$$

as $n \to \infty$.

###### Proof.

For $1 \le i \le k_n$, let

$$Y_i = \frac{1}{w_n - M}\sum_{k=(i-1)w_n+1}^{iw_n-M} X_k.$$

Then, $Y_1, \ldots, Y_{k_n}$ are independent and identically distributed Gaussian random vectors with mean vector $0$ and covariance matrix $(w_n - M)^{-1}\Omega_{w_n}$, where $\Omega_{w_n} = \sum_{|h| \le M}\left(1 - \frac{|h|}{w_n - M}\right)\Gamma(h)$. We note that for $i \neq j$, the summands of $B_{ij}$ involve indices separated by more than $M$, so that $\operatorname{tr}(\Gamma(k-l)) = 0$ and

$$B_{ij} = \frac{1}{n^2}\sum_{k=(i-1)w_n+1}^{iw_n-M}\;\sum_{l=(j-1)w_n+1}^{jw_n-M} X_k^\top X_l = \frac{(w_n-M)^2}{n^2}\,Y_i^\top Y_j.$$

Following the proof of Theorem 1 of Chen and Qin (2010), define

$$\phi_{nij} = \frac{(w_n-M)^2}{n^2}Y_i^\top Y_j \quad \text{for } i < j, \qquad S_{nm} = \sum_{j=2}^{m}\sum_{i=1}^{j-1}\phi_{nij},$$

and let $\mathcal{F}_{nm}$, for $1 \le m \le k_n$, be the $\sigma$-algebra generated by $\{Y_1, \ldots, Y_m\}$. Then we have

$$\sum_{i\neq j} B_{ij} = 2S_{nk_n}.$$
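Under the definitions above, the identity $\sum_{i\neq j}B_{ij} = 2S_{nk_n}$ is a symmetry statement ($B_{ij} \propto Y_i^\top Y_j = Y_j^\top Y_i$) that can be verified numerically; in the sketch below the constant `c` stands in for $(w_n-M)^2/n^2$ and all values are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
kn, p = 6, 4
c = 0.01                             # stands in for (w_n - M)^2 / n^2
Y = rng.standard_normal((kn, p))     # simulated block means Y_1, ..., Y_kn

G = c * (Y @ Y.T)                    # G[i, j] plays the role of B_ij for i != j
off_diag_sum = G.sum() - np.trace(G)

# martingale double sum S_{n kn} = sum_j sum_{i < j} phi_nij
S = sum(c * (Y[i] @ Y[j]) for j in range(kn) for i in range(j))

# the off-diagonal block sum is exactly twice the martingale sum
assert np.isclose(off_diag_sum, 2 * S)
```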

To show the asymptotic normality of $S_{nk_n}$, we need some lemmas.

###### Lemma 2.3 (square integrable martingale with zero mean).

For each $n$, $\{S_{nm}, \mathcal{F}_{nm} : 1 \le m \le k_n\}$ is a zero-mean, square-integrable martingale sequence.

###### Proof of lemma 2.3.

Since the $Y_i$'s are i.i.d. Gaussian random vectors with zero mean, $S_{nm}$ has zero mean and is square integrable for any $m$. We observe

$$\begin{aligned}
E(S_{n(m+1)}\,|\,\mathcal{F}_{nm}) &= \sum_{j=2}^{m+1}\sum_{i=1}^{j-1}\frac{(w_n-M)^2}{n^2}E(Y_i^\top Y_j\,|\,\mathcal{F}_{nm})\\
&= \sum_{j=2}^{m}\sum_{i=1}^{j-1}\frac{(w_n-M)^2}{n^2}Y_i^\top Y_j + \sum_{i=1}^{m}\frac{(w_n-M)^2}{n^2}Y_i^\top E(Y_{m+1}\,|\,\mathcal{F}_{nm})\\
&= S_{nm},
\end{aligned}$$

since $E(Y_{m+1}\,|\,\mathcal{F}_{nm}) = E(Y_{m+1}) = 0$. Therefore, $\{S_{nm}\}$ is a zero-mean and square-integrable martingale sequence. ∎

###### Lemma 2.4 (analogous condition on the conditional variance).

Let $V_{nj} = \sum_{i=1}^{j-1}\phi_{nij}$ and $\sigma_n^2 = \operatorname{Var}(2S_{nk_n}) = \frac{2k_n(k_n-1)(w_n-M)^2}{n^4}\operatorname{tr}(\Omega_{w_n}^2)$. Then,

$$\frac{1}{\sigma_n^2}\sum_{j=2}^{k_n}E\left[V_{nj}^2\,\big|\,\mathcal{F}_{n(j-1)}\right] \overset{p}{\to} \frac{1}{4}.$$
###### Proof of lemma 2.4.

We note that

$$\begin{aligned}
E\left[V_{nj}^2\,\big|\,\mathcal{F}_{n(j-1)}\right] &= \frac{(w_n-M)^4}{n^4}E\left[\left(\sum_{i=1}^{j-1}Y_i^\top Y_j\right)^2\,\Bigg|\,\mathcal{F}_{n(j-1)}\right]\\
&= \frac{(w_n-M)^4}{n^4}\sum_{i_1,i_2=1}^{j-1}Y_{i_1}^\top E\left[Y_jY_j^\top\,\big|\,\mathcal{F}_{n(j-1)}\right]Y_{i_2}\\
&= \frac{(w_n-M)^3}{n^4}\sum_{i_1,i_2=1}^{j-1}Y_{i_1}^\top\Omega_{w_n}Y_{i_2}.
\end{aligned}$$

Define $\eta_n = \sum_{j=2}^{k_n}E[V_{nj}^2\,|\,\mathcal{F}_{n(j-1)}]$; then its first moment is

$$\begin{aligned}
E(\eta_n) &= \frac{(w_n-M)^3}{n^4}\sum_{j=2}^{k_n}\sum_{i_1,i_2=1}^{j-1}E\left(Y_{i_1}^\top\Omega_{w_n}Y_{i_2}\right)\\
&= \frac{(w_n-M)^2}{n^4}\sum_{j=2}^{k_n}(j-1)\operatorname{tr}(\Omega_{w_n}^2)\\
&= \frac{k_n(k_n-1)(w_n-M)^2}{2n^4}\operatorname{tr}(\Omega_{w_n}^2) = \frac{1}{4}\sigma_n^2.
\end{aligned}$$

The second moment of $\eta_n$ is

$$\begin{aligned}
E(\eta_n^2) &= \frac{(w_n-M)^6}{n^8}\sum_{j_1,j_2=2}^{k_n}\sum_{i_1,i_2=1}^{j_1-1}\sum_{i_3,i_4=1}^{j_2-1}E\left(Y_{i_1}^\top\Omega_{w_n}Y_{i_2}\,Y_{i_3}^\top\Omega_{w_n}Y_{i_4}\right)\\
&= \frac{(w_n-M)^6}{n^8}\sum_{j_1,j_2=2}^{k_n}(j_1-1)^2(j_2-1)^2\,E\left(\bar{Y}_{j_1}^\top\Omega_{w_n}\bar{Y}_{j_1}\,\bar{Y}_{j_2}^\top\Omega_{w_n}\bar{Y}_{j_2}\right)\\
&= \frac{(w_n-M)^4}{n^8}\sum_{j_1,j_2=2}^{k_n}(j_1-1)^2(j_2-1)^2\left[\frac{2(j_1\wedge j_2-1)^2}{(j_1-1)^2(j_2-1)^2}\operatorname{tr}(\Omega_{w_n}^4) + \frac{1}{(j_1-1)(j_2-1)}\operatorname{tr}^2(\Omega_{w_n}^2)\right]\\
&= \frac{(w_n-M)^4}{n^8}\left[\frac{k_n^2(k_n-1)(k_n+1)}{6}\operatorname{tr}(\Omega_{w_n}^4) + \frac{k_n^2(k_n-1)^2}{4}\operatorname{tr}^2(\Omega_{w_n}^2)\right],
\end{aligned}$$

where $\bar{Y}_j = (j-1)^{-1}\sum_{i=1}^{j-1}Y_i$, and thus the variance of $\eta_n$ is

$$\operatorname{Var}(\eta_n) = \frac{k_n^2(k_n-1)(k_n+1)(w_n-M)^4}{6n^8}\operatorname{tr}(\Omega_{w_n}^4).$$

By assumption (A5), it holds that $\operatorname{tr}(\Omega_{w_n}^4) = o\{\operatorname{tr}^2(\Omega_{w_n}^2)\}$. Thus, we have

$$\frac{\operatorname{Var}(\eta_n)}{\sigma_n^4} = \frac{k_n+1}{24(k_n-1)}\cdot\frac{\operatorname{tr}(\Omega_{w_n}^4)}{\operatorname{tr}^2(\Omega_{w_n}^2)} = o(1).$$

This completes the proof. ∎
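The convergence in Lemma 2.4 can be illustrated by simulation. The sketch below takes $\Omega_{w_n} = I_p$ and absorbs the deterministic factors $(w_n - M)$ and $n$ (which cancel in the ratio), so that $\eta_n = \sum_{j=2}^{k_n}\lVert\sum_{i<j}Y_i\rVert^2$ and $\sigma_n^2 = 2k_n(k_n-1)p$; for large $k_n$ the ratio should concentrate near $1/4$. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
kn, p = 200, 50
# Y_i i.i.d. N(0, I_p): plays the role of the block means with Omega_wn = I_p
Y = rng.standard_normal((kn, p))

# partial sums: S[m] = Y_1 + ... + Y_{m+1} (0-based cumulative sums)
S = np.cumsum(Y, axis=0)

# eta_n = sum_{j=2}^{kn} || Y_1 + ... + Y_{j-1} ||^2  (rescaled cond. variance)
eta = sum(S[j - 2] @ S[j - 2] for j in range(2, kn + 1))

sigma2 = 2 * kn * (kn - 1) * p   # Var(2 S_{n kn}) with tr(Omega^2) = p
ratio = eta / sigma2             # should be close to 1/4
```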

###### Lemma 2.5 (conditional Lindeberg condition).

For any $\epsilon > 0$,

$$\frac{1}{\sigma_n^2}\sum_{j=2}^{k_n}E\left[V_{nj}^2\,I(|V_{nj}| > \epsilon\sigma_n)\,\big|\,\mathcal{F}_{n(j-1)}\right] \overset{p}{\to} 0.$$

###### Proof of lemma 2.5.

We note that

$$\frac{1}{\sigma_n^2}\sum_{j=2}^{k_n}E\left[V_{nj}^2\,I(|V_{nj}| > \epsilon\sigma_n)\,\big|\,\mathcal{F}_{n(j-1)}\right] \le \frac{1}{\epsilon^2\sigma_n^4}\sum_{j=2}^{k_n}E\left[V_{nj}^4\,\big|\,\mathcal{F}_{n(j-1)}\right].$$

It suffices to show

$$E\left[\sum_{j=2}^{k_n}E\left[V_{nj}^4\,\big|\,\mathcal{F}_{n(j-1)}\right]\right] = o(\sigma_n^4).$$

We have that

$$\begin{aligned}
E\left[\sum_{j=2}^{k_n}E\left[V_{nj}^4\,\big|\,\mathcal{F}_{n(j-1)}\right]\right] &= \sum_{j=2}^{k_n}E(V_{nj}^4) = \sum_{j=2}^{k_n}E\left(\frac{(w_n-M)^2}{n^2}\sum_{i=1}^{j-1}Y_i^\top Y_j\right)^4\\
&= \frac{(w_n-M)^8}{n^8}\sum_{j=2}^{k_n}\left[\frac{6(j-1)^2}{(w_n-M)^4}\operatorname{tr}(\Omega_{w_n}^4) + \frac{3(j-1)^2}{(w_n-M)^4}\operatorname{tr}^2(\Omega_{w_n}^2)\right]\\
&= \frac{k_n(k_n-1)(2k_n-1)(w_n-M)^4}{2n^8}\left(2\operatorname{tr}(\Omega_{w_n}^4) + \operatorname{tr}^2(\Omega_{w_n}^2)\right)\\
&= o(\sigma_n^4),
\end{aligned}$$

where the last equality holds since $k_n(k_n-1)(2k_n-1) = o\{k_n^2(k_n-1)^2\}$ and $\operatorname{tr}(\Omega_{w_n}^4) = o\{\operatorname{tr}^2(\Omega_{w_n}^2)\}$. This completes the proof. ∎

We have thus established the sufficient conditions for the martingale central limit theorem. By Corollary 3.1 of Hall and Heyde (1980), together with Lemmas 2.3, 2.4 and 2.5,

$$\Delta_{111} = \frac{\sigma_n}{\sqrt{\operatorname{Var}(M_n)}}\cdot\frac{\sum_{i\neq j}B_{ij}}{\sigma_n} \overset{d}{\to} N(0,1).$$

We define

$$a_n \asymp b_n \qquad (15)$$

if $a_n/b_n \to c$ as $n \to \infty$ for some constant $0 < c < \infty$. Note that $\sigma_n^2 \asymp \operatorname{Var}(M_n)$, since $k_n(w_n - M) = n(1+o(1))$ and

$$\begin{aligned}
\frac{|\operatorname{tr}(\Omega_{w_n}^2) - \operatorname{tr}(\Omega_n^2)|}{\operatorname{tr}(\Omega_n^2)} &= \frac{\left|\sum_{|i|,|j|\le M}\left(-\frac{2|i|}{w_n-M} + \frac{2|j|}{n} + \frac{|i||j|}{(w_n-M)^2} - \frac{|i||j|}{n^2}\right)\operatorname{tr}(\Gamma(i)\Gamma(j))\right|}{\operatorname{tr}(\Omega_n^2)} \qquad (16)\\
&\le \frac{16(M+1)^3}{w_n}(1+o(1))\cdot\frac{o\{\operatorname{tr}(\Omega_n^2)\}}{\operatorname{tr}(\Omega_n^2)} = o(1), \qquad (17)
\end{aligned}$$

where the inequality in (16) is due to the bound on $\operatorname{tr}(\Gamma(i)\Gamma(j))$ from Lemma C.1 in the appendix of Ayyala et al. (2017), and the equality in (17) is obtained by taking $w_n \asymp \sqrt{n}(M+1)^2$. Also, by the classical central limit theorem, it holds that $\sum_{i=1}^{k_n}B_{ii}/\sqrt{k_n\operatorname{Var}(B_{11})} = O_p(1)$. Since

$$\operatorname{Var}(B_{11}) = \frac{2(w_n-M)^2}{n^4}\operatorname{tr}(\Omega_{w_n}^2),$$

we have

$$\Delta_{112} = \frac{1}{\sqrt{k_n}}\sqrt{\frac{k_n^2\operatorname{Var}(B_{11})}{\operatorname{Var}(M_n)}}\cdot\frac{\sum_{i=1}^{k_n}B_{ii}}{\sqrt{k_n\operatorname{Var}(B_{11})}} \overset{p}{\to} 0.$$

Therefore, we get the desired result

$$\Delta_{11} = \Delta_{111} + \Delta_{112} \overset{d}{\to} N(0,1).$$

## 3 Two Sample Test

By construction of the blocks, for each group $g = 1, 2$ we obtain block lengths $w_{n_g}$ and block counts $k_{n_g}$ from $n_g$, satisfying $n_g = w_{n_g}k_{n_g}(1+o(1))$ and $k_{n_1} \asymp k_{n_2}$, where $\asymp$ is defined in (15). As in the one-sample problem, if we define

$$B^{(g)}_{ij} = \frac{1}{n_g^2}\sum_{k=(i-1)w_{n_g}+1}^{iw_{n_g}-M}\;\sum_{l=(j-1)w_{n_g}+1}^{jw_{n_g}-M} X_{gk}^\top X_{gl} = \frac{(w_{n_g}-M)^2}{n_g^2}\,Y_{gi}^\top Y_{gj},$$

where $Y_{gi} = (w_{n_g}-M)^{-1}\sum_{k=(i-1)w_{n_g}+1}^{iw_{n_g}-M}X_{gk}$, then the two-sample test statistic is

$$\begin{aligned}
M_n &= (\bar{X}_1-\bar{X}_2)^\top(\bar{X}_1-\bar{X}_2) - \frac{1}{n_1}\widehat{\operatorname{tr}}(\Omega^{(1)}_{n_1}) - \frac{1}{n_2}\widehat{\operatorname{tr}}(\Omega^{(2)}_{n_2})\\
&= \frac{(w_{n_1}-M)^2}{n_1^2}\sum_{i\neq j}Y_{1i}^\top Y_{1j} + \frac{(w_{n_2}-M)^2}{n_2^2}\sum_{i\neq j}Y_{2i}^\top Y_{2j} - \frac{2(w_{n_1}-M)(w_{n_2}-M)}{n_1n_2}\sum_{1\le i\le k_{n_1}}\;\sum_{1\le j\le k_{n_2}}Y_{1i}^\top Y_{2j} + R_n\\
&= \underbrace{\left\{\frac{1}{k_{n_1}(k_{n_1}-1)}\sum_{i\neq j}Y_{1i}^\top Y_{1j} + \frac{1}{k_{n_2}(k_{n_2}-1)}\sum_{i\neq j}Y_{2i}^\top Y_{2j} - \frac{2}{k_{n_1}k_{n_2}}\sum_{1\le i\le k_{n_1}}\;\sum_{1\le j\le k_{n_2}}Y_{1i}^\top Y_{2j}\right\}}_{T_n}(1+o_p(1)) + R_n\\
&= T_n(1+o_p(1)) + R_n,
\end{aligned}$$

where

$$R_n = \sum_{i=1}^{k_{n_1}}B^{(1)}_{ii} + \sum_{i=1}^{k_{n_2}}B^{(2)}_{ii} + \Delta^{(1)}_{12} + \Delta^{(2)}_{12} \qquad (18)$$

and $\Delta^{(g)}_{12}$, for $g = 1, 2$, is defined as in the one-sample case.
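As in the one-sample case, the two-sample statistic can be sketched with a plug-in trace estimate (illustrative names; this simplifies the unbiased estimators $\widehat{\operatorname{tr}}(\Omega^{(g)}_{n_g})$ of the paper):

```python
import numpy as np

def trace_omega_hat(X, M):
    """Plug-in estimate of tr(Omega_n): tr(Gamma_hat(0)) plus twice the
    weighted traces of the lag-h sample autocovariances, h = 1, ..., M."""
    n, _ = X.shape
    Xc = X - X.mean(axis=0)
    g = np.array([np.sum(Xc[:n - h] * Xc[h:]) / n for h in range(M + 1)])
    return g[0] + 2 * np.sum((1 - np.arange(1, M + 1) / n) * g[1:])

def two_sample_Mn(X1, X2, M):
    """M_n = ||xbar1 - xbar2||^2 - tr(Omega^(1))/n1 - tr(Omega^(2))/n2."""
    d = X1.mean(axis=0) - X2.mean(axis=0)
    return (d @ d
            - trace_omega_hat(X1, M) / X1.shape[0]
            - trace_omega_hat(X2, M) / X2.shape[0])
```

The studentized statistic then divides this $M_n$ by a ratio-consistent estimate of its standard deviation, exactly as in the one-sample case.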

We change the condition (14) in Ayyala et al. (2017) to the following:

$$\operatorname{tr}\left\{\Gamma^{(w_1)}(a)\,\Gamma^{(w_2)}(b)\,\Gamma^{(w_3)}(c)\,\Gamma^{(w_4)}(d)\right\} = o\left\{(M+1)^{-4}\operatorname{tr}^2\left(\left(\Omega^{(1)}_{n_1}+\Omega^{(2)}_{n_2}\right)^2\right)\right\} \qquad (19)$$

for all $w_1, w_2, w_3, w_4 \in \{1, 2\}$ and $a, b, c, d \in \{0, \ldots, M\}$.

As in the one-sample case, we can show

 Rn√var(