The Sparse Variance Contamination Model

07/27/2018 ∙ by Ery Arias-Castro, et al.

We consider a Gaussian contamination (i.e., mixture) model where the contamination manifests itself as a change in variance. We study this model in various asymptotic regimes, in parallel with the work of Ingster (1997) and Donoho and Jin (2004), who considered a similar model where the contamination was in the mean instead.


1 Introduction

The detection of rare effects is important in settings where, for example, only a small proportion of a population is affected by a given treatment. The situation is typically formalized as a contamination model. Although such models have a long history (e.g., in the theory of robust statistics), we adopt the perspective of Ingster [5] and Donoho and Jin [4], who consider such models in asymptotic regimes where the contamination proportion tends to zero at various rates. This line of work has mostly focused on models where the effect is a shift in mean, with some rare exceptions [2, 3]. In this paper, we instead model the effect as a change in variance.

We consider the following contamination model:

(1 − ε) N(0, 1) + ε N(0, σ²),  (1)

where ε ∈ (0, 1) is the contamination proportion and σ > 0 is the standard deviation of the contaminated component. (Note that this is a Gaussian mixture model with two components.) Following [5, 4], we consider the following hypothesis testing problem: based on X_1, …, X_n drawn iid from (1), decide

H_0 : X_i ∼ N(0, 1)  versus  H_1 : X_i ∼ (1 − ε) N(0, 1) + ε N(0, σ²).  (2)

As usual, we study the behavior of the likelihood ratio test, which is optimal in this simple versus simple hypothesis testing problem. We also study some testing procedures that, unlike the likelihood ratio test, do not require knowledge of the model parameters (ε, σ):

  • The chi-squared test rejects for large values of ∑_i X_i². This is the typical variance test when the sample is known to have mean zero.

  • The extremes test combines the test that rejects for small values of min_i |X_i| and the test that rejects for large values of max_i |X_i|, using Bonferroni’s method.

  • The higher criticism test [4] amounts to applying one of the tests proposed by Anderson and Darling [1] for normality. One variant is based on rejecting for large values of

    sup_t √n |𝔽_n(t) − Φ(t)| / √(Φ(t)(1 − Φ(t))),  (3)

    where 𝔽_n denotes the empirical distribution function of the sample and Φ denotes the standard normal distribution function.
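For concreteness, these statistics are easy to compute on simulated data. The sketch below is our own minimal illustration, not the paper’s implementation: the sampler and function names are ours, and the higher-criticism variant shown is the standard Donoho–Jin form based on two-sided normal p-values, which may differ in detail from the variant in (3).

```python
import math
import numpy as np

def sample_mixture(n, eps, sigma, rng):
    """Draw n iid observations from (1 - eps) N(0,1) + eps N(0, sigma^2)."""
    x = rng.standard_normal(n)
    x[rng.random(n) < eps] *= sigma
    return x

def chi_squared_stat(x):
    """Variance test statistic: rejects for large values of sum_i X_i^2."""
    return float(np.sum(x ** 2))

def extremes_stats(x):
    """Extremes statistics: small min |X_i| or large max |X_i| is suspicious."""
    a = np.abs(x)
    return float(a.min()), float(a.max())

def higher_criticism_stat(x):
    """Donoho-Jin higher criticism on the two-sided normal p-values
    p_i = P(|N(0,1)| >= |x_i|) = erfc(|x_i| / sqrt(2))."""
    n = len(x)
    k = n // 2  # restrict to the lower half of the order statistics, as is standard
    p = np.sort(np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in x]))[:k]
    i = np.arange(1, k + 1)
    hc = math.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    return float(hc.max())
```

Large values of the chi-squared and higher-criticism statistics, and extreme values of the min/max pair, are the relevant evidence against the null.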

The testing problem (2) was partially addressed by Cai, Jeng, and Jin [3], who consider a contamination model where the effect manifests itself as a shift in mean and a change in variance. However, in their setting the variance is fixed, while we let the variance change with the sample size, in an asymptotic analysis that is now standard in this literature.

Our analysis reveals three distinct situations:

  1. Near zero (σ → 0): In the sparse regime, the higher criticism test is (first-order) comparable to the likelihood ratio test, while the chi-squared test is powerless and the extremes test is suboptimal.

  2. Near one (σ → 1): In the dense regime, the chi-squared test and the higher criticism test are (first-order) comparable to the likelihood ratio test, while the extremes test has no power.

  3. Away from zero and one: In the sparse regime, the extremes test and the higher criticism test are (first-order) comparable to the likelihood ratio test, while the chi-squared test is asymptotically powerless if σ is bounded.

In the tradition of Ingster [5], we set

ε = n^(−β) for some fixed exponent β > 0.  (4)

The setting where β < 1/2 is often called the dense regime, while the setting where β > 1/2 is often called the sparse regime. (Note that the setting where β > 1 is uninteresting, since in that case there is no contamination with probability tending to 1.)
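To fix ideas about this parameterization (which we read, following Ingster, as ε = n^(−β)): the expected number of contaminated observations is n·ε = n^(1−β), which grows quickly in the dense regime, slowly in the sparse regime, and vanishes when β > 1. A throwaway illustration:

```python
def expected_contaminated(n, beta):
    """Expected number of contaminated observations, n * eps = n^(1 - beta),
    under the parameterization eps = n^(-beta)."""
    return n ** (1.0 - beta)

for beta in (0.3, 0.8, 1.2):  # dense, sparse, degenerate
    print(beta, expected_contaminated(10 ** 6, beta))
```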

2 The likelihood ratio test

We start by bounding the performance of the likelihood ratio test. As this is the most powerful test by the Neyman–Pearson lemma, the bound applies to any other test. We say that a testing procedure is asymptotically powerless if the sum of its probabilities of Type I and Type II errors (its risk) has limit inferior at least 1 in the large-sample limit.

2.1 Near zero

Consider the testing problem (2) in the regime where σ → 0 as n → ∞. More specifically, we adopt the following parameterization, as it brings the first-order asymptotics into focus:

(5)
Theorem 1.

For the testing problem (2) with parameterization (4) and (5), the likelihood ratio test (and therefore any other test procedure) is asymptotically powerless when

(6)
Proof.

The likelihood ratio is

(7)

where L_i is the likelihood ratio for observation X_i, which in this case is

(8)
(9)

The risk of the likelihood ratio test is equal to

(10)

Our goal is to show that the risk tends to 1 under the stated conditions. When σ is below and bounded away from √2, it turns out that a crude method, the so-called second moment method, which relies on the Cauchy–Schwarz inequality, is enough to lower bound the risk. Indeed, by the Cauchy–Schwarz inequality,

(11)

and we are left with the task of finding conditions under which the second moment of the likelihood ratio tends to 1.

We have

(12)

where

(13)
(14)
(15)
(16)

Therefore,

(17)

so that when

(18)

Plugging in the parameterization (4) and (5), we immediately see that this condition is fulfilled when (6) holds, and this concludes the proof. ∎
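The second moment computation in (12)–(17) can be sanity-checked numerically. As an illustration, a direct Gaussian integral (our own computation, offered as a check rather than a quotation of (17)) gives E₀[L_i²] = 1 + ε²(1/(σ√(2 − σ²)) − 1) for σ² < 2, and the sketch below compares this closed form against a Monte Carlo estimate under the null.

```python
import math
import numpy as np

def per_observation_lr(x, eps, sigma):
    """Per-observation likelihood ratio (1 - eps) + eps * g(x) / f(x), with
    f the N(0,1) density and g the N(0, sigma^2) density."""
    ratio = np.exp(-0.5 * x ** 2 * (1.0 / sigma ** 2 - 1.0)) / sigma
    return (1.0 - eps) + eps * ratio

eps, sigma = 0.1, 0.5
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # draws under the null
mc = float(np.mean(per_observation_lr(x, eps, sigma) ** 2))

# Closed form for E_0[L_i^2], valid when sigma^2 < 2 (our own derivation):
closed = 1.0 + eps ** 2 * (1.0 / (sigma * math.sqrt(2.0 - sigma ** 2)) - 1.0)
print(mc, closed)  # the two should agree to within Monte Carlo error
```

Note that the closed form degenerates as σ² → 2, which is why the second moment method above requires σ bounded away from √2.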

2.2 Near one

Consider the testing problem (2) in the regime where σ → 1. More specifically, we adopt the following parameterization:

(19)
Theorem 2.

For the testing problem (2) with parameterization (4) and (19), the likelihood ratio test (and therefore any other test procedure) is asymptotically powerless when

(20)
Proof.

Restarting the proof of Theorem 1 at (18) and plugging in the parameterization (4) and (19), we immediately see that the condition there is fulfilled when (20) holds. ∎

2.3 Away from zero and one

Consider the testing problem (2) in the regime where σ is fixed away from 0 and 1. (Some of the results developed in this section are special cases of results in [3].)

Theorem 3.

For the testing problem (2) with parameterization (4) and σ fixed, the likelihood ratio test (and therefore any other test) is asymptotically powerless when β > 1/2 and

(21)
Proof.

We use a refinement of the second moment method, sometimes called the truncated second moment method, which is based on bounding the moments of a thresholded version of the likelihood ratio. Define the indicator variable and the corresponding truncated likelihood ratio

(22)

Using the triangle inequality, the fact that , and the Cauchy-Schwarz Inequality, we have the following upper bound:

(23)
(24)

so that when and .

For the first moment, we have

(25)

so that it suffices to prove that . We develop

(26)
(27)
(28)
(29)

where Φ̄ denotes the standard normal survival function. We used the well-known fact that Φ̄(t) ∼ φ(t)/t as t → ∞, with φ the standard normal density. Since (21) holds, the desired convergence follows.

For the second moment, we have

(30)

so that it suffices to prove that . We develop

(31)
(32)
(33)
(34)

Hence, it suffices that , which is equivalent to (21). ∎

3 Other tests

Having studied the performance of the likelihood ratio test, we now turn to studying the performance of the chi-squared test, the extremes test, and the higher criticism test. These tests are more practical in that they can be implemented without knowledge of the parameters (ε, σ) driving the alternative.

3.1 The chi-squared test

The chi-squared test is the classical variance test. It turns out to be asymptotically powerful only in the dense regime, when σ is bounded away from 1.

Proposition 1.

For the testing problem (2) with parameterization (4), the chi-squared test is asymptotically powerful when β < 1/2 and either σ is bounded away from 1 or (19) holds with an appropriate rate. The chi-squared test is asymptotically powerless when β > 1/2 and σ is bounded.

Proof.

We divide the proof into the two regimes.

Dense regime (β < 1/2). We show that there is a chi-squared test that is asymptotically powerful under the stated conditions. To do so, we use Chebyshev’s inequality. Under H_0, the statistic ∑_i X_i² has the chi-squared distribution with n degrees of freedom. But using only the fact that its mean is n and its variance is 2n, by Chebyshev’s inequality, we have

(35)

for any sequence diverging to infinity. Under H_1, the mean and variance of ∑_i X_i² are computed just as easily. By Chebyshev’s inequality,

(36)

We choose and consider the test with rejection region . This test is asymptotically powerful when, eventually,

(37)

meaning,

(38)

This is the case when β < 1/2, with no condition on σ other than remaining bounded away from 1, and also when (19) holds with the appropriate rate.

Sparse regime (β > 1/2). To prove that the chi-squared procedure is asymptotically powerless when σ is bounded, we argue in terms of convergence in distribution rather than the simple bounding of moments. Under H_0, the usual Central Limit Theorem implies that the standardized statistic (∑_i X_i² − n)/√(2n) converges weakly to the standard normal distribution. Under H_1, the same is true using the Lyapunov Central Limit Theorem for triangular arrays. Indeed, even though the distribution of the observations depends on n, uniformly

(39)

so that converges weakly to the standard normal distribution. Since

(40)

with

(41)

and

(42)

it is also the case that (∑_i X_i² − n)/√(2n) converges weakly to the standard normal distribution under the alternative. Hence, no test based on ∑_i X_i² has any asymptotic power. ∎
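The dense/sparse dichotomy of Proposition 1 is easy to reproduce in simulation. The sketch below uses a normal approximation to the chi-squared null distribution and illustrative parameters of our own choosing (β = 0.3 versus β = 0.8, with σ = 3); it is not the paper’s experiment.

```python
import math
import numpy as np

def sample_mixture(n, eps, sigma, rng):
    """Draw n iid observations from (1 - eps) N(0,1) + eps N(0, sigma^2)."""
    x = rng.standard_normal(n)
    x[rng.random(n) < eps] *= sigma
    return x

def chi_squared_pvalue(x):
    """One-sided variance test p-value via the normal approximation to the
    chi-squared null distribution (mean n, variance 2n)."""
    n = len(x)
    z = (float(np.sum(x ** 2)) - n) / math.sqrt(2.0 * n)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def empirical_power(beta, sigma, n=10_000, reps=200, level=0.05, seed=0):
    """Fraction of replications in which the test rejects at the given level."""
    rng = np.random.default_rng(seed)
    eps = n ** (-beta)  # parameterization eps = n^(-beta)
    rejections = sum(chi_squared_pvalue(sample_mixture(n, eps, sigma, rng)) < level
                     for _ in range(reps))
    return rejections / reps

dense = empirical_power(beta=0.3, sigma=3.0)   # beta < 1/2: near-full power
sparse = empirical_power(beta=0.8, sigma=3.0)  # beta > 1/2: power near the level
print(dense, sparse)
```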

3.2 The extremes test

The extremes test, as the name indicates, focuses on the extreme observations, disregarding the rest of the sample. It happens to be suboptimal in the setting where σ → 0, while it achieves the detection boundary in the sparse regime when σ is fixed.

Proposition 2.

For the testing problem (2) with parameterization (4) and (5), the extremes test is asymptotically powerful when (and asymptotically powerless when ). If instead is fixed, the extremes test is asymptotically powerful when (and asymptotically powerless when ).

Proof.

Under , for any , we have

(43)
(44)
(45)

Similarly, as is well-known,

(46)

We thus consider the test with rejection region .

We now consider the alternative. We first consider the case where (5) holds. We focus on the main sub-case where, in addition, . Let I index the contaminated observations, meaning those sampled from N(0, σ²). In our mixture model, |I| is binomial with parameters (n, ε). Let be iid standard normal variables and set . We have

(47)
(48)
(49)
(50)

Since we have assumed that in (5), we have , and therefore

(51)

This in turn implies that

(52)

when , which is the case when .

Assume instead that . Fix a level and consider the extremes test at that level. Based on the same calculations, this test has rejection region , where and are defined by and , respectively. Note that

(53)

For the minimum, we have

(54)

Let be iid standard normal variables. Clearly,

(55)

and, as was derived above,

(56)
(57)

with

(58)

since . Thus, . And since under the alternative the maximum is stochastically bounded from above by its distribution under the null (since σ < 1), we also have . Hence, the extremes test (at an arbitrary level) has asymptotic power equal to its level, meaning it is asymptotically powerless. (It is no better than random guessing.)

Next, we consider the case where σ is fixed. Following similar arguments, now with , we have

(59)
(60)
(61)
(62)

We have

(63)

so that

(64)

when , which is the case when .

Using a similar line of argument, it can also be shown that the test is asymptotically powerless in the complementary case where σ is fixed. ∎
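Under the null, min_i |X_i| and max_i |X_i| have simple exact distributions, since P(|X| ≥ t) = erfc(t/√2) for X ∼ N(0,1). A Bonferroni-combined p-value for the extremes test can thus be sketched as follows; the packaging is our own, based on the test described in Section 1.

```python
import math
import numpy as np

def extremes_test_pvalue(x):
    """Bonferroni combination of the small-minimum and large-maximum tests,
    using the exact null distributions of min_i |X_i| and max_i |X_i| for
    iid N(0,1) data, where P(|X| >= t) = erfc(t / sqrt(2))."""
    n = len(x)
    a = np.abs(x)
    lo, hi = float(a.min()), float(a.max())
    tail = lambda t: math.erfc(t / math.sqrt(2.0))  # P(|X| >= t)
    p_min = 1.0 - tail(lo) ** n            # P(min_i |X_i| <= lo) under the null
    p_max = 1.0 - (1.0 - tail(hi)) ** n    # P(max_i |X_i| >= hi) under the null
    return min(1.0, 2.0 * min(p_min, p_max))
```

Under the null the returned value is a valid (conservative, by Bonferroni) p-value; under a variance alternative with σ far from 1, one of the two extremes drives it toward 0.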

3.3 The higher criticism test

The higher criticism, which looks at the entire sample via excursions of its empirical process, happens to achieve the detection boundary in all regimes, and is thus (first-order) comparable to the likelihood ratio test while being adaptive to the model parameters.

Proposition 3.

For the testing problem (2) with parameterization (4), the higher criticism test is asymptotically powerful when either (5) holds with , or (19) holds with , or is fixed, or and is fixed.

Proof.

Let HC_n denote the higher criticism statistic (3). Jaeschke [6] derived the asymptotic distribution of HC_n under the null, and this weak convergence result in particular implies that

(65)

For simplicity, because it is enough for our purposes, we consider the test with rejection region . Note that the test is asymptotically powerful if, under the alternative, there is such that

(66)

with probability tending to 1. To establish this, we will apply Chebyshev’s inequality. Indeed, is binomial with parameters and , so that

(67)

with probability tending to 1. When this is the case, we have

(68)

where

(69)

and only need to prove that

(70)

First, assume that (5) holds with . We focus on the interesting sub-case where . Fix such that and and set . Then, using the fact that , we have

(71)

so that

(72)

and therefore (70) is fulfilled, eventually.

Next, we assume that (19) holds with . Here we set , and get , and

(73)

so that

(74)

and therefore (70) is fulfilled, eventually.

The same arguments apply to the case where and is fixed. (It essentially corresponds to the previous case with .)

The remaining case is where is fixed, with (for otherwise it is included in the previous case). We choose , with , and get

(75)

and

(76)

so that

(77)

and therefore (70) is fulfilled, eventually. ∎

4 Some numerical experiments

We performed some numerical experiments to investigate the finite-sample performance of the tests considered here: the likelihood ratio test, the chi-squared test, the extremes test, and the higher criticism test. The sample size was set to a large value in order to capture the large-sample behavior of these tests. We tried four scenarios with different combinations of the model parameters (β, σ). The p-values for each test are calibrated as follows:

  1. For the likelihood ratio test and the higher criticism test, we simulated the null distribution based on Monte Carlo replicates.

  2. For the extremes test and the chi-squared test, we used the exact null distribution, which in each case is available in closed form.

For each parameter combination, we repeated the whole process 200 times and recorded the fraction of p-values smaller than 0.05, representing the empirical power at the 0.05 level. The results of this experiment are reported in Figure 1 and are largely congruent with the theory developed earlier in the paper.
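The Monte Carlo calibration in item 1 can be sketched generically: for a statistic without a closed-form null law, simulate its null distribution and compare. The replicate count and the chi-squared example statistic below are illustrative choices of ours, not the paper’s settings.

```python
import numpy as np

def monte_carlo_pvalue(stat, x, n_null=1000, seed=0):
    """Calibrate a test statistic by simulating its null distribution: the
    p-value is the fraction of N(0,1) null replicates whose statistic is at
    least as extreme as the observed one, with the usual +1 correction."""
    rng = np.random.default_rng(seed)
    observed = stat(x)
    null_stats = np.array([stat(rng.standard_normal(len(x)))
                           for _ in range(n_null)])
    return (1 + int(np.sum(null_stats >= observed))) / (1 + n_null)

# Example: calibrating the chi-squared statistic on a null and an alternative sample.
rng = np.random.default_rng(1)
p_null = monte_carlo_pvalue(lambda v: float(np.sum(v ** 2)), rng.standard_normal(500))
p_alt = monte_carlo_pvalue(lambda v: float(np.sum(v ** 2)), 3.0 * rng.standard_normal(500))
print(p_null, p_alt)
```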

Figure 1: Empirical power comparison with 95% error bars. A. Sparse regime where and . B. Dense regime where and fixed. Note that the LR test is here asymptotically powerful at any . C. Dense regime where and . D. Sparse regime where and . Each time, the horizontal line marks the level (set at 0.05) and the vertical line marks the asymptotic detection boundary derived earlier in the paper.

References

  • Anderson and Darling [1952] T. W. Anderson and D. A. Darling. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics, 23(2):193–212, 1952.
  • Cai and Wu [2014] T. T. Cai and Y. Wu. Optimal detection of sparse mixtures against a given null distribution. IEEE Transactions on Information Theory, 60(4):2217–2232, 2014.
  • Cai et al. [2011] T. T. Cai, X. J. Jeng, and J. Jin. Optimal detection of heterogeneous and heteroscedastic mixtures. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(5):629–662, 2011.
  • Donoho and Jin [2004] D. Donoho and J. Jin. Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics, 32(3):962–994, 2004.
  • Ingster [1997] Y. I. Ingster. Some problems of hypothesis testing leading to infinitely divisible distributions. Mathematical Methods of Statistics, 6(1):47–69, 1997.
  • Jaeschke [1979] D. Jaeschke. The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. The Annals of Statistics, 7(1):108–115, 1979.