The detection of rare effects has been an important problem for years in settings, and may be particularly relevant today, for example, with the search for personalized care in the health industry, where a small fraction of a population may respond particularly well, or particularly poorly, to some given treatment .
Following a theoretical investigation initiated in large part by Ingster  and broadened by Donoho and Jin , we are interested in studying two-component mixture models, also known as contamination models, in various asymptotic regimes defined by how the small mixture weight converges to zero. Most of the existing work in the setting of univariate data has focused on models where the contamination manifests itself as a shift in mean [10, 9, 12, 6, 17] with a few exceptions where the effect is a change in variance , or a change in both mean and variance .
In the present paper, we are interested in bivariate data, instead, and more specifically in a situation where the effect felt in the dependence between the two variables being measured. This setting has been recently considered in the literature in the context of assessing the reproducibility of studies. For example, Li et al. 
aimed to identify significant features from separate studies using an expectation-maximization (EM) algorithm. They applied a copula mixture model and assumed that changes in the mean and covariance matrix differentiate the contaminated component from the null component.Zhao et al.  studied another model where variables from the contamination are stochastically larger marginally. In both models, the marginal distributions have some non-null effects. Similar settings have been considered within a multiple testing framework [5, 20].
While existing work has focused on models motivated by questions of reproducibility, in the present work we come back to basics and directly address the problem of detecting a bivariate mixture with a component where the variables are independent and a component where the variables are positively dependent.
1.1 Gaussian Mixture Model
where is the contamination proportion and is the correlation between the two variables under contamination. We consider the following hypothesis testing problem: based on drawn iid from (1), decide
Note that under the null hypothesis,is from the bivariate standard normal. Under the alternative, and remain standard normal marginally. Following the literature on the detection of sparse mixtures [14, 8], we are most interested in a situation, asymptotic as , where , and the central question is how large needs to be in order to reliability distinguish these hypotheses.
The formulation (1) suggests that the alternative hypothesis is composite, but if we assume that are known under the alternative, then the likelihood ratio test (LRT) is optimal by Neyman-Pearson lemma. We start with characterizing the behavior of the LRT, which provides a benchmark. We then study some other testing procedures that do not require knowledge of the model parameters:111 Such procedures are said to be adaptive.
The covariance test rejects for large values of , and coincides with Rao’s score test in the present context. This is the classical test for independence, specifically designed for the case where and under the alternative. We shall see that it is suboptimal in some regimes.
The extremes test rejects for small values of . This test exploits the fact that, because is assumed positive, the variables in the contaminated component are closer to each other than in the null component.
The higher criticism test was suggested by John Tukey and deployed by Donoho and Jin  for the testing of sparse mixtures. We propose a version of that test based on the pairwise differences, . In detail, the test rejects for large values of
where , with
denotes the standard normal distribution function, and, the empirical distribution function of .
The setting where is often called the dense regime and the setting where is often called the sparse regime. Our analysis reveals the following:
Dense regime. The dense regime is most interesting when . In that case, we find that the covariance test and the higher criticism test match the asymptotic performance of the likelihood ratio test to first-order, while the extremes test has no power.
Sparse regime. The sparse regime is most interesting when . In that case, we find that the higher criticism test still performs as well as the likelihood ratio test to first order, while the covariance test is powerless, and the extremes test is suboptimal.
1.2 Gaussian Mixture Copula Model
From a practical point of view, the assumption that both and are normally distributed is quite stringent. Hence, we would like to know if there are nonparametric procedures that do not require such a condition but can still achieve the same performance as the likelihood ratio test. In the univariate setting where the effect arises as a shift in mean, this was investigated in . In the bivariate setting, in a model for reproducibility, Zhao et al.  proposed a nonparametric test based on a weighted version of Hoeffding’s test for independence.
, meaning that there is a latent random vectorsuch that
where and are unknown distribution functions on the real line, and is the standard normal distribution function, while is the contamination proportion and is the correlation between and in the contaminated component, as before in model (1). Li et al.  also used a copula mixture model, but they placed emphasis on the mean while we focus on the dependence.
We still consider the testing problem (2), but now in the context of Model (5). The setting is nonparametric in that both and are unknown. Model (5) is crafted in such a way that the marginal distributions of and contain absolutely no information that is pertinent to the testing problem under consideration. Figure 1 provides an illustration.
The model is also attractive because of an invariance under all increasing marginal transformations of the variables. This is the same invariance that leads to considering rank based methods such as the Spearman correlation test [15, Chp 6]. In fact, we analyze the Spearman correlation test, which is the nonparametric analog to the covariance test, showing that it is first-order asymptotically optimal in the dense regime. We also propose and analyze a nonparametric version of the higher criticism based on ranks which we show is first-order asymptotically optimal in the moderately sparse regime where . In the very sparse regime, where , we do not know of any rank-based test that has any power.
2 Gaussian Mixture Model
In this section, we focus on the Gaussian mixture model (1). We start by deriving a lower bound on the performance of the likelihood ratio test, which provides a benchmark for the other (adaptive) tests, which we subsequently analyze.
We distinguish between the dense and sparse regimes:
We say that a testing procedure is asymptotically powerful (resp. powerless) if the sum of its probabilities of Type I and Type II errors (its risk) has limit 0 (resp. limit inferior at least 1) in the large sample asymptote.
2.1 The likelihood ratio test
This only provides a lower bound on what can be achieved, but it will turn out that to be sharp once we establish the performance of the higher criticism test in Proposition 2 below.
Defining and , the model (1) is equivalently expressed in terms of , which has distribution
Note that and are independent only conditional on knowing what distribution they were sampled from. In terms of the ’s, the likelihood ratio is
where is the likelihood ratio for observation , which in the present case takes the following expression
The risk of the likelihood ratio test is equal to
We show that under each of the stated conditions. We consider each regime in turn.
It turns out that it suffices to bound the second moment. Indeed, using the Cauchy-Schwarz inequality, we have
reducing the task to showing that . We have
For the third term, we have
Hence, we have
so that when
since is assumed to be bounded away from 1. Under the specified parameterization, this happens exactly when .
Sparse regime. It turns out that simply bounding the second moment, as we did above, does not suffice. Instead, we truncate the likelihood and study the behavior of its first two moments. Define the indicator variable and the corresponding truncated likelihood ratio
Using the triangle inequality, the fact that , and the Cauchy-Schwarz inequality, we have the following upper bound:
so that when and .
For the first moment, we have
where, using the independence of and , and taking the expectation with respect to first,
where, for ,
and we used the fact that when . Since with in the sparse regime, for sufficiently close to 1, , in which case . This yields
For the second moment, we have
The sum of first two terms is bounded from above by . For the third term, we have
using the fact that . Hence,
when is sufficiently close to 1. This in turn yields the following bound
so that when
Under the specified parameterization, this happens exactly when . ∎
2.2 The covariance test
Recall that the covariance test rejects for large values of , calibrated under the null where are iid standard normal.
For the testing problem (2), the covariance test achieves the detection boundary in the dense regime, while it is asymptotically powerless in the sparse regime.
We divide the proof into the two regimes.
Dense regime. Under , we have
so that, by Chebyshev’s inequality,
for any sequence diverging to infinity.
Under , we have
so that, by Chebyshev’s inequality,
Thus the test with rejection region is asymptotically powerful when
If we choose , for example, and is parameterized as in (6), this happens for large enough when .
Sparse regime. To prove that the covariance test is asymptotically powerless when , we show that, under , converges to the same limiting distribution as under .
, by the central limit theorem,
Under the distribution of the ’s (which remain iid) depends on , but the condition for applying Lyapunov’s central limit theorem are satisfied since
where and the inequality is Cauchy-Schwarz’s, while
so that the test statistic still converges weakly to a normal distribution,
In the present regime, we have
so that and , and thus we conclude by Slutsky’s theorem that . ∎
There are good reasons to consider the covariance test in this specific form since the means and variances are known. It is worth pointing out that the Pearson correlation test, which is more standard in practice since it does not require knowledge of the means or variances, has the same asymptotic power properties.
2.3 The higher criticism test and the extremes test
Define , and note that
Seen through the ’s, the problem becomes that of detecting a sparse contamination where the effect is in the variance. We recently studied this problem in detail , extending previous work by Cai et al. , who considered a setting where the effect is both in the mean and variance. Borrowing from our prior work, we consider a higher criticism test, already defined in (3), and an extremes test, which rejects for small values of .
For the testing problem (2), the higher criticism test achieves the detection boundary in the dense and sparse regimes.
with fixed such that ;
with fixed such that .
This can be directly translated into the present setting, yielding the stated result. ∎
This is also a direct corollary from our prior work our prior work [2, Prop 2]. ∎
Thus the extremes test is grossly suboptimal in the dense regime, while it is suboptimal in the the sparse regime due to the fact that .
The higher criticism and extremes tests are both based on the ’s. This was convenient as it reduced the problem of testing for independence to the problem of testing for a change in variance (both in a contamination model). However, reducing the original data, meaning the ’s, to the ’s implies a loss of information. Indeed, a lossless reduction would be from the ’s to the ’s, where
, with joint distribution given in (8). It just turns out that ignoring the ’s does not lead to any loss in first-order asymptotic power.
2.4 Numerical experiments
We performed some numerical experiments to investigate the finite sample performance of the tests considered here: the likelihood ratio test, the Pearson correlation test (instead of the covariance test from a practical point of view), the extremes test, the higher criticism test, and also a plug-in version of the higher criticism test where the parameters of the bivariate normal distribution (the two means and two variances) are estimated under the null. The sample sizeis set large to in order to capture the large-sample behavior of these tests. We tried four sparsity levels, setting . The p-values for each test are computed as follows:
For the likelihood ratio test, the p-values are estimated based on permutations.
For the higher criticism test and the plug-in higher criticism test, the p-values are estimated based on 200 permutations.
For the extremes test, we used the exact null distribution, which is available in a closed form.
For the Pearson correlation test, the p-values are from the limiting distribution.
For each scenario, we repeated the process 200 times and calculated the fraction of p-values smaller than 0.05, representing the empirical power at the 0.05 level.
The results of this experiment are reported in Figure 2 and are broadly consistent with the theory developed earlier in this section. Though we show that the higher criticism test is first-order comparable to the likelihood ratio test in the dense regime, even with a large sample, its power is much lower. The Pearson correlation test does better in that regime. The plug-in higher criticism test has a similar performance as the higher criticism test in the dense regime, while it loses some power in the moderately sparse regime, and is powerless in the very sparse regime.
3 Gaussian Mixture Copula Model
In this section we turn to the Gaussian mixture copula model introduced in (5). The setting is thus nonparametric, since the marginal distributions are completely unknown, and standard invariance considerations [15, Ch 6] lead us to consider test procedures that are based on the ranks. For this, we let denote the rank of among , and similarly, we let denote the rank of among . (The ranks are in increasing order, say.)
Although not strictly necessary, we will assume that and in (5) are strictly increasing and continuous. In that case, the ranks are invariant with respect to transformations of the form with and strictly increasing on the real line. In particular, for the rank tests that follow, this allows us to reduce their analysis under (5) to their analysis under (1).
3.1 The covariance rank test
The covariance rank test is the analog of the covariance test of Section 2.2. It rejects for large values of (redefined). As is well-known, this is equivalent to rejecting for large values of the Spearman rank correlation.
We anticipate that the covariance rank test is asymptotically powerless in the sparse regime, although we do not formally prove that.
We start by considering the null hypothesis . From [11, Eq 3.11-3.12, Ch 11], we have
so that, using Chebyshev’s inequality,
for any sequence diverging to infinity.
We now turn to the alternative hypothesis . For convenience, we assume that the ranks run from to . This does not change the test procedure since , but makes the derivations somewhat less cumbersome. In particular, we have
For the expectation, we have
The expectation is with respect to independent, with drawn from the mixture (1), and and standard normal. Let and , so that . We note that is bivariate normal with standard marginals. Moreover, when comes from the main component, and are uncorrelated, and therefore independent; while when comes from the contaminated component, and have correlation . Therefore,
where under . We immediately have , and in general,222 This identity is well-known, and not hard to prove (https://math.stackexchange.com/questions/255368/getting-px0-y0-for-a-bivariate-distribution). It also appears, for example, in [19, Lem 1].
We conclude that, as and ,
For the variance, we start with the second moment
which then implies that
the same bound we had for . Thus, by Chebyshev’s inequality, we have
for any sequence diverging to infinity.
We consider the test with rejection region . Our analysis implies that this test is asymptotically powerful when
If we choose , for example, and is parameterized as in (6), this happens for large enough when . ∎
3.2 The higher criticism rank test
The analog of the higher criticism test of (3) is a higher criticism based on the the pairwise differences in ranks, . To be specific, we define
where is the probability , which can be expressed in closed form as
Note that in this definition the denominator is only an approximation to the standard deviation of the numerator. The standard deviation has a closed-form expression, known since the work ofHoeffding [13, Th 2], but it is cumbersome and relatively costly to compute (although its computation is only done once for each ). Also, there is a fair amount of flexibility in the choice of range of thresholds considered. This particular choice seems to work well enough. As any other rank test, it is calibrated by permutation (or Monte Carlo if there are no ties in the data).
We start with the situation under the null hypothesis , where we show that is of order at most based on the concentration inequality for randomly permuted sums. Fixing critical value , define
Since is independent of , as we are under the null, we have that has the same distribution as when
is a uniformly distributed random permutation of. Note that
By [1, Cor 2.1], there is a universal constant such that, for any ,
using the fact that for all . This implies that, for ,
for some other constant , using the fact that when , which is the range of ’s we are considering. Hence, choosing and using the union bound, we have
We now consider the alternative , and show that in probability under the stated condition. Let
Since , it suffices to find some in that range such that in probability. For that, define the empirical distributions and . Note that, by definition, and