1 Introduction
Testing whether a subset of covariates has any relationship with a quantitative response is one of the central problems in statistical analysis. Most existing literature focuses on linear relationships. The analysis of variance (ANOVA), introduced by Fisher in the 1920s, has been a main tool for the statistical analysis of experiments and is routinely used in countless applications. Under the simple normal mean model,
where the errors are i.i.d. standard Gaussian, one-way ANOVA tests the null hypothesis H₀: all means are zero against the alternative hypothesis H₁: at least one mean is nonzero. Although the test cannot indicate which means are nonzero, ANOVA is powerful in testing the global null against such alternatives. Arias-Castro et al. (2011b) extended these results to the linear model
y = Xβ + z,   (1)
where z is standard Gaussian noise, to test whether all the entries of β are zero. This can be formulated as the following null and alternative hypotheses:
(2) 
where the alternative ranges over the set of sparse vectors in ℝᵖ with at most s nonzero entries. Arias-Castro et al. (2011b) and Ingster et al. (2010) characterized precisely when one can detect the signal. The upper bound is attained by an asymptotically most powerful test based on higher criticism (Donoho and Jin, 2004).
The linearity assumption, or any other fixed functional-form assumption, is often too restrictive in practice. Theoretical and methodological developments beyond parametric models are important, urgent, and extremely challenging. As a first step towards nonparametric global testing, we here study the single index model
y = f(βᵀx, ε), where f is an unknown function. Our goal is to test the global null hypothesis that all the entries of β are zero. The first challenge is to find an appropriate formulation of the alternative hypothesis, because the norm of β used in (2) is not even identifiable in single index models. When β is nonzero in a single index model, the unique nonzero eigenvalue λ of var(E[x|y]) can be viewed as the generalized signal-to-noise ratio (gSNR) (Lin et al., 2018b). In Section 2, we show that for the linear regression model this eigenvalue is almost proportional to the SNR when the SNR is small. The alternative hypotheses in (2) can accordingly be rewritten in terms of λ. Because of this connection, we can treat λ as the separation quantity for the single index model and consider the following contrasting hypotheses:
We show that, under certain regularity conditions and for the single index model with additive noise, there is a sharp threshold on the gSNR above which a nonzero gSNR can be detected and below which it cannot.
This is a strong and surprising result because this detection boundary is the same as that for the linear model. Using ideas from sliced inverse regression (SIR) (Li, 1991), we show that this boundary can be achieved by the proposed Spectral test Statistic based on SIR (SSS) and the SSS assisted by an ANOVA test (SSSa). Although SIR has been advocated as an effective alternative to linear multivariate analysis (Chen and Li, 1998), the existing literature did not provide satisfactory theoretical foundations until recently (Lin et al., 2017, 2018a, b). We believe that the results in this paper provide further supporting evidence for the speculation that "SIR can be used to take the same role as linear regression in model building, residual analysis, regression diagnoses, etc." (Chen and Li, 1998). In Section 2, after briefly reviewing SIR and related results in linear regression, we state the optimal detection problem and a lower bound for single index models. In Section 3, we first show that the correlation-based Higher Criticism (CorHC) developed for linear models fails for single index models, and then propose a test that achieves the lower bound stated in Section 2. Some numerical studies are presented in Section 4. We list several interesting implications and future directions in Section 5. Additional proofs and lemmas are included in the Appendices.
2 Generalized SNR for Single Index Models
2.1 Notation
The following notation is adopted throughout the paper. For a matrix M, we call the space generated by its column vectors the column space and denote it by col(M). The i-th row and j-th column of M are denoted by M_{i,·} and M_{·,j}, respectively. For vectors u and v, we denote their inner product by ⟨u, v⟩, and the k-th entry of v by v_k. For two positive numbers a and b, we use a ∨ b and a ∧ b to denote max(a, b) and min(a, b), respectively. Throughout the paper, we use C, C′, and c to denote generic absolute constants, though their actual values may vary from case to case. For two sequences a_n and b_n, we write a_n ≳ b_n (resp. a_n ≲ b_n) if there exists a positive constant C (resp. c) such that a_n ≥ C b_n (resp. a_n ≤ c b_n). We write a_n ≍ b_n if both a_n ≳ b_n and a_n ≲ b_n hold, and a_n ≫ b_n (resp. a_n ≪ b_n) if b_n/a_n → 0 (resp. a_n/b_n → 0). The operator norm and the Frobenius norm of a matrix A are denoted by ‖A‖₂ and ‖A‖_F, respectively. For a finite set S, we denote its cardinality by |S|. We also write A_{S,S} for the submatrix of A with entries a_{ij}, i, j ∈ S, and v_S for the corresponding subvector of v. For any square matrix A, we define λ_min(A) and λ_max(A) as its smallest and largest eigenvalues, respectively.
2.2 A brief review of the sliced inverse regression (SIR)
SIR was first proposed by Li (1991) to estimate the central space, based on n i.i.d. observations (yᵢ, xᵢ), i = 1, …, n, from the multiple index model, under the assumption that x follows an elliptical distribution; in this paper x is Gaussian. SIR starts by dividing the data into H equal-sized slices according to the order statistics of y. To ease notation and arguments, we assume that n = cH for an integer c, and re-express the data as y_{h,j} and x_{h,j}, where h refers to the slice number and j refers to the order number within the slice. Here x_{h,j} is the concomitant of y_{h,j}. Let the sample mean of x in the h-th slice be denoted by x̄_{h,·}; then Λ = var(E[x|y]) can be estimated by:
(3)
where X̄ denotes the matrix formed by the slice sample means, i.e., X̄ = (x̄_{1,·}, …, x̄_{H,·})ᵀ. Thus, col(Λ) is estimated by the span of the eigenvectors associated with the largest eigenvalues of the SIR matrix. This is a consistent estimator of col(Λ) under certain technical conditions (Duan and Li, 1991; Hsing and Carroll, 1992; Zhu et al., 2006; Li, 1991; Lin et al., 2017). It is shown in Lin et al. (2017, 2018a) that, for single index models, H can be chosen as a fixed number not depending on n and p for the asymptotic results to hold. Throughout this paper, we assume the following mild conditions.
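A minimal numpy sketch of the slicing estimator (3) may help fix ideas; the slice number H = 10 and the synthetic monotone link below are illustrative choices, not from the paper.

```python
import numpy as np

def sir_matrix(X, y, H=10):
    """SIR estimate of Lambda = var(E[x|y]): sort the sample by y,
    split it into H equal-sized slices, and take the covariance of
    the H slice-wise sample means of x."""
    n, p = X.shape
    c = n // H                          # observations per slice
    order = np.argsort(y)
    Xs = X[order][: H * c]              # drop the remainder so slices are equal-sized
    means = Xs.reshape(H, c, p).mean(axis=1)
    means = means - means.mean(axis=0)  # center the slice means
    return means.T @ means / H

# sanity check on a single index model with a monotone link
rng = np.random.default_rng(0)
n, p = 2000, 5
beta = np.zeros(p); beta[0] = 1.0
X = rng.standard_normal((n, p))
y = np.tanh(X @ beta) + 0.1 * rng.standard_normal(n)
Lam_hat = sir_matrix(X, y)
vals, vecs = np.linalg.eigh(Lam_hat)
top = vecs[:, -1]                       # should align with beta up to sign
```

The leading eigenvector of the estimated matrix recovers the index direction β, and the leading eigenvalue estimates the gSNR defined in Section 2.3.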

x follows a multivariate normal distribution N(0, Σ), and there exist two positive constants C_min and C_max such that C_min ≤ λ_min(Σ) ≤ λ_max(Σ) ≤ C_max.

The matrix Λ = var(E[x|y]) is nonvanishing, i.e., λ_max(Λ) > 0.
2.3 Generalized SignaltoNoise Ratio of Single Index Models
We consider the following single index model:
y = f(βᵀx, ε),   (4)
where f is an unknown function. What we want to know is whether the coefficient vector β, viewed as a whole, is zero. This can be formulated as the global testing problem H₀: β = 0 versus H₁: β ≠ 0.
When assuming the linear model y = βᵀx + ε, whether we can separate the null and the alternative depends on the interplay between the sample size and the norm of β. More precisely, it depends on the signal-to-noise ratio (SNR) defined as
SNR = βᵀΣβ / σ²,
where σ² = var(ε) (Janson et al., 2017). The SNR is useful for benchmarking the prediction accuracy of various model selection techniques such as AIC, BIC, or the Lasso. However, since there is an unknown link function in the single index model, the norm of β becomes non-identifiable. Without loss of generality, we normalize β and have to find another quantity to describe the separability.
For the single index model (4), to simplify the notation, let us use λ to denote the unique nonzero eigenvalue of Λ = var(E[x|y]). For linear models, we can easily show that
λ = βᵀΣ²β / (βᵀΣβ + σ²).
Consequently, the ratio λ/SNR equals σ²βᵀΣ²β / (βᵀΣβ (βᵀΣβ + σ²)). When condition A2) holds and the SNR is small, this ratio is bounded between two finite positive limits. Thus, λ can be treated as an equivalent quantity to the SNR for linear models, and is therefore named the generalized signal-to-noise ratio (gSNR) for single index models.
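For the Gaussian linear model, the gSNR can be computed in closed form; the short derivation below is a sketch assuming y = βᵀx + ε with x ~ N(0, Σ) and independent ε ~ N(0, σ²).

```latex
\begin{align*}
\operatorname{Cov}(x, y) &= \Sigma\beta, \qquad
\operatorname{Var}(y) = \beta^{\top}\Sigma\beta + \sigma^{2},\\
\mathbb{E}[x \mid y] &= \frac{\Sigma\beta}{\beta^{\top}\Sigma\beta + \sigma^{2}}\, y
  \quad\text{(joint normality)},\\
\Lambda = \operatorname{var}\!\big(\mathbb{E}[x \mid y]\big)
  &= \frac{\Sigma\beta\beta^{\top}\Sigma}{\beta^{\top}\Sigma\beta + \sigma^{2}},\\
\lambda = \lambda_{\max}(\Lambda)
  &= \frac{\beta^{\top}\Sigma^{2}\beta}{\beta^{\top}\Sigma\beta + \sigma^{2}}.
\end{align*}
```

In particular, when Σ = I_p we get λ = ‖β‖²/(‖β‖² + σ²) = SNR/(1 + SNR), so λ ≈ SNR whenever the SNR is small, which is exactly the regime relevant for detection.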
Remark 1.
To the best of our knowledge, although SIR uses the estimation of Λ to determine the structural dimension (Li, 1991), few investigations have been made into the theoretical properties of this procedure in high dimensions. The only work that uses λ as a parameter to quantify the estimation error when estimating the direction of β is Lin et al. (2018a), which, however, does not indicate explicitly what role λ plays. The aforementioned observation about λ provides a useful intuition: λ is a generalized notion of the SNR, and the nonvanishing condition above merely requires that the gSNR is nonzero.
2.4 Global testing for single index models
As we have discussed, Arias-Castro et al. (2011b) and Ingster et al. (2010) considered the testing problem (2), which can be viewed as the determination of the detection boundary of the gSNR. Throughout the paper, we consider the following testing problem:
(5) 
based on n i.i.d. samples (yᵢ, xᵢ). Two models are considered: (i) the general single index model (SIM) defined in (4); and (ii) the single index model with additive noise (SIMa) defined as
y = f(βᵀx) + ε.   (6)
We assume that the conditions above hold for both models.
3 The Optimal Test for Single Index Models
3.1 The detection boundary of linear regression
To set the goal and scope, we briefly review some related results on the detection boundary for linear models (Arias-Castro et al., 2011b; Ingster et al., 2010).
Proposition 1.
Assume that β has at most s nonzero entries, together with standard regularity conditions on n, p, and s. There is a test with both type I and type II errors converging to zero for the testing problem in (2) if and only if
(7) 
Assuming in addition that the variance of the noise is known, Ingster et al. (2010) obtained the sharp detection boundary (i.e., with the exact asymptotic constant) for the above problem. Since linear models are special cases of SIMa, which in turn is a special subset of SIM, the following statement about the lower bound of detectability is a direct corollary of Proposition 1.
Corollary 1.
Any test fails to separate the null and the alternative hypothesis asymptotically for SIM when
(8) 
Any test fails to separate the null and the alternative hypothesis asymptotically for SIMa when
(9) 
3.2 Single Index Models
Moving from linear models to single index models is a big step. A natural and reasonable start is to consider tests based on the marginal correlations used for linear models (Ingster et al., 2010; Arias-Castro et al., 2011b). However, the following example shows that marginal correlations fail for single index models, indicating that we need to look for other statistics to approximate the gSNR.
Example 1.
Suppose that , , and we have samples from the following model:
(10) 
Simple calculation shows that the covariance between y and each coordinate of x is zero. Thus, correlation-based methods do not work for this simple model. On the other hand, since the link function is monotone when its argument is sufficiently large, we know that E[x|y] is not a constant and λ > 0.
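A concrete (hypothetical, not the paper's model (10)) instance of this phenomenon is the Hermite-type link f(t) = t³ − 3t: for t ~ N(0, 1), E[t f(t)] = E[t⁴] − 3E[t²] = 0, so every marginal correlation with y vanishes, yet f is monotone for |t| > 1 and E[x|y] is non-constant. The simulation below checks both claims.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50000, 3
beta = np.zeros(p); beta[0] = 1.0
X = rng.standard_normal((n, p))
t = X @ beta
# Hermite-type link t^3 - 3t: E[t * (t^3 - 3t)] = E[t^4] - 3 E[t^2] = 0,
# so y is uncorrelated with every coordinate of x
y = t**3 - 3.0 * t + 0.1 * rng.standard_normal(n)

marginal_corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])

# ... yet the SIR slice means still recover the direction beta
H = 20
c = n // H
means = X[np.argsort(y)][: H * c].reshape(H, c, p).mean(axis=1)
means = means - means.mean(axis=0)
Lam_hat = means.T @ means / H
vals, vecs = np.linalg.eigh(Lam_hat)
```

All marginal correlations are within sampling noise of zero, while the top eigenvector of the SIR matrix aligns with β, so a spectral statistic sees the signal that correlation screening misses.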
Let and be two sequences such that
For a symmetric matrix and a positive constant such that , we define
(11) 
For model , in addition to the condition that , we further assume that .
Let be the estimate of based on SIR. Let , and be three quantities satisfying
(12) 
We introduce the following two assisting tests.

Define

Define
Finally, the Spectral test Statistic based on SIR, abbreviated as SSS, is defined as
(13) 
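The s-sparse largest eigenvalue appearing in the definitions above can be computed exactly by brute force when p is small, which is helpful for building intuition and for checking faster approximations. Below is a minimal sketch, assuming the standard definition: the maximum of vᵀBv over unit vectors v with at most s nonzero entries, equivalently the largest eigenvalue over all s × s principal submatrices.

```python
import numpy as np
from itertools import combinations

def sparse_top_eigenvalue(B, s):
    """Brute-force s-sparse largest eigenvalue of a symmetric matrix B.
    By eigenvalue interlacing it suffices to scan supports of size
    exactly s.  Exponential in p -- for illustration only."""
    p = B.shape[0]
    best = -np.inf
    for S in combinations(range(p), s):
        idx = np.ix_(S, S)
        best = max(best, np.linalg.eigvalsh(B[idx])[-1])
    return best

B = np.array([[1.0, 5.0, 0.0, 0.0],
              [5.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0],
              [0.0, 0.0, 0.0, 4.0]])
```

With s = 1 the answer is the largest diagonal entry (4.0 here), while with s = 2 the off-diagonal block {0, 1} dominates, illustrating why the sparse eigenvalue can exceed every diagonal entry.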
To show the theoretical properties of SSS, we impose the following condition on the covariance matrix .

There are at most nonzero entries in each row of .
This assumption was first explicitly proposed in Lin et al. (2017) and is partially motivated by the Separable After Screening (SAS) properties in Ji and Jin (2012). In this paper, we assume this relatively strong condition and focus on establishing the detection boundary. The condition could likely be relaxed by considering a larger class of covariance matrices, such as the class used in Arias-Castro et al. (2011a) for analyzing linear models, which contains our condition as a special case. However, the technical details would be much more involved and would mask the importance of the main results. We thus leave this for a future investigation.
Theorem 1.
Assume that conditions A1)–A3) hold, and that the two sequences satisfy the conditions in (12). Then, the type I and type II errors of the test converge to zero for the testing problem under SIM.

Compared with the test proposed in Ingster et al. (2010), our test statistic is a spectral statistic that depends on the largest eigenvalue of the SIR matrix. It is adaptive in the moderate-sparsity scenario. In the high-sparsity scenario, SSS relies on the sparse spectral component, which depends on the sparsity of the vector β; therefore, SSS is not adaptive to the sparsity level. Both Arias-Castro et al. (2011a) and Ingster et al. (2010) introduced an (adaptive) asymptotically powerful test based on higher criticism (HC) for the testing problem under linear models. It is an interesting research problem to develop an adaptive test using the idea of higher criticism for (5).

3.3 Optimal Test for SIMa
When the noise is additive as in SIMa (6), the detection boundary can be further improved. In addition to conditions A1)–A3), the model is further assumed to satisfy the following condition:

is sub-Gaussian, and for some constant , where .
Note that for any fixed function such that , there exists a positive constant such that
(14) 
By continuity, we know that (14) holds in a small neighbourhood of , i.e., if is sufficiently small, condition holds for a large class of functions.
First, we adopt the tests described in the previous subsection. Since the noise is additive, we also include an ANOVA test,
where the threshold is a sequence satisfying condition (12). Combining this test with the tests above, we can introduce the SSS assisted by the ANOVA test (SSSa) as
(15) 
We then have the following result.
Theorem 2.
Assume that and the conditions and hold. Assume that the sequences , and satisfy condition (12), then type I and type II errors of the test converge to zero for the testing problem under SIMa, i.e., we have
Example 1 (continued). For the example in (10), we calculated the test statistic defined by (13) under both the null and the alternative hypotheses. Figure 1 shows the histograms of this statistic under both hypotheses, demonstrating a perfect separation between the null and the alternative. For this example, the spectral statistic has more discriminating power than correlation-based statistics.
3.4 Computationally efficient test
Although the test SSS (and SSSa) is rate-optimal, it is computationally inefficient. Here we propose an efficient algorithm to approximate it via a convex relaxation, similar to the convex relaxation method for estimating the top eigenvector of a semidefinite matrix in Adamczak et al. (2008). To be precise, given the SIR estimate of Λ, consider the following semidefinite programming (SDP) problem:
(16)  
subject to  
With , for a sequence satisfying the condition in (12), i.e., , a computationally feasible test is
Then, for any sequence satisfying the inequality in (12), we define the following computationally feasible alternative of :
(17) 
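A generic conic solver can handle the SDP above; as a self-contained, computationally cheap illustration of the same goal (approximating the s-sparse top eigenvalue of a PSD matrix without enumerating supports), the sketch below uses truncated power iteration instead. This is a heuristic stand-in for the SDP relaxation (16), not the paper's procedure; the diagonal-based initialization and the name `truncated_power_top_eig` are choices made for this sketch.

```python
import numpy as np

def truncated_power_top_eig(B, s, iters=100):
    """Heuristic s-sparse top eigenvalue of a symmetric PSD matrix B:
    power iteration in which, after each matrix-vector product, all but
    the s largest-magnitude coordinates are zeroed out."""
    p = B.shape[0]
    v = np.zeros(p)
    init = np.argsort(np.diag(B))[-s:]   # start from the s largest diagonal entries
    v[init] = 1.0 / np.sqrt(s)
    for _ in range(iters):
        w = B @ v
        keep = np.argsort(np.abs(w))[-s:]
        v = np.zeros(p)
        v[keep] = w[keep]
        norm = np.linalg.norm(v)
        if norm == 0.0:                   # degenerate direction; give up
            break
        v /= norm
    return float(v @ B @ v)

# spiked example: planted s-sparse direction plus a small PSD perturbation
rng = np.random.default_rng(0)
p, s = 30, 3
u = np.zeros(p); u[:s] = 1.0 / np.sqrt(s)
noise = 0.01 * rng.standard_normal((p, p))
B = 5.0 * np.outer(u, u) + noise @ noise.T
val = truncated_power_top_eig(B, s)
```

Each iteration costs O(p²) instead of the combinatorial cost of scanning all supports; like the SDP, it returns a certified-sparse direction, but unlike the SDP it can get stuck in local optima, which is the price of the speedup.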
Theorem 3.
Assume that , and conditions hold. Then, type I and type II errors of the test converge to zero for the testing problem under SIMa, i.e., we have
Similarly, if we introduce the test
(18) 
for three sequences , and , then we have
Theorem 4.
Assume that and conditions and hold. The test is asymptotically powerful for the testing problem under SIMa, i.e., we have
Theorem 2 and Theorem 4 not only establish the detection boundary of the gSNR for single index models, but also open the door to a thorough understanding of semiparametric regression with a Gaussian design. It is shown in Lin et al. (2018a) that for single index models satisfying conditions A1)–A3), one has
(19) 
This implies a necessary and sufficient condition for obtaining a consistent estimate of the projection operator. On the other hand, Theorems 2 and 4 state that, for single index models with additive noise, one can detect the existence of a nonzero gSNR (i.e., a nontrivial direction β) at a weaker signal strength. Our results thus imply for SIMa that there is a regime in which one can detect the existence of a nonzero β but cannot consistently estimate its direction. To estimate the locations of the nonzero coefficients, we must tolerate a certain error rate such as the false discovery rate (Benjamini and Hochberg, 1995). For example, the knockoff procedure (Barber and Candès, 2015), SLOPE (Su and Candès, 2016), and UPT (Ji and Zhao, 2014) might be extended to single index models.
3.5 Practical Issues
In practice, we do not know whether the noise is additive. Therefore, we only consider the test statistic SSS. Condition (12) provides a theoretical basis for choosing the threshold sequences. In practice, however, we determine these thresholds by simulating the null distributions of the component statistics. Our final algorithm is as follows.
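The calibration step can be sketched as follows: under the global null, y is independent of x, so the null distribution of the SIR spectral statistic can be simulated by drawing responses independent of the design, and the 95% quantile of the simulated statistics serves as the rejection threshold. This is a minimal sketch, assuming Gaussian synthetic responses and the plain (non-sparse) spectral statistic; the function names and the slice number H = 10 are choices made here, not from the paper.

```python
import numpy as np

def sir_top_eig(X, y, H=10):
    """Largest eigenvalue of the SIR estimate of var(E[x|y])."""
    n, p = X.shape
    c = n // H
    m = X[np.argsort(y)][: H * c].reshape(H, c, p).mean(axis=1)
    m = m - m.mean(axis=0)
    return np.linalg.eigvalsh(m.T @ m / H)[-1]

def calibrate_threshold(X, H=10, B=200, level=0.95, seed=0):
    """Simulate the null distribution of the statistic by drawing
    responses independent of X (under H0, y carries no signal) and
    return the level-quantile as the rejection threshold."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = [sir_top_eig(X, rng.standard_normal(n), H) for _ in range(B)]
    return np.quantile(stats, level)

rng = np.random.default_rng(1)
n, p = 500, 20
X = rng.standard_normal((n, p))
thr = calibrate_threshold(X)

t0 = sir_top_eig(X, rng.standard_normal(n))   # null case: usually below thr
y1 = np.tanh(2 * X[:, 0]) + 0.1 * rng.standard_normal(n)
t1 = sir_top_eig(X, y1)                        # strong signal: should exceed thr
```

By construction the test has (approximately) 5% type I error; the same scheme applies to the sparse component and the ANOVA component, each calibrated against its own simulated null.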
4 Numerical Studies
Let β be the vector of coefficients and let S be the active set, on which the nonzero coefficients were simulated. Let X be the random design matrix with each row following N(0, Σ). We consider two types of covariance matrices: (i) an essentially sparse covariance matrix, with a parameter ρ chosen among 0, 0.3, 0.5, and 0.8; and (ii) a dense covariance matrix, with ρ chosen as 0.2. In all the simulations, the dimension varies among 100, 500, 1,000, and 2,000, and the number of replications is 100. The random error follows a normal distribution. We consider the following models:

, where ;

, where ;

, where ;

, where .
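For readers who want to reproduce the setup, two standard covariance families matching the description above can be constructed as follows; the exact parameterizations (AR(1)-type decay for the "essentially sparse" type (i) and equicorrelation for the dense type (ii)) are assumptions of this sketch, not necessarily the paper's exact choices.

```python
import numpy as np

def cov_sparse(p, rho):
    """Hypothetical type (i): AR(1)-type covariance, Sigma_ij = rho^|i-j|.
    Off-diagonal entries decay geometrically, so the matrix is
    essentially sparse."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def cov_dense(p, rho=0.2):
    """Hypothetical type (ii): equicorrelated covariance, unit diagonal
    and constant off-diagonal rho (positive definite for 0 <= rho < 1)."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

S1 = cov_sparse(100, 0.5)
S2 = cov_dense(100, 0.2)
```

With ρ = 0 the type (i) matrix reduces to the identity, matching the first row of each block in Table 1.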
Table 1: Power of SSS and CorHC (HC). Here ρ denotes the covariance parameter: ρ ∈ {0, 0.3, 0.5, 0.8} corresponds to the type (i) covariance matrix and ρ = 0.2 to the type (ii) covariance matrix.

Model  Dim    ρ    SSS   HC   | Model  Dim    ρ    SSS   HC
I      100    0    1.00  0.16 | II     100    0    0.98  0.12
              0.3  1.00  0.29 |               0.3  0.97  0.16
              0.5  0.99  0.54 |               0.5  0.96  0.24
              0.8  1.00  0.93 |               0.8  1.00  0.37
              0.2  0.90  0.35 |               0.2  0.96  0.56
       500    0    0.98  0.16 |        500    0    0.87  0.06
              0.3  0.99  0.18 |               0.3  0.80  0.09
              0.5  0.97  0.34 |               0.5  0.82  0.13
              0.8  0.98  0.71 |               0.8  0.83  0.14
              0.2  0.52  0.25 |               0.2  0.77  0.32
       1000   0    0.89  0.19 |        1000   0    0.81  0.09
              0.3  0.88  0.16 |               0.3  0.74  0.06
              0.5  0.91  0.33 |               0.5  0.77  0.08
              0.8  0.96  0.53 |               0.8  0.84  0.11
              0.2  0.37  0.30 |               0.2  0.69  0.25
       2000   0    0.92  0.18 |        2000   0    0.75  0.11
              0.3  0.86  0.25 |               0.3  0.68  0.12
              0.5  0.83  0.43 |               0.5  0.68  0.13
              0.8  0.90  0.60 |               0.8  0.81  0.10
              0.2  0.43  0.17 |               0.2  0.63  0.41
III    100    0    1.00  0.21 | IV     100    0    0.89  0.01
              0.3  1.00  0.25 |               0.3  0.91  0.03
              0.5  1.00  0.63 |               0.5  0.89  0.04
              0.8  1.00  1.00 |               0.8  1.00  0.10
              0.2  0.98  0.78 |               0.2  0.94  0.07
       500    0    0.99  0.11 |        500    0    0.70  0.03
              0.3  1.00  0.12 |               0.3  0.57  0.04
              0.5  0.98  0.11 |               0.5  0.57  0.07
              0.8  0.99  0.22 |               0.8  0.69  0.09
              0.2  0.62  0.72 |               0.2  0.45  0.08
       1000   0    0.99  0.11 |        1000   0    0.55  0.07
              0.3  0.97  0.06 |               0.3  0.56  0.04
              0.5  0.97  0.18 |               0.5  0.51  0.09
              0.8  0.92  0.10 |               0.8  0.73  0.06
              0.2  0.60  0.59 |               0.2  0.44  0.08
       2000   0    0.96  0.16 |        2000   0    0.58  0.07
              0.3  0.97  0.19 |               0.3  0.47  0.07
              0.5  0.93  0.15 |               0.5  0.45  0.09
              0.8  0.88  0.10 |               0.8  0.61  0.02
              0.2  0.59  0.58 |               0.2  0.40  0.08
We fix the number of slices in Algorithm 1. Calibrating the test statistics separately for each replication would take an extremely long time. Therefore, in the simulation, we compute the thresholds slightly differently from Algorithm 1: for each generated data set, we simulate only one null response vector and calculate the two component statistics. The thresholds are then chosen as the 95% quantiles of the corresponding statistics pooled across all the replications.
For each generated data set, we also calculated the CorHC scores according to Arias-Castro et al. (2012). The threshold is chosen by the same scheme as above: we calculated the CorHC scores based on simulated null responses and took the 95% quantile of these simulated scores as the threshold. The null hypothesis is rejected if the CorHC score exceeds this threshold. The power of both methods is calculated as the proportion of rejections among the 100 replications. These numbers are reported in Table 1.
It is clearly seen that the power of SSS decreases as the dimension increases. Nevertheless, the power of SSS is better than that of CorHC in all but one case. In Figure 2, we plot the histogram of the SSS statistic under the null in the top-left panel and under the alternative in the bottom-left panel, for Model III with a type (i) covariance matrix. The test statistics are well separated under the null and the alternative. However, CorHC fails to distinguish between the null and the alternative, as shown in the two panels on the right.
To see how the performance of CorHC varies, we consider the following model

, where
Setting the sample size, dimension, and sparsity as above for the type (i) covariance matrix, the power of both methods is displayed in Figure 3. The coefficient determines the magnitude of the marginal correlation between the active predictors and the response. It is seen that when the coefficient is close to 16, representing the case of diminishing marginal correlation, the power of CorHC drops to its lowest. Under all the models, SSS is more powerful in detecting the existence of the signal.
To observe the influence of the signaltonoise ratio on the power of the tests, we consider the following two models

, where ;

, where .
Here .
Setting the sample size and dimension as above, we plot the power of both methods against the signal-strength coefficient in Figure 4. It is clearly seen that for both examples there is a sharp "phase transition" in the power of SSS as the signal strength increases, validating our theory about the detection boundary. In both examples, SSS is much more powerful than CorHC.
5 Discussion
Assuming that Λ = var(E[x|y]) is nonvanishing, we show in this paper that its unique nonzero eigenvalue λ, associated with the single index model, is a generalization of the SNR. We demonstrate a surprising similarity between linear regression and single index models with Gaussian design: the detection boundary of the gSNR for the testing problem (5) under SIMa matches that of the SNR for linear models (2). This similarity provides additional support for the speculation that "the rich theories developed for linear regression can be extended to the single/multiple index models" (Lin et al., 2018b; Chen and Li, 1998).
Besides the gap we explicitly depicted between the detection and estimation boundaries, we provide several other directions that might be of interest to researchers. First, although this paper deals only with single index models, the results obtained here are very likely extendable to multiple index models. Assume that the noise is additive and let λ₁ ≥ ⋯ ≥ λ_d be the nonzero eigenvalues of the matrix Λ of a multiple index model. Similar arguments can show that the k-th direction is detectable if λ_k is sufficiently large. New ideas and technical preparations might be needed for a rigorous argument determining the lower bound of the detection boundary. Second, the framework can be extended to study theoretical properties of other sufficient dimension reduction algorithms such as SAVE and directional regression (Lin et al., 2017, 2018a, 2018b).
6 Acknowledgment
We thank Dr. Zhisu Zhu for his generous help with SDP.
APPENDIX: PROOFS
Appendix A Assisting Lemmas
Since our approaches are based on the technical tools developed in Lin et al. (2017, 2018a, 2018b), we briefly collect the necessary (modified) statements below without proofs.
Lemma 1.
Let . Let be positive constants satisfying . Then for any , we have
(20) 
Lemma 2.
Suppose that a matrix formed by dimensional vector , where for some constants and . We have
(21) 
with probability at most
for some positive constant. In particular, we know that
(22)
happens with probability at least .
Lemma 3.
Assume that . Let be a matrix, where and are scalar, is a vector and is a matrix satisfying
(23) 
for a constant where . Then we have
(24) 
Sliced approximation inequality
The next result is referred to as the 'key lemma' in Lin et al. (2017, 2018a, 2018b); it depends on the following sliced stable condition.
Definition 1 (Sliced stable condition).
For , let denote all partitions of satisfying that
A curve is sliced stable with respect to y, if there exist positive constants and large enough such that for any , for any partition in and any , one has:
(25) 
A curve is sliced stable if it is sliced stable for some positive constant .
The sliced stable condition is a mild condition. Neykov et al. (2016) derived it from a modification of the regularity condition proposed in Hsing and Carroll (1992). The inequality (25) implies the following deviation inequality, originally stated for multiple index models; for our purposes, we have modified it for single index models.
Lemma 4.
Assume that Conditions , and the sliced stable condition (for some ) hold in the single index model . Let be the SIR estimate of , and let be the projection matrix associated with the column space of . For any vector and any , let . There exist positive constants , and such that for any and satisfying that , one has
(26) 
Lin and Liu (2017) recently proved a similar deviation inequality without the sliced stable condition.
Appendix B Proof of Theorems
Proof of Theorem 1
Lemma 5.
Assume that , and be a sequence such that . Then, as , we have:
Under