1 Introduction
Ultrahigh-dimensional data appear in various scientific research areas, including genetics, finance, and survival analysis. In regression analysis, ultrahigh-dimensional data are difficult to analyze because they contain many unimportant variables, in the sense that those variables are not highly correlated with the response. In addition, the covariance matrix of the covariates is usually singular because the dimension of the variables greatly exceeds the sample size. As a result, we should select the informative variables before constructing regression models. Moreover, to cope with ultrahigh dimensionality, a sparsity assumption is imposed; that is, only a small number of predictors are associated with the response.
In the early development of variable selection, Akaike's Information Criterion (AIC) (Akaike 1973) and the Bayesian Information Criterion (BIC) (Schwarz 1978) were two well-known conventional selection criteria. Both methods search over all possible combinations of variables to reach the optimal model. However, for ultrahigh-dimensional data, it is nearly impossible to search for the final model over all possible combinations. In the past two decades, a number of regularization methods have been proposed for variable selection, including the LASSO (Tibshirani 1996), SCAD (Fan and Li 2001), LARS (Efron et al. 2004), elastic net (Zou and Hastie 2005), adaptive LASSO (Zou 2006), and Dantzig selector (Candes and Tao 2007). However, those methods are mainly designed for high-dimensional data in which the dimension of the variables is smaller than the sample size, and they may perform poorly for ultrahigh-dimensional data.
To address ultrahigh-dimensional data with stable computation and accurate selection, Fan and Lv (2008) first proposed the sure independence screening (SIS) procedure for the ultrahigh-dimensional linear model, which uses the Pearson correlation to rank the importance of each predictor. Hall and Miller (2009) developed a bootstrap procedure to rank the importance of each predictor based on the Pearson correlation between the response and the predictors. Fan et al. (2009) and Fan and Song (2010) proposed ranking the importance of each predictor through the marginal maximum likelihood. Unlike the SIS method, which specifies the model structure, Zhu et al. (2011) and Li et al. (2012) proposed model-free feature screening procedures to capture the informative covariates in ultrahigh-dimensional data.
Even though feature screening methods for ultrahigh-dimensional data have been developed, research gaps remain. Specifically, in survival analysis with genetic data, the response (failure time) is usually incomplete due to right-censoring, and the covariates are usually contaminated with measurement error. It is not trivial to implement conventional feature screening methods for such data. When the response is incomplete but the covariates are precisely measured, several valid methods have been proposed. To name a few, Fan et al. (2010) proposed an SIS method restricted to the Cox model; Song et al. (2014) proposed censored rank independence screening; Yan et al. (2017) proposed Spearman rank correlation screening; Chen et al. (2018) developed robust feature screening based on the distance correlation; and Chen et al. (2019) considered model-free survival conditional feature screening. In the presence of measurement error, however, it is unknown whether those existing methods can determine the "correct" features when only surrogate versions of the covariates are observed.
The other crucial issue is the accuracy of feature screening. Since conventional SIS methods rank the importance of each predictor through marginal utilities, they may fail to detect truly important predictors that are marginally independent of the response due to correlations among the predictors. A detailed example is deferred to Section 2.4. To overcome this problem, Fan and Lv (2008) proposed the iterative SIS method, and Zhong and Zhu (2015) developed the iterated distance correlation to improve the accuracy of variable screening. These methods, however, are based on complete data that are free of mismeasurement. For ultrahigh-dimensional survival data with measurement error in the covariates, no method is available to deal with this problem. As a result, we explore this important problem with both survival data and covariate measurement error incorporated. In our development, we first present the distance correlation with error correction for feature screening. Under this approach, the set of variables selected from the surrogates is the same as the set selected from the unobserved true covariates. After that, we propose a valid iterated procedure with error correction to improve the accuracy of feature screening. In particular, the proposed method is free of model specification and of distributional assumptions on the covariates.
The remainder of this article is organized as follows. In Section 2, we introduce the survival data with right-censoring, the measurement error model, and the distance correlation method. In Section 3, we propose the iterative feature screening procedure for censored data with covariate measurement error. Empirical studies, including simulation results and real data analysis, are provided in Sections 4 and 5, respectively. We conclude the article with discussions in Section 6.
2 Notation and Model
2.1 Survival Data
In survival analysis, the response is usually incomplete due to the presence of censoring. Specifically, let $T$ be the failure time and $C$ be the censoring time. Then let $Y = \min(T, C)$ and $\delta = I(T \le C)$, where $I(\cdot)$ is the indicator function. Let $X = (X_1, \ldots, X_p)^\top$ be the $p$-dimensional random vector of covariates. Suppose that we have a sample of $n$ subjects and that for $i = 1, \ldots, n$, $(Y_i, \delta_i, X_i)$ has the same distribution as $(Y, \delta, X)$ and represents the realization for subject $i$. Let $\tau$ denote the maximum support of the failure time. Some regularity conditions are imposed.
(C1) $P(Y \ge \tau) > 0$, where $\tau$ is an upper bound of the failure times, assumed to be finite, and $\{i : Y_i \ge t\}$ is the risk set at time $t$.

(C2) The censoring time is noninformative. That is, the failure time $T$ and the censoring time $C$ are independent.
2.2 Measurement Error Model
Let $W$ denote the surrogate, or observed, version of the covariate vector $X$. Let $\Sigma_X$ and $\Sigma_W$ be the covariance matrices of $X$ and $W$, respectively. For $i = 1, \ldots, n$, $W_i$ has the same distribution as $W$ and denotes the realization for subject $i$. In this paper, we focus on the following classical additive measurement error model:

(1)  $W_i = X_i + e_i$

for $i = 1, \ldots, n$, where $e_i$ is independent of $(Y_i, \delta_i, X_i)$ and follows a normal distribution with mean zero and covariance matrix $\Sigma_e$. Here $\Sigma_e$ can be known or unknown. Hence, to discuss $\Sigma_e$ and its estimation, we consider the following three scenarios:
Scenario I: $\Sigma_e$ is known.

In this scenario, $\Sigma_e$ is a constant matrix, and the subsequent analysis is straightforward.
Scenario II: $\Sigma_e$ is unknown and repeated measurements are available.

Suppose that for each subject $i$ we observe the replicates $W_{ir} = X_i + e_{ir}$, $r = 1, \ldots, r_i$ with $r_i \ge 2$. Then a standard moment estimator is

(2)  $\widehat{\Sigma}_e = \left\{ \sum_{i=1}^{n} (r_i - 1) \right\}^{-1} \sum_{i=1}^{n} \sum_{r=1}^{r_i} (W_{ir} - \bar{W}_i)(W_{ir} - \bar{W}_i)^\top,$

where $\bar{W}_i = r_i^{-1} \sum_{r=1}^{r_i} W_{ir}$.
Scenario III: $\Sigma_e$ is unknown and validation data are available.
Suppose that $\mathcal{M}$ is the subject set for the main study, containing $n$ subjects, and $\mathcal{V}$ is the subject set for the external validation study, containing $m$ subjects. Assume that $\mathcal{M}$ and $\mathcal{V}$ do not overlap. Therefore, the available data contain the measurements from the main study and those from the validation sample. Hence, for the measurement error model, we have

$W_i = X_i + e_i$

for $i \in \mathcal{V}$, where the $e_i$ are independent of the $X_i$. In this case, applying the least squares regression method to the validation sample gives the estimator

(3)

of $\Sigma_e$.
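As an illustration of Scenario II, a standard moment-type estimator of $\Sigma_e$ from two replicates per subject (in the spirit of Carroll et al. 2006) can be sketched as follows. The two-replicate restriction and the function name are our assumptions for the sketch, not necessarily the exact estimator used in the paper:

```python
import numpy as np

def estimate_sigma_e(W1, W2):
    """Moment estimator of the error covariance from replicate surrogates.

    With W_ir = X_i + e_ir (r = 1, 2) and independent errors,
    W_i1 - W_i2 = e_i1 - e_i2 has covariance 2 * Sigma_e, so averaging
    the outer products of the differences and halving recovers Sigma_e."""
    D = np.asarray(W1, float) - np.asarray(W2, float)
    n = D.shape[0]
    return D.T @ D / (2.0 * n)

# quick check on simulated data with a known diagonal Sigma_e
rng = np.random.default_rng(2)
n, p, sig2 = 20000, 3, 0.5
X = rng.normal(size=(n, p))
W1 = X + rng.normal(scale=np.sqrt(sig2), size=(n, p))
W2 = X + rng.normal(scale=np.sqrt(sig2), size=(n, p))
print(np.round(np.diag(estimate_sigma_e(W1, W2)), 2))  # close to 0.5 in each entry
```

The true covariate signal $X_i$ cancels in the difference $W_{i1} - W_{i2}$, which is why the estimator needs no knowledge of the distribution of $X$.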
2.3 Review of the Distance Correlation Method
In this section, we briefly review the distance correlation (DC) method, which was first proposed by Székely et al. (2007).
Let $\varphi_u(t)$ and $\varphi_v(s)$ denote the characteristic functions of two random vectors $u$ and $v$, respectively, and let $\varphi_{u,v}(t, s)$ be the joint characteristic function of $u$ and $v$. Let $\|\varphi\|^2 = \varphi \bar{\varphi}$ for any complex-valued function $\varphi$, where $\bar{\varphi}$ is the conjugate of $\varphi$. The distance covariance between $u$ and $v$ is defined as

$\mathrm{dcov}^2(u, v) = \int_{\mathbb{R}^{d_u + d_v}} \left\| \varphi_{u,v}(t, s) - \varphi_u(t) \varphi_v(s) \right\|^2 w(t, s) \, dt \, ds,$

where $d_u$ and $d_v$ are the dimensions of $u$ and $v$, respectively, and

$w(t, s) = \left\{ c_{d_u} c_{d_v} \|t\|^{1 + d_u} \|s\|^{1 + d_v} \right\}^{-1}$

with $c_d = \pi^{(1 + d)/2} / \Gamma\{(1 + d)/2\}$, where $\|a\|$ is the Euclidean norm of a vector $a$. Therefore, the DC is defined as

(4)  $\mathrm{dcorr}(u, v) = \dfrac{\mathrm{dcov}(u, v)}{\sqrt{\mathrm{dcov}(u, u)\, \mathrm{dcov}(v, v)}}.$
Székely et al. (2007) showed that two random vectors $u$ and $v$ are independent if and only if $\mathrm{dcorr}(u, v) = 0$. This property motivates us to perform feature screening by identifying which covariates are dependent on the response (e.g., Li et al. 2012). The detailed estimation of (4) can be found in Li et al. (2012).
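The estimator of (4) has a simple form via double-centered pairwise distance matrices (Székely et al. 2007). The following is a minimal sketch, not the authors' implementation; the function name is ours:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of Szekely et al. (2007).

    x, y: arrays with n observations (1-D vectors or (n, d) matrices).
    Returns a value in [0, 1]; it vanishes exactly when the double-centered
    distance matrices are empirically uncorrelated."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    if x.shape[0] == 1:
        x = x.T
    if y.shape[0] == 1:
        y = y.T
    # pairwise Euclidean distance matrices
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    # double centering: subtract row/column means, add the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()        # squared sample distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0
```

For example, `distance_correlation(x, x)` equals 1 for any non-constant `x`, while for independent samples the statistic is small and shrinks as the sample size grows.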
2.4 Potential Problem in Conventional Screening Method
As discussed in Section 1, even though many feature screening methods have been proposed, those methods may fail to capture all important variables because some variables are highly correlated with others. To see this problem explicitly, we consider the following regression model, which was adopted by Fan and Lv (2008):

(5)

where $X = (X_1, \ldots, X_p)^\top$ is a vector of covariates and each $X_j$ is generated from the normal distribution with mean zero and unit variance. The correlations of all pairs $(X_j, X_k)$ with $j, k \ne 4$ are $\rho$, while $X_4$ has correlation $\rho^{1/2}$ with all other variables. The variables $X_1, \ldots, X_4$ are included in model (5), with the coefficient of $X_4$ chosen so that $X_4$ is marginally uncorrelated with the response. By feature screening based on the conventional distance correlation method, we can only identify $X_1$, $X_2$, and $X_3$, while there is a large probability that $X_4$ cannot be identified because of its marginal independence of the response. This simple example shows that the conventional feature screening method fails to select all the important variables. To successfully identify the variable $X_4$, Fan and Lv (2008) proposed the iterated SIS method, and Zhong and Zhu (2015) considered the iterated distance correlation method. In survival analysis, however, the response in model (5) is usually incomplete due to right-censoring, and it is not trivial to implement those conventional methods for such data. In addition, another challenge comes from mismeasurement of the covariates. Specifically, the variables $X_1, \ldots, X_p$ in model (5) may be contaminated with measurement error, so that we only observe the surrogate variables $W_1, \ldots, W_p$. It is expected that the important variables $X_1, \ldots, X_4$ cannot all be identified if we ignore the impact of mismeasurement. As a result, it is also crucial to take care of the measurement error effect.
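The failure of marginal screening can be reproduced numerically. The sketch below assumes the classical configuration of Fan and Lv (2008), with correlations $\rho$ and $\rho^{1/2}$ and coefficients $(5, 5, 5, -15\sqrt{\rho})$ chosen so that $\mathrm{cov}(X_4, Y) = 0$; the exact constants of (5) are not recoverable from the text, so these are illustrative assumptions:

```python
import numpy as np

# Sketch of model (5): X4 is important but marginally unrelated to Y.
# Assumed design: corr(Xj, Xk) = rho for j, k != 4 and corr(X4, Xj) = sqrt(rho),
# with coefficients (5, 5, 5, -15*sqrt(rho)) so that cov(X4, Y) = 0.
rng = np.random.default_rng(1)
n, p, rho = 2000, 20, 0.5

# factor construction: X4 = Z0 and Xj = sqrt(rho)*Z0 + sqrt(1-rho)*Zj, j != 4
z0 = rng.normal(size=n)
Z = rng.normal(size=(n, p))
X = np.sqrt(rho) * z0[:, None] + np.sqrt(1.0 - rho) * Z
X[:, 3] = z0  # X4 is column 3 in 0-based indexing

Y = 5 * X[:, 0] + 5 * X[:, 1] + 5 * X[:, 2] \
    - 15.0 * np.sqrt(rho) * X[:, 3] + rng.normal(size=n)

# Marginal correlations: X1-X3 clearly relate to Y, while the truly
# important X4 looks irrelevant marginally.
r = [abs(np.corrcoef(X[:, j], Y)[0, 1]) for j in range(4)]
print(np.round(r, 3))  # the fourth value is near zero despite a nonzero coefficient
```

Because $\mathrm{cov}(X_4, Y) = 5\sqrt{\rho} \cdot 3 - 15\sqrt{\rho} = 0$ under this design, any purely marginal utility (Pearson or distance correlation) ranks $X_4$ among the noise variables.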
3 The Proposed Method
3.1 Feature Screening for Censored Data and Measurement Error
To present the setting, we start from the unobserved covariates $X$. Let $F(t \mid X)$ denote the conditional distribution function of $T$ given $X$, and let

$\mathcal{A} = \left\{ j : F(t \mid X) \text{ functionally depends on } X_j \text{ for some } t \right\}$

denote the active set, which contains all relevant predictors for the response, and let $\mathcal{A}^c$ denote the complement of $\mathcal{A}$, which contains all irrelevant predictors for the response. In this case, let $X_{\mathcal{A}}$ denote the vector containing all the active predictors, and let $X_{\mathcal{A}^c}$ be the vector containing all the irrelevant predictors.
If $Y$ is complete, i.e., $\delta = 1$, then it is straightforward to implement conventional methods to determine the active set. However, if $Y$ is incomplete, i.e., right-censoring occurs, then we impute $T$ by (Buckley and James 1979)

$Y^{*} = \delta Y + (1 - \delta)\, E(T \mid T > Y),$

indicating that $E(Y^{*}) = E(T)$ (Miller 1981, p. 151). In addition, by Condition (C1) in Section 2.1, $E(T \mid T > Y)$ can be written as

$E(T \mid T > Y) = \dfrac{\int_Y^{\tau} t f(t)\, dt}{1 - F(Y)},$

where $f$ and $F$ are the density and distribution functions of $T$, respectively. Moreover, $E(T \mid T > Y)$ can be estimated by

$\widehat{E}(T \mid T > Y) = \dfrac{\int_Y^{\tau} t \, d\widehat{F}(t)}{1 - \widehat{F}(Y)},$

where $\widehat{F}$ is the Kaplan-Meier estimator of $F$. As a result, the estimator of $Y^{*}$, denoted $\widehat{Y}^{*}$, is determined by replacing $E(T \mid T > Y)$ with $\widehat{E}(T \mid T > Y)$, and thus we have

$\widehat{Y}^{*} = \delta Y + (1 - \delta)\, \widehat{E}(T \mid T > Y).$
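The imputation step can be sketched numerically: estimate the Kaplan-Meier curve, then replace each censored observation by a plug-in estimate of the conditional mean beyond the censoring point. This is an illustrative sketch under the stated conditions, with function names of our choosing:

```python
import numpy as np

def km_curve(time, delta):
    """Kaplan-Meier estimate of S(t) = P(T > t).

    Returns the distinct event times and the survival value just after each."""
    time, delta = np.asarray(time, float), np.asarray(delta, int)
    event_times = np.unique(time[delta == 1])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = np.sum(time >= u)             # risk-set size at u
        d = np.sum((time == u) & (delta == 1))  # events at u
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def impute_buckley_james(time, delta):
    """Replace each censored Y_i by a Kaplan-Meier plug-in estimate of
    E[T | T > Y_i] (a Buckley-James-type imputation)."""
    time, delta = np.asarray(time, float), np.asarray(delta, int)
    ts, S = km_curve(time, delta)
    out = time.copy()
    for i in np.where(delta == 0)[0]:
        y = time[i]
        later = ts[ts > y]              # event times beyond the censoring point
        if later.size == 0:
            continue                    # no information beyond y; keep Y_i
        k = np.searchsorted(ts, y, side="right")
        S_y = 1.0 if k == 0 else S[k - 1]        # S at the censoring time
        edges = np.concatenate(([y], later))
        heights = np.concatenate(([S_y], S[np.searchsorted(ts, later[:-1])]))
        # E[T | T > y] = y + (integral of S from y to tau) / S(y),
        # computed exactly for the Kaplan-Meier step function
        out[i] = y + np.sum(np.diff(edges) * heights) / S_y
    return out
```

For instance, with observed times (1, 2, 3) where only the first is censored, the imputed value for the censored subject is 2.5, the average of the two later event times, as the conditional-mean formula requires.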
Finally, the crucial target is the determination of the active set $\mathcal{A}$. In the presence of measurement error, we adopt the DC method described in Section 2.3 with a modification. Let $\varphi_e(t) = E\{\exp(\mathrm{i} t^\top e)\}$ denote the characteristic function of $e$, where $\mathrm{i}$ is the complex number with $\mathrm{i}^2 = -1$. Since $W = X + e$ with $e$ independent of $(X, Y^{*})$, we have

$\varphi_X(t) = \dfrac{\varphi_W(t)}{\varphi_e(t)}$

and

$\varphi_{X, Y^{*}}(t, s) = \dfrac{\varphi_{W, Y^{*}}(t, s)}{\varphi_e(t)}.$

If $\Sigma_e$ is unknown, then it can be estimated based on repeated measurements or validation data as in (2) or (3). Therefore, with the estimated error characteristic function $\widehat{\varphi}_e(t) = \exp(-t^\top \widehat{\Sigma}_e t / 2)$, we define

$\widehat{\varphi}_X(t) = \dfrac{\widehat{\varphi}_W(t)}{\widehat{\varphi}_e(t)}$

and

(6)  $\widehat{\varphi}_{X, \widehat{Y}^{*}}(t, s) = \dfrac{\widehat{\varphi}_{W, \widehat{Y}^{*}}(t, s)}{\widehat{\varphi}_e(t)}.$
As a result, to select features, it suffices to consider

(7)  $\omega_j = \mathrm{dcorr}^2(X_j, Y^{*})$

for $j = 1, \ldots, p$, and the corresponding estimator, computed from the corrected characteristic functions in (6), is

(8)  $\widehat{\omega}_j = \widehat{\mathrm{dcorr}}^{\,2}(X_j, \widehat{Y}^{*}).$

As suggested by Li et al. (2012), let the threshold value be $c n^{-\kappa}$ for some constants $c > 0$ and $0 \le \kappa < 1/2$; then the estimated active set is given by

(9)  $\widehat{\mathcal{A}} = \left\{ j : \widehat{\omega}_j \ge c n^{-\kappa},\ j = 1, \ldots, p \right\}.$
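Operationally, a criterion such as (9) is often implemented by ranking the marginal utilities and retaining a fixed number of top covariates. A minimal sketch follows, using the absolute Pearson correlation as a stand-in for the distance-correlation utility (8) to keep the code short; the function name and default size are illustrative:

```python
import math
import numpy as np

def screen_top_d(W, y, d=None):
    """Rank covariates by a marginal utility with the (imputed) response and
    keep the top d, defaulting to floor(n / log n).

    The absolute Pearson correlation stands in here for the distance
    correlation utility; any marginal score can be substituted."""
    n, p = W.shape
    if d is None:
        d = math.floor(n / math.log(n))
    scores = np.abs([np.corrcoef(W[:, j], y)[0, 1] for j in range(p)])
    return sorted(np.argsort(-scores)[:d].tolist())

rng = np.random.default_rng(3)
n, p = 200, 1000
W = rng.normal(size=(n, p))
y = 2 * W[:, 0] - 3 * W[:, 5] + rng.normal(size=n)
print(screen_top_d(W, y)[:5])  # the screened set contains columns 0 and 5
```

Here $p \gg n$, yet the marginal ranking reliably retains the two signal columns among the top $\lfloor n / \log n \rfloor$ positions.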
To see the validity of the criterion (7), we have the following theorem:
Theorem 3.1

The active features based on $X$ and on the corrected surrogates are the same. That is, for every $j = 1, \ldots, p$, either

$\mathrm{dcorr}(X_j, Y^{*}) = 0$ and $\mathrm{dcorr}_W(X_j, Y^{*}) = 0$,

or

$\mathrm{dcorr}(X_j, Y^{*}) > 0$ and $\mathrm{dcorr}_W(X_j, Y^{*}) > 0$,

where $\mathrm{dcorr}_W(X_j, Y^{*})$ is determined by applying the corrected characteristic functions based on $W$ to (4).
Generally speaking, Theorem 3.1 suggests that, based on the feature selection criterion (7), the true and surrogate covariates share the same active set $\mathcal{A}$. Furthermore, a derivation similar to that of Li et al. (2012) yields that $\widehat{\mathcal{A}}$ has the sure screening property in the sense that $P(\mathcal{A} \subseteq \widehat{\mathcal{A}}) \to 1$ as $n \to \infty$. Therefore, we can decompose the measurement error model (1) as

(10a)  $W_{\mathcal{A}} = X_{\mathcal{A}} + e_{\mathcal{A}}$

(10b)  $W_{\mathcal{A}^c} = X_{\mathcal{A}^c} + e_{\mathcal{A}^c},$

where $W = (W_{\mathcal{A}}^\top, W_{\mathcal{A}^c}^\top)^\top$, $X = (X_{\mathcal{A}}^\top, X_{\mathcal{A}^c}^\top)^\top$, and $e = (e_{\mathcal{A}}^\top, e_{\mathcal{A}^c}^\top)^\top$. The covariance matrix $\Sigma_e$ can be further decomposed as

$\Sigma_e = \begin{pmatrix} \Sigma_{e, \mathcal{A}} & \Sigma_{e, \mathcal{A} \mathcal{A}^c} \\ \Sigma_{e, \mathcal{A}^c \mathcal{A}} & \Sigma_{e, \mathcal{A}^c} \end{pmatrix},$

where $\Sigma_{e, \mathcal{A}}$ is the covariance matrix based on (10a), $\Sigma_{e, \mathcal{A}^c}$ is the covariance matrix based on (10b), and $\Sigma_{e, \mathcal{A} \mathcal{A}^c}$ is the cross-covariance matrix based on the interaction of (10a) and (10b).
3.2 Iteration Algorithm
As motivated by the example in Section 2.4, directly implementing (7) may miss some important variables. To increase the probability of selecting all important variables, we modify the selection criterion (7) and develop an iterated feature screening procedure.

The key idea is as follows. We first implement the feature screening criterion (7) to determine $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{A}}^c$. Some potentially important variables may lie in $\widehat{\mathcal{A}}^c$ without being identified. Therefore, to detect the other important variables in $\widehat{\mathcal{A}}^c$, a natural way is to remove the correlations between $X_{\widehat{\mathcal{A}}^c}$ and $X_{\widehat{\mathcal{A}}}$ by regressing $X_{\widehat{\mathcal{A}}^c}$ onto $X_{\widehat{\mathcal{A}}}$. As a result, the residuals obtained from this linear regression are uncorrelated with $X_{\widehat{\mathcal{A}}}$, and hence other important variables in $\widehat{\mathcal{A}}^c$ can be identified from the residuals and $\widehat{Y}^{*}$. To present the idea explicitly, we provide the following iteration algorithm:
Step 1: Initial determination of the active set.

Let $W = (W_1, \ldots, W_p)$ denote the covariate matrix, where $W_j$ is the $n$-dimensional vector for the $j$th covariate, $j = 1, \ldots, p$. In this stage, we first implement (7) to determine the initial active set $\widehat{\mathcal{A}}$, and the corresponding relevant covariate matrix is $W_{\widehat{\mathcal{A}}}$ with dimension $n \times \widehat{d}$, where $\widehat{d} = |\widehat{\mathcal{A}}|$. Let $W_{\widehat{\mathcal{A}}^c}$ denote the irrelevant covariate matrix with dimension $n \times (p - \widehat{d})$, such that $W = (W_{\widehat{\mathcal{A}}}, W_{\widehat{\mathcal{A}}^c})$. In addition, based on the feature selection criterion (7) and Theorem 3.1, the active set based on the surrogate variables equals the set based on the true covariates. Therefore, we also have the corresponding decomposition $X = (X_{\widehat{\mathcal{A}}}, X_{\widehat{\mathcal{A}}^c})$.
Step 2: Improvement.

In this stage, we aim to search for other important variables in $\widehat{\mathcal{A}}^c$. Our main approach is to regress $X_{\widehat{\mathcal{A}}^c}$ onto $X_{\widehat{\mathcal{A}}}$ and update the active set through the residuals. In this paper, we consider the multivariate linear regression model, and the ordinary least squares criterion is

$\min_{\Gamma} \left\| X_{\widehat{\mathcal{A}}^c} - X_{\widehat{\mathcal{A}}} \Gamma \right\|_2^2,$

where $\|\cdot\|_2$ is the $L_2$-norm and $\Gamma$ is the parameter matrix with dimension $\widehat{d} \times (p - \widehat{d})$. The corresponding score function is

$\Phi(\Gamma) = X_{\widehat{\mathcal{A}}}^\top \left( X_{\widehat{\mathcal{A}}^c} - X_{\widehat{\mathcal{A}}} \Gamma \right).$
However, in the presence of covariate measurement error, we only observe $W_{\widehat{\mathcal{A}}}$ and $W_{\widehat{\mathcal{A}}^c}$, so the score function becomes

(11)  $\Phi^{*}(\Gamma) = W_{\widehat{\mathcal{A}}}^\top \left( W_{\widehat{\mathcal{A}}^c} - W_{\widehat{\mathcal{A}}} \Gamma \right).$

It is well known that directly solving $\Phi^{*}(\Gamma) = 0$ may yield an estimator of $\Gamma$ with tremendous bias (e.g., Carroll et al. 2006). Instead, by a simple calculation, we obtain the corrected score function

(12)  $\Phi^{**}(\Gamma) = W_{\widehat{\mathcal{A}}}^\top \left( W_{\widehat{\mathcal{A}}^c} - W_{\widehat{\mathcal{A}}} \Gamma \right) + n \left( \Sigma_{e, \widehat{\mathcal{A}}} \Gamma - \Sigma_{e, \widehat{\mathcal{A}} \widehat{\mathcal{A}}^c} \right)$

such that $E\{\Phi^{**}(\Gamma)\} = E\{\Phi(\Gamma)\}$, indicating that $\Phi^{**}(\Gamma)$ is a suitable score function that corrects for the error-prone variables. Therefore, the estimator of $\Gamma$ based on (12) is given by

(13)  $\widehat{\Gamma} = \left( W_{\widehat{\mathcal{A}}}^\top W_{\widehat{\mathcal{A}}} - n \Sigma_{e, \widehat{\mathcal{A}}} \right)^{-1} \left( W_{\widehat{\mathcal{A}}}^\top W_{\widehat{\mathcal{A}}^c} - n \Sigma_{e, \widehat{\mathcal{A}} \widehat{\mathcal{A}}^c} \right).$

Step 3: Update of the active set.

Update the active set by applying the screening criterion (7) to the residuals $W_{\widehat{\mathcal{A}}^c} - W_{\widehat{\mathcal{A}}} \widehat{\Gamma}$ and $\widehat{Y}^{*}$, and continue Step 2 until no more covariates are included. The final model is then given by the resulting active set.
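A correction of this type reduces to subtracting the error covariance from the Gram matrix before solving the normal equations. The following sketch illustrates, on simulated data, the attenuation of the naive least squares estimator and its removal by the correction; the dimensions, coefficients, and diagonal error covariance are illustrative assumptions:

```python
import numpy as np

def corrected_ls(W_act, W_irr, Sig_e_act, Sig_e_cross=None):
    """Error-corrected least squares for regressing the irrelevant block on
    the active block when both are observed with additive error.

    E[W_act' W_act] = X_act' X_act + n * Sig_e_act, so subtracting the error
    covariance removes the attenuation bias of the naive estimator."""
    n = W_act.shape[0]
    SXX = W_act.T @ W_act - n * Sig_e_act
    SXY = W_act.T @ W_irr
    if Sig_e_cross is not None:
        SXY = SXY - n * Sig_e_cross  # correct the cross term if errors correlate
    return np.linalg.solve(SXX, SXY)

# naive versus corrected estimation on simulated data
rng = np.random.default_rng(4)
n, sig2 = 50000, 0.5
X_act = rng.normal(size=(n, 2))
Gamma = np.array([[1.0], [-1.0]])                # true coefficient matrix
X_irr = X_act @ Gamma + 0.1 * rng.normal(size=(n, 1))
W_act = X_act + rng.normal(scale=np.sqrt(sig2), size=(n, 2))
W_irr = X_irr + rng.normal(scale=np.sqrt(sig2), size=(n, 1))

naive = np.linalg.solve(W_act.T @ W_act, W_act.T @ W_irr)
corrected = corrected_ls(W_act, W_irr, sig2 * np.eye(2))
# the naive slopes are attenuated toward zero; the corrected ones are not
```

With independent errors of variance 0.5 on unit-variance covariates, the naive slopes shrink by the factor 1/1.5 in expectation, while the corrected solution stays consistent for the true coefficients.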
In practice, as suggested by Yan et al. (2017), Chen et al. (2019), and others, we can specify the size of the final active set to be $[n / \log n]$, where $[\cdot]$ stands for the floor function. In this sense, based on the iteration algorithm, we first select a subset of the variables in Step 1 and then determine the remaining variables in Step 2.
4 Simulation Studies
4.1 Simulation Setup
Let $n$ denote the sample size, and let $X = (X_1, \ldots, X_p)^\top$ denote a $p$-dimensional vector of covariates generated from the normal distribution with mean zero and a covariance matrix whose diagonal elements are one and whose off-diagonal elements are the pairwise correlations of the $X_j$. Similar to the setting of the example in Section 2.4, we specify the correlations of all pairs not involving $X_4$ to be $\rho$, while $X_4$ has correlation $\rho^{1/2}$ with all other variables. We consider two values of $p$.
The failure time is generated by the following linear transformation model:

$H(T) = -X^\top \beta + \varepsilon,$

where $H$ is a monotone increasing function. Specifying the distribution of the error term $\varepsilon$ gives some commonly used survival models: in this paper, we consider the extreme value distribution, which yields the proportional hazards (PH) model, and the logistic distribution, which yields the proportional odds (PO) model. The censoring time $C$ is generated from the uniform distribution $U(0, c)$, where $c$ is a constant chosen so that the censoring rate is approximately 50%. As a result, we have $Y = \min(T, C)$ and $\delta = I(T \le C)$, and the survival data are $\{(Y_i, \delta_i) : i = 1, \ldots, n\}$. For the measurement error model (1), let $e$ be generated from the normal distribution with mean zero and diagonal covariance matrix $\Sigma_e$ with entries 0.15, 0.5, or 0.75. If $\Sigma_e$ is unknown, then the following two scenarios are considered as additional information:
Scenario 1: Repeated measurements.

For $i = 1, \ldots, n$ and $r = 1, 2$, the $X_i$ and $e_{ir}$ are again generated as above, and the replicate $W_{ir}$ is generated by

$W_{ir} = X_i + e_{ir}$

for $i = 1, \ldots, n$ and $r = 1, 2$. As a result, $\Sigma_e$ can be estimated by (2).

Scenario 2: Validation data.

For $i \in \mathcal{V}$, the $X_i$ and $e_i$ are again generated as above, and $W_i$ is generated by

$W_i = X_i + e_i$

for $i \in \mathcal{V}$. Therefore, $\Sigma_e$ can be estimated by (3).
Finally, we repeat the simulation 1000 times for each setting.
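For the PH case, the data generation above can be sketched as follows, assuming a log-linear form with extreme-value error (equivalently, exponential variables scaled by $\exp(X^\top \beta)$) and tuning $c$ numerically so that about half of the subjects are censored; the coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 400, 1000

# covariates (independent here for brevity; the paper uses correlated X)
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:4] = 1.0                           # illustrative nonzero coefficients

# PH-type failure times: log T = X'beta + eps with extreme-value eps,
# equivalently T = E * exp(X'beta) with E ~ Exp(1)
T = rng.exponential(size=n) * np.exp(X @ beta)

# choose c in C ~ Uniform(0, c) so the censoring rate is about 50%:
# given T, P(C < T) = min(T / c, 1), and its average decreases in c
lo, hi = 1e-6, 100.0 * float(T.max())
for _ in range(60):
    c = 0.5 * (lo + hi)
    if np.mean(np.minimum(T / c, 1.0)) > 0.5:
        lo = c                           # expected censoring too high: increase c
    else:
        hi = c

C = rng.uniform(0.0, c, size=n)
Y = np.minimum(T, C)
delta = (T <= C).astype(int)
print(round(1.0 - delta.mean(), 2))      # realized censoring rate, near 0.5
```

The bisection step exploits the fact that, conditionally on the simulated failure times, the expected censoring fraction is monotone in $c$, so a root near 50% always exists.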
4.2 Simulation Results
To evaluate the finite-sample performance of the proposed method, we consider the proportion of the 1000 simulations in which each active covariate is selected, and the proportion in which all active covariates are selected simultaneously. In addition, for comparison, we also examine the naive estimator, which is derived by directly using the observed covariates and iterating through (11). For the two survival models and several settings of $\Sigma_e$, we compare the results obtained from applying the proposed method to the surrogate covariates with those obtained from fitting the data with the true covariate measurements.
The numerical results are reported in Tables 1-3. Since the feature screenings based on the naive and proposed methods use the same criterion (7), their screening results are the same. Furthermore, the results of feature screening based on the true covariates are similar to the results based on the surrogate covariates regardless of the values of $\Sigma_e$, which also verifies Theorem 3.1. However, while the feature screening method successfully selects the variables $X_1$, $X_2$, and $X_3$ with high probability, $X_4$ is selected with low proportion; this result is consistent with the example in Section 2.4. In contrast, from Tables 1-3 we can see that the iterated feature screening method based on the corrected score function (12) successfully identifies the variable $X_4$ with high proportion, paralleling the case in which the true covariates are used. On the other hand, even when the iterated feature screening method is implemented, $X_4$ cannot be identified if the measurement error effect is not corrected appropriately; this is verified by the naive method based on (11).
5 Data Analysis
5.1 Analysis of The Mantle Cell Lymphoma Microarray Data
We first illustrate the proposed methods with an application to the mantle cell lymphoma microarray dataset, available from http://llmpp.nih.gov/MCL/. The dataset contains the survival times of 92 patients and the expression measurements of 8810 genes for each patient. We consider only 6312 genes, after deleting 2498 with missing measurements. During the follow-up, 64 patients died of mantle cell lymphoma and the other 28 were censored, yielding an approximately 30% censoring rate. The aim of the study was to formulate a molecular predictor of survival after chemotherapy for the disease.
Since this dataset contains no information characterizing the degree of measurement error accompanying the gene expressions, we conduct sensitivity analyses to investigate the measurement error effects on the analysis results. Specifically, we consider a diagonal covariance matrix $\Sigma_e$ for the measurement error model (1) whose diagonal elements share a common value, specified as a small, moderate, or large value to feature a setting with minor, moderate, or severe measurement error. We aim to select 20 variables in the active set. In the iteration algorithm, we first select 8 gene expressions, and then the remaining 12 gene expressions are selected by either (11) or (12). For comparison, we examine the feature screening (FS) method in Section 3.1 and the iterated feature screening (IFS) method in Section 3.2. The selection results are summarized in Table 4.
From Table 4, we can see that the feature screening and iterated feature screening methods give the same results for the first 8 gene expressions, regardless of whether the proposed or the naive method is used. This indicates that the first 8 gene expressions are clearly dependent on the response and easily identified. For the remaining 12 gene expressions, on the other hand, the screening results differ. Specifically, the iterated feature screening method selects some gene expressions, such as 29897, 30620, and 32699, regardless of the degree of the measurement error effect, and those selected gene expressions do not appear in the result of the feature screening method. This implies that the iterated feature screening method selects some potentially important variables that are not identified by the feature screening method. Furthermore, for the naive method, even when the iterated feature screening procedure is implemented, the selections among the remaining 12 gene expressions differ from those based on the correction of the error effect. The main reason is the difference between the estimators of $\Gamma$ solved from (11) and (12).
5.2 Analysis of NKI Breast Cancer Data
In this section, we implement the proposed method to analyze the breast cancer data collected by the Netherlands Cancer Institute (NKI) (van de Vijver et al. 2002). Tumors from 295 women with breast cancer were collected from the fresh-frozen-tissue bank of the Netherlands Cancer Institute. The tumors were primarily invasive breast carcinomas of about 5 cm in diameter. The patients were 52 years old or younger at diagnosis, and the diagnoses were made from 1984 to 1995. Of all those patients, 79 died before the study ended, yielding an approximately 73.2% censoring rate. For each patient's tumor, approximately 25,000 gene expression measurements were collected. Consistent with common practice in the analysis of gene expression data, we treat the log intensities as the covariates.
Since this dataset also contains no information characterizing the degree of measurement error accompanying the gene expressions, similar to Section 5.1, we conduct sensitivity analyses to investigate the measurement error effects on the analysis results. That is, we consider a diagonal covariance matrix $\Sigma_e$ for the measurement error model (1) whose diagonal elements share a common value, specified as a small, moderate, or large value to feature a setting with minor, moderate, or severe measurement error. We aim to select 18 variables in the active set. In the iteration algorithm, we first select 7 gene expressions, and then the remaining 11 gene expressions are selected by either (11) or (12). Similar to the procedure in Section 5.1, we investigate the feature screening (FS) method in Section 3.1 and the iterated feature screening (IFS) method in Section 3.2. The selection results are summarized in Table 5.
From Table 5, the results for the NKI data parallel those in Section 5.1, in the sense that the feature screening and iterated feature screening methods give the same results for the first 7 gene expressions, regardless of whether the proposed or the naive method is used. This indicates that the first 7 gene expressions are clearly dependent on the response and easily identified. For the remaining 11 gene expressions, on the other hand, the screening results differ. For example, the iterated feature screening method selects some gene expressions, such as NM 020188, Contig25991, and NM 003882, regardless of the degree of the measurement error effect, and those selected gene expressions do not appear in the result of the feature screening method. This implies that the iterated feature screening method selects some potentially important variables that are not identified by the feature screening method. Furthermore, for the naive method, even when the iterated feature screening procedure is implemented, the selections among the remaining 11 gene expressions differ from those based on the correction of the error effect. The main reason is again the difference between the estimators of $\Gamma$ solved from (11) and (12).
6 Conclusion
Ultrahigh-dimensional data analysis has been one of the most important topics in recent decades, and such data appear frequently in many practical situations and research fields, such as biology and finance. Many methods have been developed to deal with this problem. In the simultaneous presence of censored responses and covariate measurement error, however, few methods are available. Furthermore, some truly important covariates may fail to be detected because of correlations among the covariates.
To overcome those challenges, we propose a valid feature screening method for ultrahigh dimensionality that accommodates censored data and covariate measurement error simultaneously. Unlike other feature screening methods for censored data, the proposed method determines the same active predictors whether the surrogate or the unobserved true covariates are used. To improve the accuracy of feature screening and identify potentially important variables, we further develop the iterated feature screening procedure with correction of the measurement error. The simulation studies and real data analyses verify that the iterated feature screening method yields satisfactory results and outperforms the non-iterated feature screening and naive methods.
There are some possible extensions and applications. First, even after the dimension of the variables is reduced below the sample size, the dimension may still be high and some unimportant variables may remain in the dataset. In this case, one can implement variable selection techniques, such as LASSO or SCAD, to identify the most important variables and shrink the unimportant ones. Second, although we mainly consider continuous covariates and the classical measurement error model, the proposed method can naturally be extended to other types of variables, such as binary and count variables, and to other measurement error models, including the Berkson error model. In particular, mismeasured binary covariates, i.e., misclassification, form another crucial problem. Finally, in addition to right-censoring, some complex structures, such as left-truncation (e.g., Chen 2019), also appear in survival data with ultrahigh dimensionality, and it would be interesting to extend the proposed method to that setting. These important topics are left for future work.
Appendix: Proof of Theorem 3.1
We first consider the distance covariance based on the true covariates $X$ and that based on the surrogate covariates $W$.

Since the error term $e$ follows the normal distribution $N(0, \Sigma_e)$, its characteristic function is given by

(A.1)  $\varphi_e(t) = \exp\left(-\tfrac{1}{2} t^\top \Sigma_e t\right).$

By direct computation, we have

(A.2)  $\varphi_W(t) = E\{\exp(\mathrm{i} t^\top W)\} = E\{\exp(\mathrm{i} t^\top X)\}\, E\{\exp(\mathrm{i} t^\top e)\} = \varphi_X(t) \exp\left(-\tfrac{1}{2} t^\top \Sigma_e t\right),$

where the second equality is due to the independence of $X$ and $e$, and the last equality is due to (A.1). In addition, we can also derive

(A.3)  $\varphi_{W, Y^{*}}(t, s) = E\{\exp(\mathrm{i} t^\top W + \mathrm{i} s Y^{*})\} = \varphi_{X, Y^{*}}(t, s) \exp\left(-\tfrac{1}{2} t^\top \Sigma_e t\right),$

where the second equality is due to the independence of $(X, Y^{*})$ and $e$ and again comes from (A.1). As a result, combining (A.2) and (A.3) with (4) shows that the error factor $\exp(-\tfrac{1}{2} t^\top \Sigma_e t)$ cancels in the corrected criterion, giving the same expression of the distance covariance.

The equivalence of the distance variances holds by similar derivations. Therefore, we conclude that the distance correlations based on $X$ and on the corrected surrogates are equivalent, in the sense that one is zero if and only if the other is zero. Consequently, the same active features are determined for $X$ and $W$.
References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory, eds. B. N. Petrov and F. Csaki, 267-281. Akademiai Kiado, Budapest.

Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66, 429-436.

Candes, E. and Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n (with discussion). The Annals of Statistics, 35, 2313-2404.

Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models. CRC Press, New York.

Chen, L.-P. (2019). Pseudo likelihood estimation for the additive hazards model with data subject to left-truncation and right-censoring. Statistics and Its Interface, 12, 135-148.

Chen, X., Chen, X. and Wang, H. (2018). Robust feature screening for ultra-high dimensional right censored data via distance correlation. Computational Statistics and Data Analysis, 119, 118-138.

Chen, X., Zhang, Y., Chen, X. and Liu, Y. (2019). A simple model-free survival conditional feature screening. Statistics and Probability Letters, 146, 156-160.

Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32, 407-499.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B, 70, 849-911.

Fan, J., Samworth, R. and Wu, Y. (2009). Ultrahigh dimensional feature selection: beyond the linear model. Journal of Machine Learning Research, 10, 1829-1853.

Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics, 38, 3567-3604.

Fan, J., Feng, Y. and Wu, Y. (2010). Ultrahigh dimensional variable selection for Cox's proportional hazards model. IMS Collections, 6, 70-86.

Hall, P. and Miller, H. (2009). Using generalized correlation to effect variable selection in very high dimensional problems. Journal of Computational and Graphical Statistics, 18, 533-550.

Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129-1139.

Miller, R. G. (1981). Survival Analysis. Wiley, New York.

Rosenwald, A., Wright, G., Chan, W. C., Connors, J. M., Campo, E., Fisher, R. I., Gascoyne, R. D., Muller-Hermelink, H. K., Smeland, E. B., and Staudt, L. M. (2002). The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine, 346, 1937-1947.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.

Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35, 2769-2794.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.

van de Vijver, M. J., He, Y. D., van 't Veer, L. J., Dai, H., Hart, A. A. M., Voskuil, D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., Parrish, M., Atsma, D., Witteveen, A., Glas, A., Delahaye, L., van der Velde, T., Bartelink, H., Rodenhuis, S., Rutgers, E. T., Friend, S. H. and Bernards, R. (2002). A gene-expression signature as a predictor of survival in breast cancer. The New England Journal of Medicine, 347, 1999-2009.

Yan, X., Tang, N. and Zhao, X. (2017). The Spearman rank correlation screening for ultrahigh dimensional censored data. arXiv:1702.02708v1.

Zhong, W. and Zhu, L. (2015). An iterative approach to distance correlation-based sure independence screening. Journal of Statistical Computation and Simulation, 85, 2331-2345.

Zhu, L., Li, L., Li, R. and Zhu, L. (2011). Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 106, 1464-1475.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301-320.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429.
                           Feature screening                      Iterated feature screening
Model        Method
PH    0.15   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  1.000  1.000
      0.50   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.998  0.998
      0.75   Naive         1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
                           1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  1.000  1.000
      0.15   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
      0.50   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
      0.75   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
                           1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.997  0.997
PO    0.15   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
      0.50   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
      0.75   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
                           1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  1.000  1.000
      0.15   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
      0.50   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
      0.75   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
                           1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.997  0.997
                           Feature screening                      Iterated feature screening
Model        Method
PH    0.15   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  1.000  1.000
      0.50   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.005  0.005
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.997  0.997
      0.75   Naive         1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.997  0.997
                           1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  1.000  1.000
      0.15   Naive         1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.005  0.005
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.995  0.995
      0.50   Naive         1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.002  0.002
             Propose       1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.994  0.994
      0.75   Naive         1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
                           1.000  1.000  1.000  0.006  0.006      1.000  1.000  1.000  0.998  0.998
PO    0.15   Naive         1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.997  0.997
      0.50   Naive         1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.995  0.995
      0.75   Naive         1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.995  0.995
                           1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  1.000  1.000
      0.15   Naive         1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.003  0.003
             Propose       1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.997  0.997
      0.50   Naive         1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.995  0.995
      0.75   Naive         1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.003  0.003
             Propose       1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.995  0.995
                           1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.997  0.997
                           Feature screening                      Iterated feature screening
Model        Method
PH    0.15   Naive         1.000  1.000  1.000  0.007  0.007      1.000  1.000  1.000  0.007  0.007
             Propose       1.000  1.000  1.000  0.007  0.007      1.000  1.000  1.000  1.000  1.000
      0.50   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.005  0.005
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.997  0.997
      0.75   Naive         1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.004  0.003
             Propose       1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.995  0.995
                           1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  1.000  1.000
      0.15   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.005  0.005
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.997  0.997
      0.50   Naive         1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.004  0.004      1.000  1.000  1.000  0.996  0.996
      0.75   Naive         1.000  1.000  1.000  0.001  0.001      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.001  0.001      1.000  1.000  1.000  0.994  0.994
                           1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.998  0.998
PO    0.15   Naive         1.000  1.000  1.000  0.008  0.008      1.000  1.000  1.000  0.009  0.009
             Propose       1.000  1.000  1.000  0.008  0.008      1.000  1.000  1.000  0.998  0.998
      0.50   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.005  0.005
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.997  0.997
      0.75   Naive         1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.006  0.006
             Propose       1.000  1.000  1.000  0.005  0.005      1.000  1.000  1.000  0.996  0.996
                           1.000  1.000  1.000  0.006  0.006      1.000  1.000  1.000  1.000  1.000
      0.15   Naive         1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.995  0.995
      0.50   Naive         1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.004  0.004
             Propose       1.000  1.000  1.000  0.002  0.002      1.000  1.000  1.000  0.995  0.995
      0.75   Naive         1.000  1.000  1.000  0.000  0.000      1.000  1.000  1.000  0.003  0.003
             Propose       1.000  1.000  1.000  0.000  0.000      1.000  1.000  1.000  0.994  0.994
                           1.000  1.000  1.000  0.003  0.003      1.000  1.000  1.000  0.998  0.998
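The screening procedures compared in the tables above all rank predictors by a marginal utility between each predictor and the response and then keep the top-ranked ones. As a minimal illustrative sketch (not the paper's exact method), the distance-correlation screening of Li et al. (2012) can be coded directly from the sample distance correlation of Székely et al. (2007); the function names `distance_correlation` and `screen` below are ours:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version) of two 1-d arrays."""
    # Pairwise absolute-difference (distance) matrices
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()          # squared distance covariance
    dvar_x = (A * A).mean()         # squared distance variances
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

def screen(X, y, d):
    """Keep the d predictors with the largest marginal distance correlation."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]
```

An iterated version would re-apply `screen` to the residuals of a working model fitted on the currently selected set, which is what allows predictors that are marginally weak but jointly important to be recovered.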
      naive
#     FS     IFS      FS     IFS      FS     IFS      FS     IFS
1     16587  16587    16587  16587    16587  16587    16587  16587
2     24719  24719    24719  24719    24719  24719    24719  24719
3     27057  27057    27057  27057    27057  27057    27057  27057
4     28581  28581    28581  28581    28581  28581    28581  28581
5     31420  31420    31420  31420    31420  31420    31420  31420
6     34790  34790    34790  34790    34790  34790    34790  34790
7     28581  28581    28581  28581    28581  28581    28581  28581
8     16312  29357    16312  29357    16312  29357    30157  30157
9     34771  29897    26537  29897    17053  29897    27116  28872
10    28346  30620    29637  30620    30917  30620    30334  32699
11    26521  30898    16587  30898    30929  30898    27762  27095
12    34375  32699    17053  32699    31972  32699    17326  24710
13    29642  15843    28346  15843    29637  15844    27019  19325
14    26537  15924    28908  15924    17605  15924    27762  30282
15    17605  27927    32519  27927    28346  27931    17176  32187
16    28920  28929    26521  28929    34771  28929    23887  29209
17    29657  34339    34364  34375    28908  34375    17343  16528
18    32519  34913    34667  32475    34651  32475    32699  27019
19    34651  26510    34771  26510    16079  26510    30157  23887
20    28908  27530    27931  34913    26537  27530    17917  16020
      naive
#   FS  IFS  FS  IFS  FS  IFS  FS  IFS
1   NM_016359  NM_016359  NM_016359  NM_016359  NM_016359  NM_016359  NM_016359  NM_016359
2   AA555029_RC  AA555029_RC  AA555029_RC  AA555029_RC  AA555029_RC  AA555029_RC  AA555029_RC  AA555029_RC
3   NM_003748  NM_003748  NM_003748  NM_003748  NM_003748  NM_003748  NM_003748  NM_003748
4   Contig38288_RC  Contig38288_RC  Contig38288_RC  Contig38288_RC  Contig38288_RC  Contig38288_RC  Contig38288_RC  Contig38288_RC
5   NM_003862  NM_003862  NM_003862  NM_003862  NM_003862  NM_003862  NM_003862  NM_003862
6   Contig28552_RC  Contig28552_RC  Contig28552_RC  Contig28552_RC  Contig28552_RC  Contig28552_RC  Contig28552_RC  Contig28552_RC
7   Contig32125_RC  Contig32125_RC  Contig32125_RC  Contig32125_RC  Contig32125_RC  Contig32125_RC  Contig32125_RC  Contig32125_RC
8   AB037863  Contig036649_RC  Contig55725_RC  Contig036649_RC  Contig55725_RC  Contig036649_RC  NM_000599  NM_000599
9   Contig036649_RC  Contig46218_RC  AF201905  Contig46218_RC  AB037863  Contig46218_RC  Contig46223  NM_005915
10  X05610  AB037863  AB037863  AB037863  AF201905  AB037863  AF257175  Contig46223
11  AL080079  NM_020188  Contig48328_RC  NM_020188  Contig036649_RC  NM_020188  NM_006931  X05610
12  NM_006931  Contig55377_RC  Contig036649_RC  Contig25991  X05610  Contig25991  AK000745  AK000745
13  AF201905  Contig48328_RC  AL080079  Contig55377_RC  NM_018354  Contig48328_RC  NM_005915  NM_005915
14  NM_003875  Contig25991  X05610  Contig46223_RC  AL080079  Contig55377_RC  NM_001282  NM_001282
15  Contig55725_RC  NM_003875  Contig55725_RC  NM_003875  Contig55725_RC  NM_003875  AL080079  NM_614321
16  Contig48328_RC  NM_006101  NM_018354  NM_006101  NM_006931  NM_006101  NM_014889  AF257175
17  NM_000599  NM_003882  NM_003875  NM_003607  Contig48328_RC  NM_000849  Contig55725_RC  NM_014889
18  NM_018354  NM_016577  NM_006931  NM_003882  NM_003875  NM_016577  NM_614321