Estimating the Proportion of Signal Variables Under Arbitrary Covariance Dependence
Estimating the proportion of signals hidden among a large number of noise variables is of interest in many scientific inquiries. In this paper, we consider realistic but theoretically challenging settings with arbitrary covariance dependence between variables. We define the mean absolute correlation (MAC) to measure the overall dependence level and investigate the performance of a family of estimators across the full range of MAC. We make explicit the joint effect of MAC dependence and signal sparsity on the performance of this family of estimators and discover that no single estimator in the family is most powerful across different MAC dependence levels. Informed by this theoretical insight, we propose a new estimator that better adapts to arbitrary covariance dependence. The proposed method compares favorably to several existing methods in extensive finite-sample settings with strong to weak covariance dependence and real dependence structures from genetic association studies.
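To make the dependence measure concrete, the following is a minimal sketch of how a mean absolute correlation could be computed from a correlation matrix. The abstract does not state the exact formula, so this sketch assumes MAC is the average of the absolute off-diagonal entries; the paper's definition may differ (for example, in whether the diagonal is included). The function names `mean_absolute_correlation` and `equicorrelation` are hypothetical and used only for illustration.

```python
import numpy as np


def mean_absolute_correlation(corr):
    """Average absolute off-diagonal entry of a square correlation matrix.

    ASSUMPTION: this off-diagonal average is an illustrative reading of
    "mean absolute correlation"; it is not taken from the paper's text.
    """
    corr = np.asarray(corr, dtype=float)
    if corr.ndim != 2 or corr.shape[0] != corr.shape[1]:
        raise ValueError("corr must be a square matrix")
    p = corr.shape[0]
    if p < 2:
        raise ValueError("need at least two variables")
    # Boolean mask that selects every entry except the diagonal.
    off_diagonal = ~np.eye(p, dtype=bool)
    return float(np.abs(corr[off_diagonal]).mean())


def equicorrelation(p, rho):
    """Hypothetical helper: p x p matrix with 1 on the diagonal, rho elsewhere."""
    return np.full((p, p), rho) + (1.0 - rho) * np.eye(p)


if __name__ == "__main__":
    # Independent variables give the weakest dependence level.
    print(mean_absolute_correlation(np.eye(5)))
    # Equicorrelated variables give a dependence level equal to |rho|.
    print(mean_absolute_correlation(equicorrelation(4, 0.5)))
```

Under this assumed definition, the measure ranges from 0 (no pairwise correlation) to 1 (all pairs perfectly correlated), which matches the abstract's notion of a "full range" of dependence from weak to strong.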