Estimation of large block covariance matrices: Application to the analysis of gene expression data
Motivated by an application in molecular biology, we propose a novel, efficient, and fully data-driven approach for estimating large block-structured sparse covariance matrices when the number of variables is much larger than the number of samples, without restricting ourselves to block-diagonal matrices. Our approach consists of approximating such a covariance matrix by the sum of a low-rank sparse matrix and a diagonal matrix. Our methodology can also handle matrices whose block structure appears only after the rows and columns are permuted according to an unknown permutation. Our technique is implemented in the R package BlockCov, which is available from the Comprehensive R Archive Network (CRAN) and from GitHub. To illustrate the statistical and numerical performance of our package, we provide numerical experiments and a thorough comparison with alternative methods. Finally, we apply our approach to gene expression data to better understand the toxicity of acetaminophen on the liver of rats.
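As a rough sketch of the approximation described above, the covariance matrix can be written as a low-rank sparse term plus a diagonal term. The symbols below (Σ for the covariance matrix, Z for a sparse factor, D for the diagonal matrix, P for the permutation matrix, q for the number of variables, n for the number of samples, k for the rank) are notation assumed here for illustration. Writing the low-rank term as a product Z Zᵀ is likewise an assumption, not something stated in the abstract.

```latex
% Assumed notation: q variables, n samples with n << q, rank k << q.
\[
  \Sigma \;\approx\; Z Z^{\top} + D,
  \qquad Z \in \mathbb{R}^{q \times k} \ \text{sparse},
  \qquad D = \operatorname{diag}(d_1, \dots, d_q).
\]
% When the block structure is hidden by an unknown ordering of the variables,
% the same form is assumed to hold after applying a permutation matrix P:
\[
  P^{\top} \Sigma P \;\approx\; Z Z^{\top} + D.
\]
```

Under this reading, the nonzero pattern of Z determines which variables share a block, and off-diagonal blocks can be nonzero, which is why the model is not limited to block-diagonal matrices.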