Integrative Sparse Partial Least Squares
Partial least squares (PLS), as a dimension reduction method, has become increasingly important for its ability to handle problems with a large number of variables. Because noisy variables can degrade model performance, the sparse partial least squares (SPLS) technique has been proposed to identify important variables and produce more interpretable results. However, the small sample size of a single dataset limits the performance of conventional methods. An effective remedy is to pool information from multiple comparable studies. Integrative analysis holds an important place among multi-dataset analyses: its main idea is to improve estimation by assembling raw datasets and analyzing them jointly. In this paper, we develop an integrative SPLS (iSPLS) method that applies penalization on top of the SPLS technique. The proposed approach uses two penalties. The first conducts variable selection in the context of integrative analysis; the second, a contrasted penalty, encourages similarity of estimates across datasets and yields more reasonable and accurate results. Computational algorithms are provided. Simulation experiments are conducted to compare iSPLS with alternative approaches. The practical utility of iSPLS is demonstrated in the analysis of two TCGA gene expression datasets.
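To make the two-penalty idea concrete, the following is a minimal sketch and not the paper's formulation: the abstract does not state the functional form of either penalty. It assumes, purely for illustration, a group-lasso-type selection penalty that treats each variable's coefficients across datasets as one group, and a squared-difference contrast penalty that pulls a variable's coefficients in different datasets toward each other. The function name `integrative_penalty` and the parameters `lam_select` and `lam_contrast` are hypothetical.

```python
import numpy as np


def integrative_penalty(W, lam_select, lam_contrast):
    """Illustrative two-part penalty (assumed forms, not taken from the paper).

    W            : array of shape (p, M); column m holds the direction-vector
                   coefficients estimated for dataset m.
    lam_select   : weight of the assumed group-lasso-type selection penalty.
    lam_contrast : weight of the assumed squared-difference contrast penalty.
    """
    W = np.asarray(W, dtype=float)

    # Selection part: one L2 norm per variable (row), summed over variables.
    # A row is penalized as a group, so a variable tends to be kept or
    # dropped in all datasets together.
    selection = lam_select * np.linalg.norm(W, axis=1).sum()

    # Contrast part: squared differences between every pair of datasets.
    # Broadcasting yields all ordered pairs, so halve the sum to count each
    # unordered pair once.
    diffs = W[:, :, None] - W[:, None, :]
    pairwise_sq = (diffs ** 2).sum() / 2.0
    contrast = 0.5 * lam_contrast * pairwise_sq

    return selection + contrast
```

For example, calling `integrative_penalty([[3.0, 4.0], [0.0, 0.0]], 1.0, 2.0)` penalizes the first variable for both its size and its disagreement between the two datasets, while the all-zero second variable contributes nothing. Setting `lam_contrast` to zero leaves only the selection part.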