Imbalanced Sparse Canonical Correlation Analysis
Classical canonical correlation analysis (CCA) requires both matrices to be low dimensional, i.e., the number of features cannot exceed the sample size. Recent developments in CCA have mainly focused on the high-dimensional setting, where the number of features in both matrices under analysis greatly exceeds the sample size. However, these approaches make strong sparsity assumptions and impose penalties that may be unnecessary for some datasets. We consider a commonly encountered imbalanced setting, where one matrix is high dimensional and the other is low dimensional. We provide an explicit link between sparse multiple regression and sparse canonical correlation analysis, and an efficient algorithm that exploits the imbalanced data structure and estimates multiple canonical pairs simultaneously rather than sequentially. We provide theoretical results on the consistency of the canonical pairs. Simulation results and the analysis of several real datasets support the improved performance of the proposed approach.
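For orientation, the classical low-dimensional case that the abstract starts from can be sketched in a few lines of NumPy. This is only the textbook, unpenalized setting and not the proposed sparse algorithm: the function names `classical_cca` and `cca_from_regression` are hypothetical, and the second function illustrates the well-known classical connection between multivariate least squares and CCA, which may differ in form from the sparse link established in the paper.

```python
import numpy as np


def classical_cca(X, Y):
    """Classical CCA via QR + SVD.

    Assumes n > p, n > q and full column rank after centering,
    i.e. the low-dimensional setting. Hypothetical helper, not the
    paper's method.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered matrices.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)      # canonical directions for X
    B = np.linalg.solve(Ry, Vt.T)   # canonical directions for Y
    return np.clip(s, 0.0, 1.0), A, B


def cca_from_regression(X, Y):
    """Canonical correlations recovered from a least-squares fit of Y on X.

    The fitted values are the projection of Y onto span(X); rescaling
    them by the R factor of Y and taking singular values gives the same
    canonical correlations as classical_cca (padded with zeros if q > p).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ coef
    _, Ry = np.linalg.qr(Yc)
    # Compute fitted @ inv(Ry) without forming the inverse explicitly.
    W = np.linalg.solve(Ry.T, fitted.T).T
    return np.linalg.svd(W, compute_uv=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    Y = X[:, :3] @ rng.standard_normal((3, 3)) + rng.standard_normal((200, 3))
    rho, A, B = classical_cca(X, Y)
    print("canonical correlations:", rho)
    print("from regression:       ", cca_from_regression(X, Y))
```

Both routes rely on inverting quantities that become singular once the number of features in a matrix exceeds the sample size, which is why the high-dimensional side of the imbalanced setting needs a penalized treatment.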