Exact and Approximation Algorithms for Sparse PCA

08/28/2020
by Yongchun Li, et al.

Sparse PCA (SPCA) is a fundamental model in machine learning and data analytics, with applications in areas such as finance, manufacturing, biology, and healthcare. SPCA selects a principal submatrix of prespecified size from a covariance matrix so as to maximize its largest eigenvalue; it thereby augments conventional PCA with feature selection alongside dimensionality reduction, which improves interpretability. This paper proposes two exact mixed-integer semidefinite programs (MISDPs) by exploiting the spectral decomposition of the covariance matrix and the properties of the largest eigenvalues. We then analyze the theoretical optimality gaps of their continuous relaxation values and prove that they are stronger than that of the state-of-the-art formulation. We further show that the continuous relaxations of the two MISDPs can be recast as saddle point problems that do not involve semidefinite cones, and thus can be solved effectively by first-order methods such as the subgradient method. Since off-the-shelf solvers generally have difficulty solving MISDPs, we approximate SPCA to arbitrary accuracy by a mixed-integer linear program (MILP) of a size similar to that of the MISDPs. For better scalability, we also analyze greedy and local search algorithms, prove their first-known approximation ratios, and show that these ratios are tight. Our numerical study demonstrates that the continuous relaxation values of the proposed MISDPs are quite close to optimality, that the proposed MILP model can solve small and medium-size instances to optimality, and that the approximation algorithms work very well on all the instances. Finally, we extend the analyses to Rank-one Sparse SVD (R1-SSVD) with non-symmetric matrices and to Sparse Fair PCA (SFPCA), where there are multiple covariance matrices, each corresponding to a protected group.
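To make the problem statement concrete, the following is a minimal Python sketch of SPCA as described above: choose k indices of a covariance matrix so that the corresponding principal submatrix has the largest possible top eigenvalue. The greedy and local search routines are generic textbook-style versions written for illustration and are not necessarily the exact procedures analyzed in the paper. The function names (`largest_eig`, `greedy_spca`, `local_search_spca`, `brute_force_spca`) and the improvement tolerance are assumptions introduced here, and the brute-force routine is only practical for very small instances.

```python
import itertools

import numpy as np


def largest_eig(A, S):
    """Largest eigenvalue of the principal submatrix of A indexed by S."""
    idx = sorted(S)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(A[np.ix_(idx, idx)])[-1]


def greedy_spca(A, k):
    """Grow the support one index at a time, keeping the best addition."""
    n = A.shape[0]
    S = set()
    for _ in range(k):
        candidates = [j for j in range(n) if j not in S]
        best = max(candidates, key=lambda j: largest_eig(A, S | {j}))
        S.add(best)
    return S


def local_search_spca(A, S, tol=1e-12):
    """Swap one selected index for one unselected index while it helps."""
    n = A.shape[0]
    S = set(S)
    improved = True
    while improved:
        improved = False
        current = largest_eig(A, S)
        for i in sorted(S):
            for j in range(n):
                if j in S:
                    continue
                T = (S - {i}) | {j}
                # Accept only strict improvements so the loop terminates.
                if largest_eig(A, T) > current + tol:
                    S, improved = T, True
                    break
            if improved:
                break
    return S


def brute_force_spca(A, k):
    """Exact answer by enumerating all size-k supports (tiny n only)."""
    n = A.shape[0]
    supports = (set(c) for c in itertools.combinations(range(n), k))
    return max(supports, key=lambda S: largest_eig(A, S))
```

As a usage example, one could build a small sample covariance matrix with `X = np.random.default_rng(0).standard_normal((50, 8)); A = X.T @ X / 50`, run `S = local_search_spca(A, greedy_spca(A, 3))`, and compare `largest_eig(A, S)` against `largest_eig(A, brute_force_spca(A, 3))` to see how close the heuristic comes to the exact optimum on that instance.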

research
01/23/2020

Best Principal Submatrix Selection for the Maximum Entropy Sampling Problem: Scalable Algorithms and Performance Guarantees

This paper studies a classic maximum entropy sampling problem (MESP), wh...
research
09/23/2021

Sparse PCA: A New Scalable Estimator Based On Integer Programming

We consider the Sparse Principal Component Analysis (SPCA) problem under...
research
05/11/2020

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Sparse principal component analysis (PCA) is a popular dimensionality re...
research
01/31/2018

De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices

Sparse principal component analysis (sPCA) has become one of the most wi...
research
07/18/2023

Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives

We consider the problem of learning a sparse graph underlying an undirec...
research
06/11/2018

The CCP Selector: Scalable Algorithms for Sparse Ridge Regression from Chance-Constrained Programming

Sparse regression and variable selection for large-scale data have been ...
research
09/29/2022

Sparse PCA With Multiple Components

Sparse Principal Component Analysis is a cardinal technique for obtainin...
