Large-Scale Sparse Principal Component Analysis with Application to Text Data

10/26/2012
by Youwei Zhang, et al.

Sparse PCA finds a linear combination of a small number of features that maximizes variance across the data. Although sparse PCA has clear advantages over PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This follows from a rigorous feature elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than existing first-order methods. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative to topic models.
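
To make the feature elimination idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the elimination test is a simple variance threshold: a feature is dropped when its variance falls below a penalty parameter, called `lam` here. The abstract does not state the test or name the parameter, so the threshold form, the helper names `feature_variances` and `surviving_features`, and the synthetic data are all illustrative assumptions. The data is built so that feature variances decay exponentially, which is the situation the abstract describes as typical.

```python
import random


def feature_variances(rows):
    """Per-column population variance of a list of equal-length rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(d)]


def surviving_features(variances, lam):
    """Indices of features whose variance is at least lam.

    Assumption: lam plays the role of a sparsity penalty, and features
    with variance below it can be discarded before solving sparse PCA.
    """
    return [j for j, v in enumerate(variances) if v >= lam]


# Synthetic data (assumption): feature j has standard deviation 0.5**j,
# so its variance is roughly 0.25**j and decays exponentially with j.
rng = random.Random(0)
n_docs, n_features = 2000, 12
rows = [[rng.gauss(0.0, 0.5 ** j) for j in range(n_features)]
        for _ in range(n_docs)]

variances = feature_variances(rows)
kept = surviving_features(variances, lam=0.01)
print(f"kept {len(kept)} of {n_features} features: {kept}")
```

With exponentially decaying variances, even a modest threshold removes most features, so the sparse PCA solver only has to work on a much smaller problem. Raising `lam` in this sketch shrinks the surviving set further.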

research
02/18/2014

High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

We propose a new high dimensional semiparametric principal component ana...
research
07/09/2019

All Sparse PCA Models Are Wrong, But Some Are Useful. Part I: Computation of Scores, Residuals and Explained Variance

Sparse Principal Component Analysis (sPCA) is a popular matrix factoriza...
research
03/06/2014

Sparse Principal Component Analysis via Rotation and Truncation

Sparse principal component analysis (sparse PCA) aims at finding a spars...
research
05/11/2020

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Sparse principal component analysis (PCA) is a popular dimensionality re...
research
07/04/2018

A Comparative Study on using Principle Component Analysis with Different Text Classifiers

Text categorization (TC) is the task of automatically organizing a set o...
research
03/03/2013

Sparse PCA through Low-rank Approximations

We introduce a novel algorithm that computes the k-sparse principal comp...
research
08/04/2015

Sparse PCA via Bipartite Matchings

We consider the following multi-component sparse PCA problem: given a se...
