Fast Randomized PCA for Sparse Data

10/16/2018
by Xu Feng, et al.

Principal component analysis (PCA) is widely used for dimension reduction and embedding of real data in fields such as social network analysis, information retrieval, and natural language processing. In this work we propose a fast randomized PCA algorithm for processing large sparse data. The algorithm has accuracy similar to that of the basic randomized SVD (rPCA) algorithm (Halko et al., 2011), but is largely optimized for sparse data. It is also flexible enough to trade off runtime against accuracy in practical use. Experiments on real data show that the proposed algorithm is up to 9.1X faster than the basic rPCA algorithm without accuracy loss, and up to 20X faster than svds in Matlab with little error. The algorithm computes the first 100 principal components of a large information retrieval dataset with 12,869,521 persons and 323,899 keywords in less than 400 seconds on a 24-core machine, while all conventional methods fail because they run out of memory.
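For orientation, the basic randomized SVD scheme that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration of the generic approach described by Halko et al. (2011), not the optimized algorithm proposed in the paper. The function name `basic_rpca_sketch`, the parameters `oversample` and `n_power_iter`, and their default values are assumptions made for this example, and mean-centering is deliberately omitted because it would destroy sparsity.

```python
import numpy as np
import scipy.sparse as sp


def basic_rpca_sketch(A, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k singular triplets of a sparse matrix A.

    Hypothetical helper for illustration only: a plain randomized SVD
    with power iteration, without the sparse-data optimizations the
    paper proposes and without mean-centering.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample a few extra directions beyond k to improve accuracy.
    l = min(k + oversample, m, n)

    # Random Gaussian test matrix; A @ Omega is a sparse-times-dense product.
    Omega = rng.standard_normal((n, l))
    Q, _ = np.linalg.qr(A @ Omega)

    # Power iterations sharpen the captured subspace; re-orthonormalizing
    # after every product keeps the computation numerically stable.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Project A onto the subspace: B = Q^T A is a small dense l-by-n matrix.
    B = (A.T @ Q).T
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Map the left singular vectors back to the original row space.
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]


if __name__ == "__main__":
    # Small synthetic sparse matrix, purely for demonstration.
    A = sp.random(1000, 300, density=0.01, random_state=0, format="csr")
    U, s, Vt = basic_rpca_sketch(A, k=5)
    print(U.shape, s.shape, Vt.shape)
```

In this sketch the only operations that touch `A` are sparse-times-dense products, so their cost scales with the number of nonzeros. The dense QR factorizations and the small SVD are the remaining costs, and the number of power iterations is the natural knob for trading runtime against accuracy.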

research
08/27/2021

FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis

Principal Component Analysis (PCA) is a fundamental data preprocessing t...
research
02/14/2023

On the Multiway Principal Component Analysis

Multiway data are becoming more and more common. While there are many ap...
research
12/12/2015

Sparse Generalized Principal Component Analysis for Large-scale Applications beyond Gaussianity

Principal Component Analysis (PCA) is a dimension reduction technique. I...
research
02/01/2014

Randomized Nonlinear Component Analysis

Classical methods such as Principal Component Analysis (PCA) and Canonic...
research
12/21/2013

Large-Scale Paralleled Sparse Principal Component Analysis

Principal component analysis (PCA) is a statistical technique commonly u...
research
09/04/2020

Efficient Model-Based Collaborative Filtering with Fast Adaptive PCA

A model-based collaborative filtering (CF) approach utilizing fast adapt...
research
12/10/2022

Information retrieval in single cell chromatin analysis using TF-IDF transformation methods

Single-cell sequencing assay for transposase-accessible chromatin (scATA...