Memory Limited, Streaming PCA

06/28/2013
by Ioannis Mitliagkas, et al.

We consider streaming, one-pass principal component analysis (PCA) in the high-dimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p^2) memory; meanwhile, no algorithm can do better than O(kp) memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where p-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike), which is the signal to be recovered. It is now well understood that the spike can be recovered when the number of samples, n, scales proportionally with the dimension, p. Yet all algorithms that provably achieve this have memory complexity O(p^2). Meanwhile, algorithms with memory complexity O(kp) lack provable bounds on sample complexity comparable to p. We present an algorithm that achieves both: it uses O(kp) memory (meaning storage of any kind) and is able to compute the k-dimensional spike with O(p log p) sample complexity -- the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data.
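To make the setup concrete, the following is a minimal sketch (not the paper's exact procedure) of a block stochastic power method under the spiked covariance model: each block of samples is used to apply the empirical covariance to a p-by-k iterate and is then discarded, so only O(kp) numbers are ever stored. The dimensions, spike strength, and block sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 200, 3              # ambient dimension, target subspace dimension
n_blocks, block_size = 50, 200

# Spiked covariance model: identity (white noise) plus a rank-k spike.
U, _ = np.linalg.qr(rng.standard_normal((p, k)))  # true spike directions
sigma = 2.0                                       # spike strength (assumed)

def sample(m):
    # x = sigma * U z + noise, so Cov(x) = sigma^2 U U^T + I
    z = rng.standard_normal((m, k))
    return z @ (sigma * U.T) + rng.standard_normal((m, p))

# Block power method: only the p x k iterate Q is kept across blocks,
# i.e. O(kp) memory; each sample block is processed once and discarded.
Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
for _ in range(n_blocks):
    X = sample(block_size)            # one streaming block
    S = X.T @ (X @ Q) / block_size    # empirical covariance applied to Q
    Q, _ = np.linalg.qr(S)            # re-orthonormalize the iterate

# Subspace recovery error: 0 means Q spans the true spike exactly.
err = np.linalg.norm(U - Q @ (Q.T @ U), 2)
print(f"subspace error: {err:.3f}")
```

Here a total of 10,000 samples in dimension p = 200 suffices for the iterate to align closely with the planted spike, consistent with sample complexity growing only mildly faster than p.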


