History PCA: A New Algorithm for Streaming PCA

02/15/2018
by   Puyudi Yang, et al.

In this paper we propose a new algorithm for streaming principal component analysis (PCA). With limited memory, small devices cannot store all the samples in the high-dimensional regime. Streaming PCA aims to find the k-dimensional subspace that explains the most variation in d-dimensional data points that arrive in memory sequentially. To deal with large d and large N (the number of samples), most streaming PCA algorithms update the current model using only the incoming sample and then discard that sample right away to save memory. However, the information contained in previously streamed data could be useful. Motivated by this idea, we develop a new streaming PCA algorithm called History PCA that makes use of previously streamed data. Using O(Bd) memory, where B ≈ 10 is the block size, our algorithm converges much faster than existing streaming PCA algorithms. By changing the number of inner iterations, the memory usage can be further reduced to O(d) while maintaining a comparable convergence speed. We provide theoretical guarantees for the convergence of our algorithm, along with its rate of convergence. We also demonstrate on synthetic and real-world data sets that our algorithm compares favorably with other state-of-the-art streaming PCA methods in terms of convergence speed and performance.
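The abstract gives no pseudocode, so the following is only an illustrative sketch of the general idea it describes: process the stream in blocks of B samples, and combine each block with a low-rank summary of everything seen so far instead of discarding it. The function name, the (i - 1)/i weighting, the eigenvalue estimate, and the `inner_iters` parameter are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def block_streaming_pca_with_history(blocks, d, k, inner_iters=2, seed=0):
    """Hypothetical sketch of block streaming PCA that reuses past information.

    blocks      : iterable of arrays with shape (B, d), one block per step
    d, k        : ambient dimension and target subspace dimension
    inner_iters : power-iteration steps per block (assumed to be >= 1)

    Only the current block (B x d) and a rank-k summary (d x k basis plus
    k eigenvalue estimates) are held in memory at any time.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis.
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    lam = np.zeros(k)

    for i, X in enumerate(blocks, start=1):
        B = X.shape[0]
        Q_prev, lam_prev = Q, lam
        for _ in range(inner_iters):
            # Rank-k summary of the history applied to the current basis:
            # Q_prev diag(lam_prev) Q_prev^T Q
            hist = Q_prev @ (lam_prev[:, None] * (Q_prev.T @ Q))
            # Sample covariance of the current block applied to the basis.
            new = X.T @ (X @ Q) / B
            # Assumed weighting: running average over the i blocks seen so far.
            S = ((i - 1) * hist + new) / i
            # Re-orthonormalize, as in block power iteration.
            Q, _ = np.linalg.qr(S)
        # Assumed eigenvalue estimate: column norms of the last product.
        lam = np.linalg.norm(S, axis=0)

    return Q, lam
```

Under these assumptions, the block itself costs O(Bd) memory and the summary costs O(kd), which is in line with the memory figure quoted in the abstract for small k. The sketch says nothing about the convergence guarantees claimed there.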

research
05/28/2019

AdaOja: Adaptive Learning Rates for Streaming PCA

Oja's algorithm has been the cornerstone of streaming methods in Princip...
research
06/28/2013

Memory Limited, Streaming PCA

We consider streaming, one-pass principal component analysis (PCA), in t...
research
03/08/2023

Streaming Kernel PCA Algorithm With Small Space

Principal Component Analysis (PCA) is a widely used technique in machine...
research
02/08/2019

Non-Stationary Streaming PCA

We consider the problem of streaming principal component analysis (PCA) ...
research
07/18/2019

Federated PCA with Adaptive Rank Estimation

In many online machine learning and data science tasks such as data summ...
research
01/04/2020

Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis

This paper considers the problem of estimating the principal eigenvector...
research
06/12/2018

Streaming PCA and Subspace Tracking: The Missing Data Case

For many modern applications in science and engineering, data are collec...
