Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis

01/04/2020
by Haroon Raja, et al.

This paper considers the problem of estimating the principal eigenvector of a covariance matrix from independent and identically distributed data samples in streaming settings. The streaming rate of data in many contemporary applications can be high enough that a single processor cannot finish an iteration of existing methods for eigenvector estimation before a new sample arrives. This paper formulates and analyzes a distributed variant of the classical Krasulina's method (D-Krasulina) that can keep up with the high streaming rate of data by distributing the computational load across multiple processing nodes. The analysis shows that—under appropriate conditions—D-Krasulina converges to the principal eigenvector in an order-wise optimal manner; i.e., after receiving M samples across all nodes, its estimation error can be O(1/M). In order to reduce the network communication overhead, the paper also develops and analyzes a mini-batch extension of D-Krasulina, which is termed DM-Krasulina. The analysis of DM-Krasulina shows that it can also achieve order-optimal estimation error rates under appropriate conditions, even when some samples have to be discarded within the network due to communication latency. Finally, experiments are performed over synthetic and real-world data to validate the convergence behaviors of D-Krasulina and DM-Krasulina in high-rate streaming settings.
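The abstract names the update scheme but does not spell it out, so the following is only a minimal sketch of how a distributed, mini-batched Krasulina-style iteration might look. It assumes the classical Krasulina direction x xᵀv − (vᵀx xᵀv / ‖v‖²) v, a simulated set of nodes that each process fresh samples per round, and an exact average of the per-node directions. The function names, the step-size schedule c / (offset + t), and the synthetic diagonal-covariance data are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def krasulina_direction(v, x):
    """Classical Krasulina direction: x x^T v - (v^T x x^T v / ||v||^2) v."""
    xv = sum(xi * vi for xi, vi in zip(x, v))   # inner product x^T v
    vv = sum(vi * vi for vi in v)               # squared norm ||v||^2
    rayleigh = xv * xv / vv                     # v^T x x^T v / ||v||^2
    return [xv * xi - rayleigh * vi for xi, vi in zip(x, v)]


def distributed_krasulina(draw_sample, dim, num_nodes, num_rounds,
                          batch_per_node=1, c=1.0, offset=10.0, seed=0):
    """Simulate nodes that average their Krasulina directions each round.

    batch_per_node=1 mimics one sample per node per round; larger values
    mimic a mini-batch variant. The step size c / (offset + t) is an
    assumed schedule, not one stated in the abstract.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # random initial estimate
    total = num_nodes * batch_per_node              # samples used per round
    for t in range(1, num_rounds + 1):
        step = c / (offset + t)
        avg = [0.0] * dim
        for _node in range(num_nodes):
            for _ in range(batch_per_node):
                d = krasulina_direction(v, draw_sample(rng))
                avg = [a + dk / total for a, dk in zip(avg, d)]
        # All nodes apply the same averaged direction to the shared estimate.
        v = [vk + step * ak for vk, ak in zip(v, avg)]
    norm = math.sqrt(sum(vk * vk for vk in v))
    return [vk / norm for vk in v]


# Hypothetical synthetic stream: zero-mean Gaussian with covariance
# diag(4, 1, 0.25), so the principal eigenvector is the first axis.
STDS = [2.0, 1.0, 0.5]


def draw_sample(rng):
    return [rng.gauss(0.0, s) for s in STDS]


estimate = distributed_krasulina(draw_sample, dim=3, num_nodes=4,
                                 num_rounds=2000)
alignment = abs(estimate[0])   # |cosine| between estimate and the first axis
print("alignment with principal eigenvector:", alignment)
```

In this sketch, each round consumes num_nodes × batch_per_node samples but performs a single averaging step, which is the sense in which spreading the work over several nodes lets the iteration keep pace with a fast stream. Raising batch_per_node reduces the number of averaging rounds needed for a given number of samples, at the cost of fewer updates to the estimate.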

research
05/18/2020

Scaling-up Distributed Processing of Data Streams for Machine Learning

Emerging applications of machine learning in numerous areas involve cont...
research
02/15/2018

History PCA: A New Algorithm for Streaming PCA

In this paper we propose a new algorithm for streaming principal compone...
research
04/25/2020

Memory-efficient training with streaming dimensionality reduction

The movement of large quantities of data during the training of a Deep N...
research
05/28/2019

AdaOja: Adaptive Learning Rates for Streaming PCA

Oja's algorithm has been the cornerstone of streaming methods in Princip...
research
10/27/2021

Distributed Principal Component Analysis with Limited Communication

We study efficient distributed algorithms for the fundamental problem of...
research
03/11/2021

Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation

Principal Subspace Analysis (PSA) is one of the most popular approaches ...
research
08/29/2018

Diffusion Approximations for Online Principal Component Estimation and Global Convergence

In this paper, we propose to adopt the diffusion approximation tools to ...
