A Linearly Convergent Algorithm for Distributed Principal Component Analysis

01/05/2021
by Arpita Gang, et al.

Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. Though often overlooked, the purpose of PCA is not only to reduce data dimensionality but also to yield features that are uncorrelated. This paper focuses on this dual objective of PCA, namely, dimensionality reduction and decorrelation of features, which requires estimating the eigenvectors of a data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors. The ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. Although a few distributed solutions to the PCA problem have been proposed recently, the convergence guarantees and/or communication overhead of these solutions remain a concern. With an eye toward communication efficiency, this paper introduces a feedforward neural network-based, one-time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the eigenvectors of a data covariance matrix when data are distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results demonstrate the efficacy of the proposed solution.
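The abstract describes a one-time-scale iteration that combines averaging over network neighbors with Sanger's rule (the generalized Hebbian algorithm) for eigenvector estimation. As a rough illustration of that idea, not the paper's exact algorithm, here is a minimal NumPy sketch: three fully connected nodes each hold a local covariance shard, average their neighbors' estimates, and take a local Sanger step with a constant step size. The network, mixing weights, step size, and synthetic data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_nodes = 5, 2, 3  # ambient dimension, eigenvectors sought, machines

# Synthetic local covariance matrices with a clear eigenvalue gap;
# the global covariance is their average across the network.
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.25])[:, None]
local_C = []
for _ in range(n_nodes):
    samples = scales * rng.standard_normal((d, 40))
    local_C.append(samples @ samples.T / 40)
global_C = sum(local_C) / n_nodes

# Doubly stochastic mixing weights for a fully connected 3-node network
# (an assumption; any connected graph with doubly stochastic weights works).
mix = np.full((n_nodes, n_nodes), 1.0 / n_nodes)

def sanger_direction(C, W):
    # Generalized Hebbian (Sanger's) ascent direction:
    # C W - W triu(W^T C W); its stable fixed points are the
    # top-k eigenvectors of C, not just a basis of their span.
    return C @ W - W @ np.triu(W.T @ C @ W)

# One-time-scale iteration: every node averages its neighbors' estimates
# and takes a local Sanger step, both at the same constant step size.
W = [0.1 * rng.standard_normal((d, k)) for _ in range(n_nodes)]
lr = 0.01
for _ in range(3000):
    W = [
        sum(mix[i, j] * W[j] for j in range(n_nodes))
        + lr * sanger_direction(local_C[i], W[i])
        for i in range(n_nodes)
    ]
```

Consistent with the guarantee stated in the abstract, a constant step size drives every node's estimate only to a neighborhood of the true eigenvectors of the global covariance: after the loop, each `W[i]` is nearly identical across nodes and its columns closely align with the leading eigenvectors of `global_C`.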


Related research

- FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis (08/27/2021)
  Principal Component Analysis (PCA) is a fundamental data preprocessing t...
- Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation (03/11/2021)
  Principal Subspace Analysis (PSA) is one of the most popular approaches ...
- Maximally Correlated Principal Component Analysis (02/17/2017)
  In the era of big data, reducing data dimensionality is critical in many...
- Communication-efficient distributed eigenspace estimation (09/05/2020)
  Distributed computing is a standard way to scale up machine learning and...
- Quantifying the Estimation Error of Principal Components (10/27/2017)
  Principal component analysis is an important pattern recognition and dim...
- Schrödinger PCA: You Only Need Variances for Eigenmodes (06/08/2020)
  Principal component analysis (PCA) has achieved great success in unsuper...
- Multi-set Canonical Correlation Analysis simply explained (02/11/2018)
  There are a multitude of methods to perform multi-set correlated compone...
