
Robust covariance estimation for distributed principal component analysis

10/14/2020
by   Kangqiang Li, et al.

Principal component analysis (PCA) is a well-known tool for dimension reduction. It can summarise the data in fewer dimensions than the original without losing essential information. However, when data are dispersed across multiple servers, the communication cost of pooling them on one machine makes standard PCA impractical, so distributed algorithms for PCA are needed. Fan et al. [Annals of Statistics 47(6) (2019) 3009-3031] proposed a distributed PCA algorithm to address this problem. On each server, they computed the K leading eigenvectors V_K^(ℓ) = (v_1^(ℓ), …, v_K^(ℓ)) ∈ ℝ^{d×K} of the sample covariance matrix Σ and sent V_K^(ℓ) to the data center. In this paper, we plug the robust covariance matrix estimators proposed by Minsker [Annals of Statistics 46(6A) (2018) 2871-2903] and by Ke et al. [Statistical Science 34(3) (2019) 454-471] into the distributed PCA algorithm: each server computes the top K eigenvectors of its robust covariance estimate and transmits them to the central server. We investigate the statistical error of the resulting distributed estimator and derive its rate of convergence under both symmetric innovation distributions and general distributions. A simulation study supports the theoretical results. We also extend our analysis to the heterogeneous case with weaker moments, where samples within each server and across servers are independent and the population covariance matrices differ across servers but share the same top K eigenvectors.
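
The local-then-central workflow described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract only says that each server sends its K leading eigenvectors to the center; the aggregation rule used here (average the local projection matrices V V^T and take the top K eigenvectors of the average) is assumed to follow Fan et al. (2019). All function names are hypothetical. The default `cov_estimator` is the plain sample covariance of mean-zero data; a robust estimator such as those of Minsker or Ke et al. would be passed in its place and is not implemented here.

```python
import numpy as np


def top_k_eigvecs(S, k):
    """Return the k leading eigenvectors of a symmetric matrix S as a d x k array."""
    # eigh returns eigenvalues in ascending order, so reverse the columns first.
    _, vecs = np.linalg.eigh(S)
    return vecs[:, ::-1][:, :k]


def local_step(X, k, cov_estimator=None):
    """Run on one server: estimate the covariance of X (n x d), return its top-k eigenvectors.

    cov_estimator is a hypothetical hook; by default it is the sample
    covariance under the assumption that the data have mean zero.
    """
    if cov_estimator is None:
        cov_estimator = lambda Z: Z.T @ Z / Z.shape[0]
    return top_k_eigvecs(cov_estimator(X), k)


def aggregate(local_bases, k):
    """Run on the central server: combine the d x k bases received from the servers.

    Assumed aggregation rule: average the projection matrices V V^T, which
    are invariant to sign flips and rotations of each local basis, then
    take the top-k eigenvectors of the average.
    """
    d = local_bases[0].shape[0]
    P = np.zeros((d, d))
    for V in local_bases:
        P += V @ V.T
    P /= len(local_bases)
    return top_k_eigvecs(P, k)


def subspace_distance(U, V):
    """Frobenius distance between the projections onto span(U) and span(V)."""
    return np.linalg.norm(U @ U.T - V @ V.T, ord="fro")
```

Only a d x K matrix travels from each server to the center, rather than the raw samples or a full d x d covariance matrix, which is the source of the communication saving.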


07/24/2022

Distributed Robust Principal Component Analysis

We study the robust principal component analysis (RPCA) problem in a dis...
05/31/2019

Diagonally-Dominant Principal Component Analysis

We consider the problem of decomposing a large covariance matrix into th...
05/17/2019

Online Distributed Estimation of Principal Eigenspaces

Principal components analysis (PCA) is a widely used dimension reduction...
11/05/2018

On the asymptotics of Maronna's robust PCA

The eigenvalue decomposition (EVD) parameters of the second order statis...
11/09/2020

High dimensional PCA: a new model selection criterion

Given a random sample from a multivariate population, estimating the num...
12/20/2018

Steerable ePCA

In photon-limited imaging, the pixel intensities are affected by photon ...
06/22/2018

Removing the Curse of Superefficiency: an Effective Strategy For Distributed Computing in Isotonic Regression

We propose a strategy for computing the isotonic least-squares estimate ...