Distributed Learning for Principal Eigenspaces without Moment Constraints

04/29/2022
by Yong He, et al.

Distributed Principal Component Analysis (PCA) has been studied to deal with the case when data are stored across multiple machines and communication cost or privacy concerns prohibit the computation of PCA in a central location. However, the sub-Gaussian assumption in the related literature is restrictive in real applications, where outliers or heavy-tailed data are common in areas such as finance and macroeconomics. In this article, we propose a distributed algorithm for estimating the principal eigenspaces without any moment constraint on the underlying distribution. We study the problem under the elliptical family framework and adopt the sample multivariate Kendall's tau matrix to extract eigenspace estimators from all sub-machines, which can be viewed as points on the Grassmann manifold. We then find the "center" of these points as the final distributed estimator of the principal eigenspace. We investigate the bias and variance of the distributed estimator and derive its convergence rate, which depends on the effective rank and eigengap of the scatter matrix and on the number of sub-machines. We show that the distributed estimator performs as if we had full access to the whole data. Simulation studies show that the distributed algorithm performs comparably with the existing one for light-tailed data, while showing great advantage for heavy-tailed data. We also extend our algorithm to the distributed learning of elliptical factor models and verify its empirical usefulness through a real application to a macroeconomic dataset.
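The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it computes the sample multivariate Kendall's tau matrix on each sub-machine, extracts the local top-K eigenspace, and then takes an extrinsic average of the projection matrices as one simple way to find the "center" of the local estimators on the Grassmann manifold. All function names and the choice of averaging scheme are assumptions for illustration.

```python
import numpy as np

def kendall_tau_matrix(X):
    """Sample multivariate Kendall's tau matrix: the average of
    outer products of normalized pairwise differences of the rows.
    Under elliptical distributions its eigenvectors match those of
    the scatter matrix, without any moment conditions."""
    n, p = X.shape
    K = np.zeros((p, p))
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = X[i] - X[j]
            nrm2 = d @ d
            if nrm2 > 0:
                K += np.outer(d, d) / nrm2
                count += 1
    return K / count

def local_eigenspace(X, K_dim):
    """Top-K_dim eigenvectors of the local Kendall's tau matrix."""
    w, V = np.linalg.eigh(kendall_tau_matrix(X))
    return V[:, np.argsort(w)[::-1][:K_dim]]

def distributed_eigenspace(blocks, K_dim):
    """Average the local projection matrices V V^T (an extrinsic
    Grassmann mean, assumed here as the 'center' step) and return
    the leading eigenspace of the average."""
    p = blocks[0].shape[1]
    P = np.zeros((p, p))
    for X in blocks:
        V = local_eigenspace(X, K_dim)
        P += V @ V.T
    P /= len(blocks)
    w, U = np.linalg.eigh(P)
    return U[:, np.argsort(w)[::-1][:K_dim]]
```

Each sub-machine only ships a p-by-K_dim eigenvector matrix (or its p-by-p projection), so the communication cost does not grow with the local sample size.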


