FADI: Fast Distributed Principal Component Analysis With High Accuracy for Large-Scale Federated Data

06/12/2023
by Shuting Shen, et al.

Principal component analysis (PCA) is one of the most popular methods for dimension reduction. With the rapid growth of large-scale data in federated ecosystems, the traditional PCA method is often inapplicable because of privacy-protection requirements and its heavy computational burden. Several algorithms have been proposed to lower the computational cost, but few can handle both high dimensionality and massive sample size in the distributed setting. In this paper, we propose the FAst DIstributed (FADI) PCA method for federated data when both the dimension d and the sample size n are ultra-large, performing parallel computing along d and distributed computing along n simultaneously. Specifically, we use L parallel copies of p-dimensional fast sketches to divide the computational burden along d, and aggregate the results distributively across the split samples. We present FADI under a general framework applicable to multiple statistical problems and establish comprehensive theoretical results within that framework. We show that FADI enjoys the same non-asymptotic error rate as traditional PCA when Lp ≥ d. We also derive inferential results that characterize the asymptotic distribution of FADI, and show a phase-transition phenomenon as Lp increases. Extensive simulations show that FADI substantially outperforms existing methods in computational efficiency while preserving accuracy, and numerical experiments validate the distributional phase-transition phenomenon. Finally, we apply FADI to the 1000 Genomes data to study population structure.
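To make the sketch-and-aggregate idea concrete, here is a minimal NumPy sketch written under our own assumptions. It is not the authors' reference implementation. We assume the rows of a centered data matrix are split across sites, that each of the L copies uses a Gaussian d × p test matrix shared by all sites, that the distributed step is a plain sum of per-site d × p products, and that the final step is an exact thin SVD of the stacked per-copy bases. The function name `sketched_distributed_pca` and the toy data are hypothetical.

```python
import numpy as np


def sketched_distributed_pca(blocks, k, p, L, seed=0):
    """Estimate the top-k principal subspace of data split by rows across sites.

    blocks: list of (n_s, d) arrays, one per site, assumed already centered.
    k: number of components; p: sketch width (p >= k); L: number of sketch copies.
    Returns a (d, k) matrix with orthonormal columns.
    """
    rng = np.random.default_rng(seed)
    n = sum(b.shape[0] for b in blocks)
    d = blocks[0].shape[1]
    bases = []
    for _ in range(L):  # the L copies are independent, so they could run in parallel
        # Gaussian test matrix, assumed to be shared by all sites for this copy.
        omega = rng.standard_normal((d, p))
        # Each site returns X_s^T (X_s @ omega), a d x p matrix; summing these gives
        # (X^T X / n) @ omega without ever forming the d x d covariance.
        y = sum(b.T @ (b @ omega) for b in blocks) / n
        # Top-k left singular vectors of this sketch.
        u, _, _ = np.linalg.svd(y, full_matrices=False)
        bases.append(u[:, :k])
    # The top-k eigenvectors of (1/L) * sum_l V_l V_l^T equal the top-k left
    # singular vectors of the stacked d x (L k) matrix [V_1, ..., V_L].
    w = np.hstack(bases)
    u, _, _ = np.linalg.svd(w, full_matrices=False)
    return u[:, :k]


# Toy check: a planted 2-dimensional signal in d = 50 dimensions, split over 4 sites.
rng = np.random.default_rng(1)
d, k = 50, 2
truth, _ = np.linalg.qr(rng.standard_normal((d, k)))  # true principal subspace
scores = rng.standard_normal((2000, k)) * np.sqrt([200.0, 100.0])
data = scores @ truth.T + rng.standard_normal((2000, d))
blocks = np.array_split(data, 4)  # rows held by 4 hypothetical sites
est = sketched_distributed_pca(blocks, k=k, p=20, L=5)  # here L * p = 100 >= d
# Cosines of the principal angles between the estimated and true subspaces.
cosines = np.linalg.svd(truth.T @ est, compute_uv=False)
print(cosines)
```

Stacking the per-copy bases and taking one thin SVD avoids materializing the averaged d × d projection matrix. Cosines close to 1 indicate that the estimated subspace aligns with the planted one. The paper's own final aggregation step and its choice of sketch may differ from this simplification.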
