Functional Principal Subspace Sampling for Large Scale Functional Data Analysis

09/08/2021
by Shiyuan He, et al.

Functional data analysis (FDA) methods have computational and theoretical appeal for certain high-dimensional data, but they lack scalability to modern datasets with large sample sizes. To tackle this challenge, we develop randomized algorithms for two important FDA methods: functional principal component analysis (FPCA) and functional linear regression (FLR) with scalar response. The two methods are connected in that both rely on accurate estimation of the functional principal subspace. The proposed algorithms draw subsamples from the large dataset at hand and apply FPCA or FLR to the subsamples, reducing the computational cost. To effectively preserve subspace information in the subsamples, we propose a functional principal subspace sampling probability, which removes the eigenvalue scale effect inside the functional principal subspace and properly weights the residual. Based on operator perturbation analysis, we show that the proposed probability precisely controls the first-order error of the subspace projection operator and can be interpreted as an importance sampling scheme for functional subspace estimation. Moreover, concentration bounds for the proposed algorithms are established to reflect the low intrinsic dimensionality of functional data in an infinite-dimensional space. The effectiveness of the proposed algorithms is demonstrated on synthetic and real datasets.
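To make the subsample-then-estimate workflow above concrete, the Python sketch below discretizes the curves on a common grid, builds a pilot subspace from a small uniform subsample, and then draws an importance-weighted subsample before re-estimating the principal subspace. The sampling score used here (a whitened within-subspace leverage term plus a normalized residual term) is only a generic stand-in for the paper's functional principal subspace sampling probability, whose exact form is not given in this abstract; the function name subsampled_fpca and all parameter choices are illustrative.

```python
# Minimal sketch of subsampling-based FPCA on densely observed curves.
# NOTE: the sampling probability below is an illustrative importance-style proxy,
# NOT the functional principal subspace sampling probability from the paper.
import numpy as np

def subsampled_fpca(X, n_components=2, n_sub=500, n_pilot=200, seed=0):
    """X: (n_curves, n_grid) matrix of curves observed on a common grid."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    Xc = X - X.mean(axis=0)                          # center the curves

    # Pilot estimate of the leading subspace from a small uniform subsample.
    pilot = rng.choice(n, size=min(n_pilot, n), replace=False)
    _, s, Vt = np.linalg.svd(Xc[pilot], full_matrices=False)
    V, s = Vt[:n_components].T, s[:n_components]     # pilot basis and singular values

    # Importance-style score: whitened (scale-free) energy inside the pilot
    # subspace plus the normalized residual energy outside it.
    proj = Xc @ V
    lev = np.sum((proj / s) ** 2, axis=1)
    res = np.sum((Xc - proj @ V.T) ** 2, axis=1)
    score = lev / lev.sum() + res / res.sum()
    prob = score / score.sum()

    # Weighted subsample, reweighted to roughly compensate for unequal probabilities.
    m = min(n_sub, n)
    idx = rng.choice(n, size=m, replace=True, p=prob)
    w = 1.0 / np.sqrt(m * prob[idx])
    _, _, Vt_sub = np.linalg.svd(Xc[idx] * w[:, None], full_matrices=False)
    return Vt_sub[:n_components]                     # estimated subspace basis (k, n_grid)

# Toy usage: 10,000 curves driven by two smooth components plus noise.
t = np.linspace(0.0, 1.0, 100)
rng = np.random.default_rng(1)
scores = rng.normal(size=(10_000, 2)) * np.array([3.0, 1.0])
basis_true = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
X = scores @ basis_true + 0.1 * rng.normal(size=(10_000, 100))
basis_hat = subsampled_fpca(X, n_components=2)
print(basis_hat.shape)                               # (2, 100)
```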
