Principal component analysis for big data

01/05/2018
by Jianqing Fan, et al.

Big data is transforming our world, revolutionizing operations and analytics everywhere, from financial engineering to biomedical sciences. The complexity of big data often makes dimension reduction techniques necessary before conducting statistical inference. Principal component analysis, commonly referred to as PCA, has become an essential tool for multivariate data analysis and unsupervised dimension reduction, the goal of which is to find a lower-dimensional subspace that captures most of the variation in the dataset. This article provides an overview of methodological and theoretical developments of PCA over the last decade, with a focus on its applications to big data analytics. We first review the mathematical formulation of PCA and its theoretical development from the viewpoint of perturbation analysis. We then briefly discuss the relationship between PCA and factor analysis, as well as its applications to large covariance estimation and multiple testing. PCA also finds important applications in many modern machine learning problems, and in this paper we focus on community detection, ranking, mixture models, and manifold learning.
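To make the stated goal concrete, the following is a minimal sketch of classical PCA via an eigendecomposition of the sample covariance matrix. It is not taken from the article: the function name `pca`, the synthetic data, and the choice of `k=2` are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def pca(X, k):
    """Illustrative PCA: project the rows of X onto the top-k principal components.

    Hypothetical helper, not the article's code. Returns the projected scores,
    the component directions (as columns), and the fraction of variance captured.
    """
    Xc = X - X.mean(axis=0)                  # center each column
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    components = eigvecs[:, order]
    explained = eigvals[order].sum() / eigvals.sum()
    return Xc @ components, components, explained


# Synthetic example (assumed for illustration): 500 points in 10 dimensions
# whose variation is concentrated in a 2-dimensional subspace, plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([5.0, 3.0])
loadings = np.linalg.qr(rng.normal(size=(10, 2)))[0]  # orthonormal 10 x 2 basis
X = latent @ loadings.T + 0.1 * rng.normal(size=(500, 10))

scores, components, explained = pca(X, k=2)
```

Here `explained` is the share of total sample variance retained by the two-dimensional subspace; for data generated this way it should be close to 1, which is the sense in which the subspace "captures most of the variation".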


research
07/03/2019

Generalized Principal Component Analysis

Generalized principal component analysis (GLM-PCA) facilitates dimension...
research
08/19/2019

High dimensional statistical inference: theoretical development to data analytics

This article is due to appear in the Handbook of Statistics, Vol. 43, El...
research
01/31/2017

Representation of big data by dimension reduction

Suppose the data consist of a set S of points x_j, 1 ≤ j ≤ J, distribute...
research
11/19/2021

Gaussian Determinantal Processes: a new model for directionality in data

Determinantal point processes (a.k.a. DPPs) have recently become popular...
research
11/23/2022

Kernel PCA for multivariate extremes

We propose kernel PCA as a method for analyzing the dependence structure...
research
05/16/2016

Classification of Big Data with Application to Imaging Genetics

Big data applications, such as medical imaging and genetics, typically g...
research
12/20/2019

Big Data Approaches to Knot Theory: Understanding the Structure of the Jones Polynomial

We examine the structure and dimensionality of the Jones polynomial usin...
