Automatic sparse PCA for high-dimensional data

09/29/2022
by Kazuyoshi Yata, et al.

Sparse principal component analysis (SPCA) methods have proven effective for analyzing high-dimensional data. Among them, threshold-based SPCA (TSPCA) is computationally cheaper than regularized SPCA based on L1 penalties. Here, we investigate the efficacy of TSPCA in high-dimensional settings and show that, for a suitable threshold value, TSPCA performs satisfactorily on high-dimensional data. Consequently, the performance of TSPCA depends heavily on the selected threshold value. To address this, we propose a novel thresholding estimator of the principal component (PC) directions based on a customized noise-reduction methodology. The proposed estimator is consistent under mild conditions and unaffected by the choice of threshold value, and it therefore yields more accurate results at a lower computational cost. Furthermore, we explore shrinkage PC directions and their application to clustering high-dimensional data. Finally, we evaluate the performance of the estimated shrinkage PC directions in real data analyses.
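The abstract does not spell out a thresholding rule, so the following is only a rough illustration of the generic TSPCA idea, not the authors' noise-reduction estimator. It assumes a simple hard-thresholding rule: estimate the leading PC direction by power iteration on the sample covariance matrix, zero every loading whose magnitude is below a threshold tau, and rescale to unit length. The function names, the rule itself, and the toy data are hypothetical choices made for this sketch. Running it with two thresholds makes the threshold sensitivity described above concrete.

```python
import math
import random


def sample_covariance(X):
    """Unbiased sample covariance (d x d) of an n x d data matrix given as a list of rows."""
    n, d = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in X]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_eigenvector(S, iters=1000, tol=1e-14):
    """Power iteration: unit eigenvector for the largest eigenvalue of a PSD matrix S."""
    d = len(S)
    v = [1.0 / math.sqrt(d)] * d  # assumes the start is not orthogonal to the target
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        converged = sum((x - y) ** 2 for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


def thresholded_pc_direction(X, tau):
    """Hypothetical TSPCA step: hard-threshold the leading sample PC direction.

    Loadings with |h_k| < tau are set to zero and the result is rescaled to unit
    length. If every loading falls below tau, the zero vector is returned.
    """
    h = leading_eigenvector(sample_covariance(X))
    h = [x if abs(x) >= tau else 0.0 for x in h]
    norm = math.sqrt(sum(x * x for x in h))
    return [x / norm for x in h] if norm > 0.0 else h


def make_demo_data(seed, n=200):
    """Toy data: coordinates 0 and 1 share a strong latent factor; 2-4 are pure noise."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        z = rng.gauss(0.0, 3.0)
        signal = [z + rng.gauss(0.0, 0.1), z + rng.gauss(0.0, 0.1)]
        noise = [rng.gauss(0.0, 0.1) for _ in range(3)]
        X.append(signal + noise)
    return X


if __name__ == "__main__":
    X = make_demo_data(seed=0)
    for tau in (0.2, 0.9):
        h = thresholded_pc_direction(X, tau)
        print(tau, [round(x, 3) for x in h])
```

On this toy data the true leading direction is close to (0.707, 0.707, 0, 0, 0). A moderate threshold such as tau = 0.2 keeps exactly the two correlated coordinates, while tau = 0.9 exceeds every loading and zeroes the whole direction. A threshold-free estimator of the kind the paper proposes would avoid having to tune this choice.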


