Data Distillery: Effective Dimension Estimation via Penalized Probabilistic PCA

03/20/2018
by Wei Q. Deng, et al.

The paper tackles the unsupervised estimation of the effective dimension of a sample of dependent random vectors. The proposed method uses the principal component (PC) decomposition of the sample covariance matrix to build a low-rank approximation that helps uncover hidden structure. The number of PCs to include in the decomposition is determined via probabilistic principal component analysis (PPCA) embedded in a penalized profile likelihood criterion. The choice of the penalty parameter is guided by a data-driven procedure that is justified by analytical derivations and extensive finite-sample simulations. The proposed penalized PPCA is illustrated on three gene expression datasets, in which the number of cancer subtypes is estimated from all expression measurements. The analyses point towards hidden structures in the data, e.g., additional subgroups, that could be of scientific interest.
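To make the idea of choosing the number of PCs with a penalized PPCA likelihood concrete, here is a minimal Python sketch. It uses the standard maximized PPCA log-likelihood of Tipping and Bishop as the profile likelihood in the number of components k. The abstract does not state the paper's penalty or its data-driven tuning procedure, so the BIC-style penalty below is a stand-in assumption, and the names `ppca_profile_loglik`, `estimate_dimension`, and `penalty_weight` are hypothetical.

```python
import numpy as np


def ppca_profile_loglik(eigvals, n, k):
    """Maximized PPCA log-likelihood when k components are retained.

    eigvals: eigenvalues of the (biased) sample covariance, sorted descending.
    """
    d = eigvals.size
    # MLE of the isotropic noise variance: mean of the discarded eigenvalues.
    sigma2 = eigvals[k:].mean()
    return -0.5 * n * (
        d * np.log(2.0 * np.pi)
        + np.log(eigvals[:k]).sum()
        + (d - k) * np.log(sigma2)
        + d
    )


def estimate_dimension(X, penalty_weight=None):
    """Pick k by maximizing profile log-likelihood minus a penalty.

    ASSUMPTION: the penalty is BIC-style (weight * number of free
    parameters); it is NOT the penalty proposed in the paper.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    # eigvalsh returns ascending order; reverse and guard against zeros.
    eigvals = np.clip(np.linalg.eigvalsh(S)[::-1], 1e-12, None)
    if penalty_weight is None:
        penalty_weight = 0.5 * np.log(n)
    scores = []
    for k in range(d):  # k = d would leave no eigenvalues for the noise term
        # Free parameters of the loading matrix (up to rotation) plus sigma^2.
        n_params = d * k - k * (k - 1) / 2 + 1
        scores.append(ppca_profile_loglik(eigvals, n, k) - penalty_weight * n_params)
    return int(np.argmax(scores)), scores
```

Calling `estimate_dimension(X)` on an `n x d` data matrix returns the selected number of components and the penalized score for each candidate k. Passing a different `penalty_weight` changes how strongly additional components are discouraged, which is the role the paper's penalty parameter plays in its own criterion.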
