Sufficient principal component regression for pattern discovery in transcriptomic data

07/05/2021
by Lei Ding, et al.

Methods for global measurement of transcript abundance such as microarrays and RNA-seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally tractable insights from such data therefore requires high-dimensional prediction. Existing sparse linear approaches to this challenge have been stunningly successful, but some important issues remain. These methods can fail to select the correct features, predict poorly relative to non-sparse alternatives, or ignore any unknown grouping structures for the features. We propose a method called SuffPCR that yields improved predictions in high-dimensional tasks including regression and classification, especially in the typical context of omics with correlated features. SuffPCR first estimates sparse principal components and then estimates a linear model on the recovered subspace. Because the estimated subspace is sparse in the features, the resulting predictions will depend on only a small subset of genes. SuffPCR works well on a variety of simulated and experimental transcriptomic data, performing nearly optimally when the model assumptions are satisfied. We also demonstrate near-optimal theoretical guarantees.
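The two-stage idea described in the abstract (estimate sparse principal components, then fit a linear model on the recovered subspace) can be illustrated with a minimal sketch. This is not the authors' SuffPCR estimator: the sparse component here comes from a truncated power iteration, a stand-in technique chosen for brevity, and only a single component is used. The function names, the sparsity level `k`, and the toy data are all assumptions made for illustration.

```python
def sparse_leading_component(Xc, k, iters=50):
    """Truncated power iteration (a stand-in, not the SuffPCR estimator):
    approximate the leading direction of Xc^T Xc with at most k nonzero loadings."""
    n, p = len(Xc), len(Xc[0])
    v = [1.0 / p ** 0.5] * p
    for _ in range(iters):
        # One power step on Xc^T Xc: scores = Xc v, then w = Xc^T scores.
        scores = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
        w = [sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(p)]
        # Keep the k largest-magnitude loadings and zero out the rest.
        keep = set(sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:k])
        w = [w[j] if j in keep else 0.0 for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # degenerate input; stop with the current direction
            break
        v = [x / norm for x in w]
    return v


def fit_sparse_pcr(X, y, k):
    """Center the data, estimate one sparse component, regress y on its scores."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    v = sparse_leading_component(Xc, k)
    z = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
    # Least-squares slope of centered y on the one-dimensional scores z.
    gamma = sum(zi * (yi - y_mean) for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)
    # Coefficients on the original features inherit the sparsity of v.
    beta = [gamma * vj for vj in v]
    return {"beta": beta, "col_means": col_means, "y_mean": y_mean}


def predict(model, X):
    beta, mu, y0 = model["beta"], model["col_means"], model["y_mean"]
    return [y0 + sum(b * (x - m) for b, x, m in zip(beta, row, mu)) for row in X]


# Toy data (hypothetical): three correlated "genes" share one latent factor,
# the remaining five features carry only tiny deterministic perturbations.
factor = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
X = [[f, f, f] + [0.01 * ((i * j) % 3 - 1) for j in range(3, 8)]
     for i, f in enumerate(factor)]
y = [2.0 * f + 1.0 for f in factor]

model = fit_sparse_pcr(X, y, k=3)
preds = predict(model, X)
selected = [j for j, b in enumerate(model["beta"]) if b != 0.0]
```

Because the regression coefficients are a scalar multiple of the sparse loading vector, predictions depend only on the features with nonzero loadings, which mirrors the abstract's point that a sparse subspace yields predictions driven by a small subset of genes.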

research
07/12/2022

Envelopes and principal component regression

Envelope methods offer targeted dimension reduction for various models. ...
research
05/27/2015

Sufficient Forecasting Using Factor Models

We consider forecasting a single time series when there is a large numbe...
research
12/29/2022

Theoretical Guarantees for Sparse Principal Component Analysis based on the Elastic Net

Sparse principal component analysis (SPCA) is widely used for dimensiona...
research
08/18/2022

Meta Sparse Principal Component Analysis

We study the meta-learning for support (i.e. the set of non-zero entries...
research
03/22/2021

Supervised Principal Component Regression for Functional Response with High Dimensional Predictors

We propose a supervised principal component regression method for relati...
research
12/23/2015

Adaptive Ensemble Learning with Confidence Bounds

Extracting actionable intelligence from distributed, heterogeneous, corr...
research
07/21/2016

Explaining Classification Models Built on High-Dimensional Sparse Data

Predictive modeling applications increasingly use data representing peop...
