Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression

07/12/2019
by Lei Ding, et al.

Motivation: The discovery of relationships between gene expression measurements and phenotypic responses is hampered by both computational and statistical impediments. Conventional statistical methods are less than ideal because they either fail to select relevant genes, predict poorly, ignore the unknown interaction structure between genes, or are computationally intractable. Thus, the creation of new methods which can handle many expression measurements on relatively small numbers of patients while also uncovering gene-gene relationships and predicting well is desirable.

Results: We develop a new technique for using the marginal relationship between gene expression measurements and patient survival outcomes to identify a small subset of genes which appear highly relevant for predicting survival, produce a low-dimensional embedding based on this small subset, and amplify this embedding with information from the remaining genes. We motivate our methodology by using gene expression measurements to predict survival time for patients with diffuse large B-cell lymphoma, illustrate the behavior of our methodology on carefully constructed synthetic examples, and test it on a number of other gene expression datasets. Our technique is computationally tractable, generally outperforms other methods, is extensible to other phenotypes, and also identifies different genes (relative to existing methods) for possible future study.

Key words: regression; principal components; matrix sketching; preconditioning

Availability: All of the code and data are available at https://github.com/dajmcdon/aimer/.
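The three steps named under Results (marginal screening, a low-dimensional embedding from the screened genes, and amplification with the remaining genes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not use the aimer package. The function name `marginal_eigenvector_sketch`, the parameters `n_select` and `n_components`, the correlation-style screening score, and the projection used for the amplification step are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def marginal_eigenvector_sketch(X, y, n_select=20, n_components=2):
    """Illustrative sketch only; every modelling choice here is an assumption.

    X : (n_patients, n_genes) expression matrix
    y : (n_patients,) continuous phenotype, e.g. (log) survival time
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center genes and response so inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Step 1 (assumed form): marginal screening. Score each gene by its
    # absolute inner product with y, scaled by the gene's norm.
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # guard against constant genes
    scores = np.abs(Xc.T @ yc) / norms
    selected = np.argsort(scores)[::-1][:n_select]

    # Step 2: low-dimensional embedding from the selected genes only,
    # taken here as the leading left singular vectors of that submatrix.
    U, s, _ = np.linalg.svd(Xc[:, selected], full_matrices=False)
    U = U[:, :n_components]
    s = s[:n_components]

    # Step 3 (assumed form): "amplify" by projecting every gene, selected
    # or not, onto the embedding to obtain loadings for all genes.
    V_all = Xc.T @ U / s  # shape (n_genes, n_components)

    # Regress the centered response on the embedding. U has orthonormal
    # columns, so the least-squares coefficients are simply U.T @ yc.
    gamma = U.T @ yc
    fitted = U @ gamma + y.mean()

    return selected, U, V_all, fitted
```

In this sketch, `V_all` is where the unselected genes enter: genes that were screened out in step 1 still receive a loading if they align with the embedding built from the selected genes, which is one simple way such a method could surface additional genes for follow-up.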


