Separating and reintegrating latent variables to improve classification of genomic data

12/22/2020
by Yujia Pan, et al.

Genomic datasets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes) and thus give rise to dense latent variation, which presents both challenges and opportunities for classification. Some of these latent variables may be partially correlated with the phenotype of interest and therefore helpful, while others may be uncorrelated and thus merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. We propose the cross-residualization classifier to better account for the latent variables in genomic data. Through an adjustment and ensemble procedure, the cross-residualization classifier essentially estimates the latent variables and residualizes out their effects, trains a classifier on the residuals, and then reintegrates the latent variables in a final ensemble classifier. Thus, the latent variables are accounted for without discarding any potentially predictive information that they may contribute. We apply the method to simulated data as well as a variety of genomic datasets from multiple platforms. In general, we find that the cross-residualization classifier performs well relative to existing classifiers and sometimes offers substantial gains.
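
The estimate, residualize, classify, and reintegrate sequence described in the abstract can be illustrated with a toy sketch. This is not the authors' cross-residualization procedure, whose adjustment and ensemble details are not given here. It assumes a single latent variable estimated as the first principal axis (by power iteration), a nearest-centroid classifier, and an ensemble that sums rescaled decision scores. All function names (`center`, `leading_direction`, `split`, `centroid_score`, `fit_predict`) are hypothetical and chosen only for illustration.

```python
import math
import random


def center(X, means=None):
    """Subtract per-feature means (computed from X unless supplied)."""
    if means is None:
        means = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X], means


def leading_direction(X, n_iter=200, seed=0):
    """Power iteration for the first principal axis of centered X.

    Assumption: one dense latent variable, estimated without labels.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in X[0]]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in X]  # X v
        v = [sum(s * x for s, x in zip(scores, col)) for col in zip(*X)]  # X^T (X v)
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return v


def split(X, v):
    """Return latent scores z = X v and residuals R = X - z v^T."""
    z = [sum(a * b for a, b in zip(row, v)) for row in X]
    R = [[a - s * b for a, b in zip(row, v)] for row, s in zip(X, z)]
    return z, R


def centroid_score(train, y, test):
    """Nearest-centroid decision score; positive values favour class 1."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c0 = mean([r for r, lab in zip(train, y) if lab == 0])
    c1 = mean([r for r, lab in zip(train, y) if lab == 1])
    return [dist2(r, c0) - dist2(r, c1) for r in test]


def fit_predict(X_train, y_train, X_test):
    """Separate the latent part, classify each part, then recombine."""
    Xc, means = center(X_train)
    Tc, _ = center(X_test, means)  # reuse training means on test data
    v = leading_direction(Xc)
    z_tr, R_tr = split(Xc, v)
    z_te, R_te = split(Tc, v)
    parts = [
        (R_tr, R_te),                                  # residual features
        ([[z] for z in z_tr], [[z] for z in z_te]),    # latent scores
    ]
    combined = [0.0] * len(X_test)
    for tr, te in parts:
        # Rescale by the mean absolute training score so that neither
        # component dominates the sum purely because of its units.
        train_scores = centroid_score(tr, y_train, tr)
        scale = (sum(abs(s) for s in train_scores) / len(tr)) or 1.0
        for i, s in enumerate(centroid_score(tr, y_train, te)):
            combined[i] += s / scale
    return [1 if s > 0 else 0 for s in combined]
```

Calling `fit_predict(X_train, y_train, X_test)` with samples as rows and 0/1 labels returns one predicted label per test row. Summing the two rescaled scores is only one simple way to keep the latent component in the final decision; a real analysis would estimate several latent variables and tune how the components are weighted.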


