Rank-based Bayesian variable selection for genome-wide transcriptomic analyses

07/11/2021
by Emilie Eliseussen, et al.

Variable selection is crucial in high-dimensional omics-based analyses, since it is biologically reasonable to assume that only a subset of non-noisy features contributes to the data structures. However, the task is particularly hard in an unsupervised setting, and a priori ad hoc variable selection is still a very frequent approach, despite its evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank-based transcriptomic analysis. Using data rankings instead of the actual continuous measurements increases the robustness of conclusions compared to classical statistical methods, and embedding variable selection into the inferential task allows complete reproducibility. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows for a full probabilistic analysis, leading to coherent quantification of uncertainties. We test our approach on simulated data using several data-generating procedures, demonstrating the versatility and robustness of the method under different scenarios. We then use the novel approach to analyse genome-wide RNAseq gene expression data from ovarian cancer samples: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the method's usefulness for signature discovery in cancer genomics. Moreover, the ability to quantify uncertainty plays a key role in the subsequent biological investigation.
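The robustness argument for rankings can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `to_ranks` and `footrule` and the toy expression values are assumptions made for illustration only. The sketch shows that rankings are unchanged by a monotone transformation (such as a log transform) and by an extreme outlier that keeps its position in the ordering, and it computes the footrule distance, one of the distances commonly used with Mallows models.

```python
import math


def to_ranks(values):
    """Return ranks with the largest value ranked 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def footrule(r1, r2):
    """Footrule distance: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


# Hypothetical expression values for four genes in one sample.
expression = [5.2, 0.3, 12.9, 7.1]

# A monotone transformation leaves the ranking untouched.
log_expression = [math.log2(x + 1.0) for x in expression]

# An extreme measurement changes the values drastically but not the ranks.
with_outlier = [5.2, 0.3, 1290.0, 7.1]

ranks_raw = to_ranks(expression)
ranks_log = to_ranks(log_expression)
ranks_outlier = to_ranks(with_outlier)

print(ranks_raw)
print(footrule(ranks_raw, ranks_log), footrule(ranks_raw, ranks_outlier))
```

Because all three inputs share the same ordering, the two printed distances are both zero, which is the sense in which a rank-based analysis is insensitive to the scale of the measurements.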

research
03/05/2020

Exploiting disagreement between high-dimensional variable selectors for uncertainty visualization

We propose Combined Selection and Uncertainty Visualizer (CSUV), which e...
research
12/08/2017

Bayesian Variable Selection For Survival Data Using Inverse Moment Priors

Efficient variable selection in high dimensional cancer genomic studies ...
research
04/20/2021

Bayesian subset selection and variable importance for interpretable prediction and classification

Subset selection is a valuable tool for interpretable learning, scientif...
research
09/19/2021

Uncertainty quantification for robust variable selection and multiple testing

We study the problem of identifying the set of active variables, termed ...
research
04/24/2020

Integrative Bayesian models using Post-selective Inference: a case study in Radiogenomics

Identifying direct links between gene pathways and clinical endpoints fo...
research
09/29/2021

Non-stationary Gaussian process discriminant analysis with variable selection for high-dimensional functional data

High-dimensional classification and feature selection tasks are ubiquito...
