The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

01/26/2011
by Anne-Claire Haury, et al.

Motivation: Biomarker discovery from high-dimensional data is a crucial problem with wide-ranging applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, yet surprisingly few studies have investigated the relative strengths and weaknesses of the many existing feature selection methods.

Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of the predictive performance, stability and functional interpretability of the signatures they produce.

Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection generally has no positive effect. Overall, a simple Student's t-test seems to provide the best results.

Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/.
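To make the idea of a filter method and of signature stability concrete, here is a minimal Python sketch, not the authors' code. It ranks features by the absolute value of a pooled-variance Student's t statistic, keeps the top k as a signature, and measures stability as the Jaccard overlap between signatures selected on two disjoint halves of the samples. The synthetic data, the signature size `k`, the function names and the choice of Jaccard overlap as the stability measure are all assumptions made for illustration.

```python
import numpy as np


def t_scores(X, y):
    """Two-sample Student's t statistic (pooled variance) for each column of X."""
    a, b = X[y == 0], X[y == 1]
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))


def select_top_k(X, y, k):
    """Signature: indices of the k features with the largest |t|."""
    order = np.argsort(-np.abs(t_scores(X, y)))
    return set(order[:k].tolist())


def jaccard(s1, s2):
    """Overlap between two signatures (assumed stability measure)."""
    return len(s1 & s2) / len(s1 | s2)


# Synthetic stand-in for an expression matrix: 100 samples, 500 "genes".
rng = np.random.default_rng(0)
n, p, k = 100, 500, 10
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 3.0  # assumption: only the first 5 features are informative

# Select a signature on each of two disjoint halves and compare them.
perm = rng.permutation(n)
half1, half2 = perm[: n // 2], perm[n // 2:]
sig1 = select_top_k(X[half1], y[half1], k)
sig2 = select_top_k(X[half2], y[half2], k)
stability = jaccard(sig1, sig2)
print(sorted(sig1), sorted(sig2), round(stability, 2))
```

In this toy setting the informative features should appear in both signatures, while the remaining slots are filled by noise features that differ between halves, which is what pulls the overlap below 1.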

research
01/18/2010

Increasing stability and interpretability of gene expression signatures

Motivation: Molecular signatures for diagnosis or prognosis estimated f...
research
05/06/2012

TIGRESS: Trustful Inference of Gene REgulation using Stability Selection

Inferring the structure of gene regulatory networks (GRN) from gene expr...
research
11/10/2016

Feature Selection with the R Package MXM: Discovering Statistically-Equivalent Feature Subsets

The statistically equivalent signature (SES) algorithm is a method for f...
research
07/31/2021

A Hybrid Ensemble Feature Selection Design for Candidate Biomarkers Discovery from Transcriptome Profiles

The discovery of disease biomarkers from gene expression data has been g...
research
07/04/2018

Diagonal Discriminant Analysis with Feature Selection for High Dimensional Data

We introduce a new method of performing high dimensional discriminant an...
research
12/13/2012

Integrating Prior Knowledge Into Prognostic Biomarker Discovery based on Network Structure

Background: Predictive, stable and interpretable gene signatures are gen...
research
04/30/2021

A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS)

Training machine learning models on high-dimensional datasets is a chall...