A Hybrid Ensemble Feature Selection Design for Candidate Biomarkers Discovery from Transcriptome Profiles

07/31/2021
by Felipe Colombelli, et al.

The discovery of disease biomarkers from gene expression data has been greatly advanced by feature selection (FS) methods, especially ensemble FS (EFS) strategies with perturbation at the data level (i.e., homogeneous, Hom-EFS) or at the method level (i.e., heterogeneous, Het-EFS). Here we propose a Hybrid EFS (Hyb-EFS) design that explores both types of perturbation to improve the stability and the predictive power of candidate biomarkers. With this, Hyb-EFS aims to disrupt associations of good performance with a single dataset, a single algorithm, or a specific combination of both, which is particularly valuable for the reproducibility of genomic biomarkers. We investigated the adequacy of our approach for microarray data related to four types of cancer, carrying out an extensive comparison with other ensemble and single FS approaches. Five FS methods were used in our experiments: Wx, Symmetrical Uncertainty (SU), Gain Ratio (GR), Characteristic Direction (GeoDE), and ReliefF. We observed that the Hyb-EFS and Het-EFS approaches attenuated the large performance variation across distinct datasets observed for most single FS methods and Hom-EFS designs. Also, the Hyb-EFS improved upon the stability of the Het-EFS within our domain. When the Hyb-EFS and Het-EFS were composed of the top-performing selectors (Wx, GR, and SU), our hybrid approach surpassed the equivalent heterogeneous design and the best Hom-EFS (Hom-Wx). Interestingly, the rankings produced by our Hyb-EFS showed greater biological plausibility, with notably high enrichment for cancer-related genes and pathways. Thus, our experiments suggest the potential of the proposed Hybrid EFS design for discovering candidate biomarkers from microarray data. Finally, we provide an open-source framework to support similar analyses in other domains, both as a user-friendly application and as a plain Python package.
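
The hybrid idea (data-level plus method-level perturbation, followed by rank aggregation) can be illustrated with a minimal sketch. This is not the authors' framework and not their exact procedure: it assumes stratified bootstrap resampling as the data-level perturbation, swaps in three simple univariate scores as the method-level perturbation (the functions `mean_diff`, `welch_t` and `snr` are hypothetical stand-ins, not Wx, SU, GR, GeoDE or ReliefF), and aggregates every bootstrap-by-selector ranking by mean rank. Stability is estimated here as the mean pairwise Jaccard overlap of top-k sets across bootstraps, one common choice that may differ from the metric used in the paper. All names (`hybrid_efs`, `jaccard_stability`, etc.) are illustrative.

```python
import random
import statistics
from itertools import combinations

# Hypothetical stand-in selectors: each scores one feature from its values
# in class 0 (x0) and class 1 (x1); higher means more discriminative.
def mean_diff(x0, x1):
    return abs(statistics.fmean(x1) - statistics.fmean(x0))

def welch_t(x0, x1):
    # Assumes each class has at least two samples (variance needs n >= 2).
    se = (statistics.variance(x0) / len(x0) + statistics.variance(x1) / len(x1)) ** 0.5
    return mean_diff(x0, x1) / (se + 1e-12)

def snr(x0, x1):
    return mean_diff(x0, x1) / (statistics.pstdev(x0) + statistics.pstdev(x1) + 1e-12)

SELECTORS = [mean_diff, welch_t, snr]

def rank_features(X, y, score_fn):
    """Feature indices ordered best-first by score_fn."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        x0 = [row[j] for row, label in zip(X, y) if label == 0]
        x1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(score_fn(x0, x1))
    return sorted(range(n_features), key=lambda j: -scores[j])

def stratified_bootstrap(y, rng):
    """Resample indices with replacement within each class (data-level perturbation)."""
    idx = []
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        idx.extend(rng.choices(members, k=len(members)))
    return idx

def hybrid_efs(X, y, selectors=SELECTORS, n_bootstraps=20, seed=0):
    """Aggregate one ranking per (bootstrap, selector) pair by mean rank."""
    rng = random.Random(seed)
    n_features = len(X[0])
    total = [0.0] * n_features
    per_bootstrap = []
    for _ in range(n_bootstraps):
        idx = stratified_bootstrap(y, rng)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        boot = [0.0] * n_features
        for score_fn in selectors:  # method-level perturbation
            for pos, j in enumerate(rank_features(Xb, yb, score_fn), start=1):
                boot[j] += pos
        # Heterogeneous aggregate for this bootstrap (used for stability).
        per_bootstrap.append(sorted(range(n_features), key=lambda j: boot[j]))
        total = [t + b for t, b in zip(total, boot)]
    # Dividing by n_bootstraps * len(selectors) would not change the order.
    final = sorted(range(n_features), key=lambda j: total[j])
    return final, per_bootstrap

def jaccard_stability(rankings, k):
    """Mean pairwise Jaccard index of top-k sets; needs at least two rankings."""
    tops = [set(r[:k]) for r in rankings]
    pairs = list(combinations(tops, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    rng = random.Random(42)
    y = [0] * 20 + [1] * 20
    # Synthetic data: feature 0 is shifted by 3 in class 1; features 1-9 are noise.
    X = [[3.0 * label + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(9)]
         for label in y]
    ranking, boots = hybrid_efs(X, y)
    print("top 3 features:", ranking[:3])
    print("top-3 Jaccard stability:", round(jaccard_stability(boots, k=3), 2))
```

Because the aggregation is an arithmetic mean over a balanced grid of bootstraps and selectors, averaging within each bootstrap first and then across bootstraps yields the same final ordering as the flat sum used here; other aggregators, such as the median rank, would not have this property.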
