Analysis of ensemble feature selection for correlated high-dimensional RNA-Seq cancer data

04/28/2020
by Aneta Polewko-Klim et al.

Discovery of diagnostic and prognostic molecular markers is an important and actively pursued field of cancer research. For complex diseases, this process is often performed using machine learning. The current study compares two approaches to the discovery of relevant variables: applying a single feature selection algorithm versus applying an ensemble of diverse algorithms. These approaches are used to identify variables that are relevant for discerning four cancer types, using RNA-seq profiles from The Cancer Genome Atlas. The comparison is carried out in two directions: evaluating the predictive performance of the models and monitoring the stability of the selected variables. The most informative features are identified using four feature selection algorithms, namely the U-test, ReliefF, and two variants of the MDFS algorithm. Normal and tumor tissues are discerned using the Random Forest algorithm. The highest stability of the feature set was obtained when the U-test was used. Unfortunately, models built on feature sets obtained from the ensemble of feature selection algorithms were no better than models developed on feature sets obtained from individual algorithms. On the other hand, the feature selectors leading to the best classification results varied between data sets.
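
The abstract does not detail how the U-test ranking or the stability of the selected sets is computed. The Python sketch below is one minimal interpretation under stated assumptions: features of a two-class data set (labels 0 and 1) are ranked by how far the Mann-Whitney U statistic lies from its null expectation n1*n2/2, the top k are kept, and stability is measured as the mean pairwise Jaccard index of the sets selected on stratified random subsamples. The Jaccard index, the subsampling scheme, the toy data, and all function names (`mann_whitney_u`, `top_k_features`, `selection_stability`, `make_toy_data`) are assumptions for illustration, not necessarily the measure or procedure used in the paper.

```python
import random
from itertools import combinations


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of sample x versus sample y; tied values get average ranks."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = (i + j) / 2 + 1  # average 1-based rank of the run
        i = j + 1
    n1 = len(x)
    return sum(ranks[:n1]) - n1 * (n1 + 1) / 2


def top_k_features(X, labels, k):
    """Indices of the k columns of X whose U statistic deviates most from n0*n1/2.

    With fixed class sizes this ordering matches ranking by the normal-approximation
    U-test p-value (ignoring the tie correction), so no p-value is computed.
    """
    rows0 = [row for row, lab in zip(X, labels) if lab == 0]
    rows1 = [row for row, lab in zip(X, labels) if lab == 1]
    n0, n1 = len(rows0), len(rows1)
    p = len(X[0])
    scores = [abs(mann_whitney_u([r[j] for r in rows0], [r[j] for r in rows1]) / (n0 * n1) - 0.5)
              for j in range(p)]
    order = sorted(range(p), key=lambda j: -scores[j])  # stable sort: ties keep lower index first
    return set(order[:k])


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(X, labels, k, n_runs=10, frac=0.8, seed=0):
    """Mean pairwise Jaccard index of top-k sets chosen on stratified random subsamples.

    Assumption: Jaccard-based stability; the measure used in the paper may differ.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    selected = []
    for _ in range(n_runs):
        idx = []
        for members in by_class.values():  # stratify so both classes stay present
            idx += rng.sample(members, max(1, int(frac * len(members))))
        selected.append(top_k_features([X[i] for i in idx], [labels[i] for i in idx], k))
    pairs = list(combinations(selected, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_toy_data(n_per_class=20, n_features=20, n_signal=3, shift=5.0, seed=1):
    """Hypothetical toy data: only the first n_signal features differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if j < n_signal else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y


X, y = make_toy_data()
print(sorted(top_k_features(X, y, k=3)))
print(selection_stability(X, y, k=3))
```

On the toy data only the first three features carry signal, so the selected set should recover them and the stability score should be close to 1; on real RNA-seq data with many correlated genes, different subsamples can pick different but equally informative genes, which lowers such a score.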

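The abstract does not state how the outputs of the four selectors are combined into an ensemble. As a hedged illustration only, the sketch below shows two common combination rules, mean-rank aggregation and the union of each selector's own top-k set. The per-selector score lists stand in for relevance scores from methods such as the U-test, ReliefF, or MDFS, and the function names (`ranks_from_scores`, `aggregate_mean_rank`, `aggregate_union`) are hypothetical. The resulting feature set would then be used to train a classifier such as Random Forest, which is not shown here.

```python
def ranks_from_scores(scores):
    """Convert per-feature scores (higher = more relevant) to ranks 1..p (1 = best)."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks


def aggregate_mean_rank(score_lists, k):
    """Hypothetical ensemble rule: average each feature's rank over selectors, keep top k."""
    per_selector = [ranks_from_scores(s) for s in score_lists]
    p = len(score_lists[0])
    mean_rank = [sum(r[j] for r in per_selector) / len(per_selector) for j in range(p)]
    return sorted(range(p), key=lambda j: (mean_rank[j], j))[:k]


def aggregate_union(score_lists, k):
    """Hypothetical ensemble rule: union of each selector's own top-k features."""
    chosen = set()
    for s in score_lists:
        chosen.update(sorted(range(len(s)), key=lambda j: (-s[j], j))[:k])
    return sorted(chosen)


# Made-up scores from two selectors over four features, for illustration only.
selector_a = [0.9, 0.1, 0.5, 0.3]
selector_b = [0.2, 0.8, 0.6, 0.1]
print(aggregate_mean_rank([selector_a, selector_b], k=2))
print(aggregate_union([selector_a, selector_b], k=2))
```

Mean-rank aggregation favors features that every selector ranks reasonably high, whereas the union keeps any feature that a single selector ranks highly and therefore usually returns more than k features.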