Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

11/30/2020
by Lingjing Jiang, et al.

Feature selection is indispensable in microbiome data analysis, but it is particularly challenging because microbiome data sets are high-dimensional, underdetermined, sparse and compositional. Considerable effort has recently gone into developing feature selection methods that handle these data characteristics, but almost all of them were evaluated on the performance of model predictions. Little attention has been paid to a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated on prediction accuracy alone. If tiny changes to the training data lead to large changes in the chosen feature subset, then many of the biological features that an algorithm has found are likely to be data artifacts rather than real biological signal. This need to identify relevant and reproducible features motivates reproducibility evaluation criteria such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the popular model prediction metric, mean squared error (MSE), with the proposed reproducibility criterion, Stability, in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications. We conclude that Stability is preferable to MSE as a criterion for evaluating feature selection methods because it better quantifies the reproducibility of the feature selection method.
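
To make the idea of robustness to perturbations concrete, the following is a minimal sketch of one way a stability score could be estimated. It rests on assumptions that are not taken from the paper: the perturbation is a bootstrap resample of the samples, the feature selector is a toy top-k correlation filter (`select_top_k`, a hypothetical helper), and stability is measured as the mean pairwise Jaccard similarity between selected subsets, which is a simple stand-in and not necessarily the Stability estimator used by the authors.

```python
import numpy as np
from itertools import combinations


def select_top_k(X, y, k):
    """Toy selector (assumption): the k features with largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; give them correlation 0.
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return set(np.argsort(-np.abs(corr))[:k].tolist())


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity between selected feature subsets."""
    pairs = list(combinations(subsets, 2))
    return float(np.mean([len(a & b) / len(a | b) for a, b in pairs]))


def bootstrap_stability(X, y, k, n_boot=50, seed=0):
    """Re-run the selector on bootstrap resamples and score subset agreement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    subsets = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        subsets.append(select_top_k(X[idx], y[idx], k))
    return jaccard_stability(subsets)


if __name__ == "__main__":
    # Synthetic data (not microbiome data): 2 informative features out of 30.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 30))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)
    print("stability:", bootstrap_stability(X, y, k=2))
```

A score near 1 means the selector returns almost the same subset on every resample, while a score near 0 means the chosen features change with small changes to the training data, which is the situation the abstract warns about. A low prediction error does not rule out that second case, so a score of this kind would be reported alongside MSE.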


