A generalised OMP algorithm for feature selection with application to gene expression data

04/01/2020
by Michail Tsagris, et al.

Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To be applicable to molecular data, feature selection algorithms need to scale to tens of thousands of features. In this paper, we propose gOMP, a highly scalable generalisation of the Orthogonal Matching Pursuit (OMP) feature selection algorithm in several directions: (a) different types of outcomes, such as continuous, binary, nominal, and time-to-event; (b) different types of predictive models (e.g., linear least squares, logistic regression); (c) different types of predictive features (continuous, categorical); and (d) different statistically based stopping criteria. We compare the proposed algorithm against LASSO, a prototypical and widely used algorithm for high-dimensional data. On dozens of simulated datasets, as well as on real gene expression datasets, gOMP performs on par with or better than LASSO for case-control binary classification, quantified outcomes (regression), and (censored) survival times (time-to-event analysis). gOMP also has several theoretical advantages, which are discussed. Although gOMP is based on simple, basic statistical ideas and is easy to implement and to generalise, an extensive evaluation shows that it is also quite effective in bioinformatics analysis settings.
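The core OMP-style loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' gOMP implementation: it assumes a continuous outcome, an ordinary least-squares model with an intercept, the absolute Pearson correlation with the current residuals as the selection score, and a BIC-based stopping rule chosen here purely for illustration. The names `omp_select`, `ols_residuals`, `bic` and `corr` are hypothetical. Extending the idea to binary or time-to-event outcomes would presumably mean swapping in the corresponding model and a suitable residual definition, which is not shown here.

```python
import math


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_residuals(X, y, cols):
    """Fit y on an intercept plus the columns in `cols` by least squares; return residuals."""
    design = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(design[0])
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(k)] for a in range(k)]
    xty = [sum(d[a] * yi for d, yi in zip(design, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    return [yi - sum(bk * dk for bk, dk in zip(beta, d)) for d, yi in zip(design, y)]


def bic(residuals, k):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)."""
    n = len(residuals)
    rss = max(sum(r * r for r in residuals), 1e-300)  # guard against log(0)
    return n * math.log(rss / n) + k * math.log(n)


def corr(u, v):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def omp_select(X, y, max_features=None):
    """Greedy OMP-style selection: add the feature most correlated with the current residuals."""
    p = len(X[0])
    limit = p if max_features is None else min(max_features, p)
    columns = [[row[j] for row in X] for j in range(p)]
    selected = []
    resid = ols_residuals(X, y, selected)  # intercept-only model
    best_bic = bic(resid, 1)
    while len(selected) < limit:
        candidates = [j for j in range(p) if j not in selected]
        j_best = max(candidates, key=lambda j: abs(corr(columns[j], resid)))
        # Refit on ALL selected features (the "orthogonal" step), not just the new one.
        new_resid = ols_residuals(X, y, selected + [j_best])
        new_bic = bic(new_resid, len(selected) + 2)  # intercept + old + new feature
        if new_bic >= best_bic:  # stop once the best candidate no longer improves BIC
            break
        selected.append(j_best)
        resid, best_bic = new_resid, new_bic
    return selected
```

As a usage example, with `y = 3*x0 - 2*x3 + noise` and a handful of irrelevant features, `omp_select(X, y)` should pick feature 0 first and feature 3 second, then usually stop. Refitting on all selected features at every step means each residual is orthogonal to the chosen columns, so an already-explained signal is not selected twice.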

research
09/29/2021

A Study of Feature Selection and Extraction Algorithms for Cancer Subtype Prediction

In this work, we study and analyze different feature selection algorithm...
research
05/30/2017

Forward-Backward Selection with Early Dropping

Forward-backward selection is one of the most basic and commonly-used fe...
research
09/30/2017

Testing for Feature Relevance: The HARVEST Algorithm

Feature selection with high-dimensional data and a very small proportion...
research
08/20/2022

Should univariate Cox regression be used for feature selection with respect to time-to-event outcomes?

IMPORTANCE: Time-to-event outcomes are commonly used in clinical trials ...
research
03/16/2016

Feature Selection as a Multiagent Coordination Problem

Datasets with hundreds to tens of thousands features is the new norm. Fe...
research
02/09/2022

Explainable Predictive Modeling for Limited Spectral Data

Feature selection of high-dimensional labeled data with limited observat...