Ensemble feature selection with clustering for analysis of high-dimensional, correlated clinical data in the search for Alzheimer's disease biomarkers

07/06/2022
by Annette Spooner, et al.

Healthcare datasets often contain groups of highly correlated features, such as features from the same biological system. When feature selection is applied to these datasets to identify the most important features, correlated features bias some multivariate feature selectors, making it difficult for them to distinguish important features from irrelevant ones, and the results of the feature selection process can be unstable. Feature selection ensembles, which aggregate the results of multiple individual base feature selectors, have been investigated as a means of stabilising feature selection results, but they do not address the problem of correlated features. We present a novel framework that creates feature selection ensembles from multivariate feature selectors while accounting for the biases produced by groups of correlated features, using agglomerative hierarchical clustering as a pre-processing step. We applied these methods to two real-world datasets from studies of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure and is not yet fully understood. Compared with models without clustering, our results show a marked improvement in the stability of the selected features, and the features selected are consistent with findings in the AD literature.
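The general idea of clustering correlated features before running a feature selection ensemble can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names (`pearson`, `cluster_features`, `ensemble_select`), the 0.8 correlation threshold, average linkage, taking the first feature of each cluster as its representative, and a simple correlation-with-outcome ranking standing in for the multivariate base selectors the abstract refers to.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def cluster_features(columns, threshold=0.8):
    """Agglomerative (average-linkage) clustering on the distance 1 - |r|.

    Merging stops once the closest pair of clusters has an average
    absolute correlation below `threshold` (an assumed cut-off).
    """
    n = len(columns)
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > 1.0 - threshold:
            break
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def ensemble_select(columns, y, clusters, n_rounds=50, top_k=1, seed=0):
    """Bootstrap ensemble over one representative feature per cluster.

    The base selector is a stand-in: it ranks representatives by
    |correlation with y| on each bootstrap resample and keeps the top_k.
    Returns the fraction of rounds in which each representative was kept.
    """
    rng = random.Random(seed)
    reps = [c[0] for c in clusters]  # assumed choice of representative
    counts = {r: 0 for r in reps}
    n = len(y)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y[i] for i in idx]
        scores = {r: abs(pearson([columns[r][i] for i in idx], y_b)) for r in reps}
        for r in sorted(reps, key=scores.get, reverse=True)[:top_k]:
            counts[r] += 1
    return {r: counts[r] / n_rounds for r in reps}


# Synthetic demo: f1 is a near-copy of f0, f2 and f3 are pure noise.
rng = random.Random(0)
n = 200
f0 = [rng.gauss(0, 1) for _ in range(n)]
f1 = [v + 0.05 * rng.gauss(0, 1) for v in f0]
f2 = [rng.gauss(0, 1) for _ in range(n)]
f3 = [rng.gauss(0, 1) for _ in range(n)]
y = [v + 0.5 * rng.gauss(0, 1) for v in f0]
columns = [f0, f1, f2, f3]

clusters = cluster_features(columns, threshold=0.8)
freq = ensemble_select(columns, y, clusters)
print(clusters)  # groups of feature indices
print(freq)      # selection frequency per cluster representative
```

In this sketch the two near-duplicate features are collapsed into one cluster before selection, so they no longer compete with each other across bootstrap rounds. The selection frequencies returned by `ensemble_select` give a rough sense of how consistently each cluster is chosen.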

