A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS)

04/30/2021
by Anna Jenul, et al.

Training machine learning models on high-dimensional datasets is challenging and requires measures to prevent overfitting and keep model complexity low. Feature selection is one such measure: it plays a key role in data preprocessing and may provide insights into the systematic variation in the data. The latter aspect is crucial in domains that rely on model interpretability, such as the life sciences. We propose UBayFS, an ensemble feature selection technique embedded in a Bayesian statistical framework. Our approach considers two sources of information: data and domain knowledge. We build an ensemble of elementary feature selectors that extract information from empirical data, and we aggregate this information into a meta-model that compensates for inconsistencies between the elementary feature selectors. The user guides UBayFS by weighting features and penalizing specific feature blocks or combinations. The framework builds on a multinomial likelihood and a novel version of a constrained Dirichlet-type prior distribution, involving initial feature weights and side constraints. In a quantitative evaluation, we demonstrate that the presented framework allows for a balanced trade-off between user knowledge and data observations. A comparison with standard feature selectors shows that UBayFS achieves competitive performance while providing additional flexibility to incorporate domain knowledge.
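The interplay of a multinomial likelihood and a Dirichlet-type prior described above can be illustrated with a minimal sketch. It is not the authors' implementation: the elementary selector (top-k absolute correlation on bootstrap resamples), the function names `selection_counts`, `posterior_weights`, and `select_features`, and all parameter values are assumptions made for illustration, and the side constraints on feature blocks or combinations are omitted entirely. The sketch relies only on the standard fact that a Dirichlet prior is conjugate to a multinomial likelihood, so the posterior parameters are the prior weights plus the observed selection counts.

```python
import numpy as np


def selection_counts(X, y, n_models=50, k=2, seed=0):
    """Count how often each feature is picked by an ensemble of simple selectors.

    Hypothetical elementary selector: on each bootstrap resample, keep the k
    features with the largest absolute Pearson correlation with the target.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of the rows
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        score = np.abs(Xc.T @ yc) / denom
        counts[np.argsort(score)[-k:]] += 1  # one "vote" per selected feature
    return counts


def posterior_weights(counts, prior_alpha):
    """Dirichlet-multinomial conjugate update: posterior = prior + counts."""
    return np.asarray(prior_alpha, dtype=float) + counts


def select_features(posterior, k):
    """Return the indices of the k features with the largest posterior weight."""
    return np.sort(np.argsort(posterior)[-k:])


# Synthetic data (assumed for the demo): only features 0 and 1 drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

counts = selection_counts(X, y, n_models=50, k=2)
uniform_prior = np.ones(X.shape[1])          # no user preference
posterior = posterior_weights(counts, uniform_prior)
print(select_features(posterior, k=2))
```

Raising a feature's prior weight far above the number of ensemble models lets user knowledge override the data for that feature, while a flat prior leaves the decision to the ensemble counts; this is the trade-off between user knowledge and data observations in its simplest, unconstrained form.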
