A novel evaluation methodology for supervised Feature Ranking algorithms

07/09/2022
by Jeroen G. S. Overschie et al.

In both Feature Selection and Interpretable AI, there is a desire to "rank" features by their importance. Such feature importance rankings can then be used to either (1) reduce the dataset size or (2) interpret the Machine Learning model. In the literature, however, such Feature Rankers are not evaluated in a systematic, consistent way: papers differ in how they argue which feature importance ranker works best. This paper fills this gap by proposing a new evaluation methodology. Because synthetic datasets are used, the true feature importance scores are known beforehand, which allows a more systematic evaluation. To facilitate large-scale experimentation with the new methodology, a benchmarking framework called fseval was built in Python. The framework allows running experiments in parallel and distributed across machines on HPC systems. Through integration with an online platform called Weights and Biases, charts can be explored interactively on a live dashboard. The software was released as open source and is published as a package on PyPI. The research concludes by exploring one such large-scale experiment to find the strengths and weaknesses of the participating algorithms from several perspectives.
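
The core idea of evaluating a ranker against a synthetic dataset with known feature importances can be illustrated with a minimal sketch. It does not use the fseval API: every function name below is hypothetical, the correlation-based ranker is only a stand-in for a real Feature Ranker, and recall among the top-k features is an assumed metric rather than one the paper is known to use.

```python
import math
import random


def make_synthetic_dataset(n_samples, n_features, informative, seed=0):
    """Build regression data in which only `informative` feature indices affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    # The target is the sum of the informative features plus a little noise.
    y = [sum(row[j] for j in informative) + rng.gauss(0.0, 0.1) for row in X]
    # Ground truth known by construction: 1 for informative features, else 0.
    truth = [1.0 if j in informative else 0.0 for j in range(n_features)]
    return X, y, truth


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_ranker(X, y):
    """Stand-in ranker: score each feature by |correlation| with the target."""
    return [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]


def recall_at_k(scores, truth):
    """Fraction of truly informative features found among the top-k scores."""
    k = int(sum(truth))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sum(truth[j] for j in order[:k]) / k


if __name__ == "__main__":
    X, y, truth = make_synthetic_dataset(500, 10, informative={0, 1, 2}, seed=0)
    scores = correlation_ranker(X, y)
    print("recall@k:", recall_at_k(scores, truth))
```

Because the ground truth is fixed when the data is generated, any ranker that returns one score per feature can be scored the same way, which is what makes comparisons between rankers consistent.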

