Classifier Selection with Permutation Tests

11/27/2017
by Marta Arias, et al.

This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, the system recommends the classifier that is likely to perform best, based on classifier performance over similar known data sets. Similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels; we compare this approach to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we conducted extensive experiments covering 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.
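The abstract does not spell out the exact test procedure, so the following is only a minimal sketch of a generic label-permutation test, not the authors' implementation. It assumes a hold-out accuracy score and uses a simple nearest-centroid classifier as a stand-in for any learning algorithm; the function names `nearest_centroid_accuracy` and `permutation_p_value`, the synthetic data, and all parameter values are hypothetical. The idea is to compare the score obtained with the true labels against the scores obtained after shuffling the labels, which destroys any link between attributes and classes.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid classifier (a stand-in learner) and return test accuracy."""
    sums, counts = {}, {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    correct = sum(predict(r) == t for r, t in zip(X_test, y_test))
    return correct / len(y_test)


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Return (observed score, p-value) for a label-permutation test.

    Assumption: the first half of the rows is used for training and the
    second half for testing, so the rows should not be sorted by class.
    """
    rng = random.Random(seed)
    half = len(X) // 2

    def score(labels):
        return nearest_centroid_accuracy(
            X[:half], labels[:half], X[half:], labels[half:]
        )

    observed = score(y)
    hits = 0
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the attribute/label relationship
        if score(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_permutations + 1)


# Synthetic binary data set (hypothetical): feature 0 is informative, feature 1 is noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(4.0 * label, 1.0), data_rng.gauss(0.0, 1.0)] for label in y]

observed, p_value = permutation_p_value(X, y)
print(f"hold-out accuracy: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

A small p-value suggests the learner is exploiting real structure in the attributes; a p-value near 1 suggests its score is no better than what it achieves on randomly labelled data. Unlike an F-score alone, this gives a reference distribution against which the observed score can be judged.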

research
02/27/2020

The Data Representativeness Criterion: Predicting the Performance of Supervised Classification Based on Data Set Similarity

In a broad range of fields it may be desirable to reuse a supervised cla...
research
03/02/2023

Encoding of data sets and algorithms

In many high-impact applications, it is important to ensure the quality ...
research
08/08/2017

Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

As the bioinformatics field grows, it must keep pace not only with new d...
research
01/17/2022

Data-Centric Machine Learning in the Legal Domain

Machine learning research typically starts with a fixed data set created...
research
02/05/2019

Permutation Invariant Likelihoods and Equivariant Transformations

In this work, we fill a substantial void in machine learning and statist...
research
10/26/2019

Understanding Isomorphism Bias in Graph Data Sets

In recent years there has been a rapid increase in classification method...
research
03/31/2015

Improved Error Bounds Based on Worst Likely Assignments

Error bounds based on worst likely assignments use permutation tests to ...
