Data vs classifiers, who wins?

07/15/2021
by Lucas F. F. Cardoso, et al.

The classification experiments covered by machine learning (ML) are composed of two important parts: the data and the algorithm. As both are a fundamental part of the problem, they must be considered when evaluating a model's performance against a benchmark. The best classifiers need robust benchmarks to be properly evaluated, and gold-standard benchmarks such as OpenML-CC18 are used for this purpose. However, data complexity is commonly not considered alongside the model during a performance evaluation. Recent studies employ Item Response Theory (IRT) as a new approach to evaluating datasets and algorithms, capable of evaluating both simultaneously. This work presents a new evaluation methodology based on IRT and Glicko-2, together with the decodIRT tool developed to guide the estimation of IRT in ML. It explores IRT as a tool to evaluate the OpenML-CC18 benchmark for its algorithmic evaluation capability and checks whether there is a subset of datasets more efficient than the original benchmark. Several classifiers, from classic to ensemble, are also evaluated using the IRT models. The Glicko-2 rating system was applied together with IRT to summarize the innate ability and performance of the classifiers. It was noted that not all OpenML-CC18 datasets are really useful for evaluating algorithms: only 10% were considered really difficult, and a more efficient subset containing only 50% of the original size was found to exist. Random Forest was singled out as the algorithm with the best innate ability.
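For intuition on how IRT lets datasets and classifiers be evaluated on the same scale, the sketch below shows the item characteristic curve of the common 3-parameter logistic (3PL) model, where a classifier with ability theta responds to an item with discrimination a, difficulty b, and guessing floor c. This is only an illustration of the general technique, not the authors' decodIRT implementation, and the parameter values are invented for the example.

    import numpy as np

    def irt_3pl(theta, a, b, c):
        # Probability that a respondent (here, a classifier) with ability
        # `theta` answers an item (here, a dataset) correctly, given the
        # item's discrimination `a`, difficulty `b`, and guessing floor `c`.
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    # Purely illustrative parameters (not values from the paper): a difficult,
    # highly discriminative item barely rewards weak classifiers but clearly
    # separates the strong ones.
    for theta in (-1.0, 0.0, 1.0, 2.0):
        p = irt_3pl(theta, a=2.0, b=1.0, c=0.25)
        print(f"ability={theta:+.1f}  P(correct)={p:.2f}")

Under this kind of model, items that almost every classifier gets right (or wrong) carry little information, which is the sense in which only a fraction of a benchmark's datasets is genuinely useful for separating algorithms.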


Related research

07/29/2020 · Decoding machine learning benchmarks
Despite the availability of benchmark machine learning (ML) repositories...

07/14/2021 · Generative and reproducible benchmarks for comprehensive evaluation of machine learning classifiers
Understanding the strengths and weaknesses of machine learning (ML) algo...

11/08/2022 · Classification of Colorectal Cancer Polyps via Transfer Learning and Vision-Based Tactile Sensing
In this study, to address the current high early-detection miss rate of c...

06/15/2023 · AQuA: A Benchmarking Tool for Label Quality Assessment
Machine learning (ML) models are only as good as the data they are train...

10/14/2022 · Evaluating Out-of-Distribution Performance on Document Image Classifiers
The ability of a document classifier to handle inputs that are drawn fro...

05/22/2023 · Evaluating Model Performance in Medical Datasets Over Time
Machine learning (ML) models deployed in healthcare systems must face da...

09/13/2023 · Towards Reliable Dermatology Evaluation Benchmarks
Benchmark datasets for digital dermatology unwittingly contain inaccurac...
