An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set

05/19/2023
by   Ali Ismail-Fawaz, et al.

The measurement of progress using benchmark evaluations is ubiquitous in computer science and machine learning. However, common approaches to analyzing and presenting the results of benchmark comparisons of multiple algorithms over multiple datasets, such as the critical difference diagram introduced by Demšar (2006), have important shortcomings and, we show, are open to both inadvertent and intentional manipulation. To address these issues, we propose a new approach to presenting the results of benchmark comparisons, the Multiple Comparison Matrix (MCM), which prioritizes pairwise comparisons and precludes the means of manipulating experimental results that existing approaches allow. MCM can be used to show the results of an all-pairs comparison, or to show the results of a comparison between one or more selected algorithms and the state of the art. MCM is implemented in Python and is publicly available.
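As a rough illustration of why the choice of comparates matters, the sketch below contrasts a mean-rank summary (the quantity a critical difference diagram is built on) with a plain pairwise win/tie/loss count. The scores are invented for the example, and the function names `mean_ranks` and `win_tie_loss` are hypothetical; none of this is the API of the MCM package. It only shows, under those assumptions, that adding two algorithms can reverse the mean-rank order of A and B while their pairwise record stays the same.

```python
# Hypothetical accuracies on five datasets (one list entry per dataset).
# These numbers are made up purely for illustration.
scores = {
    "A": [0.90, 0.90, 0.90, 0.60, 0.60],
    "B": [0.85, 0.85, 0.85, 0.70, 0.70],
    "C": [0.50, 0.50, 0.50, 0.65, 0.65],
    "D": [0.40, 0.40, 0.40, 0.62, 0.62],
}


def mean_ranks(scores, names):
    """Average rank per algorithm over datasets (1 = best, ties share the mean rank)."""
    n_datasets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for i in range(n_datasets):
        values = {name: scores[name][i] for name in names}
        for name in names:
            better = sum(v > values[name] for v in values.values())
            equal = sum(v == values[name] for v in values.values())  # includes itself
            totals[name] += better + (equal + 1) / 2
    return {name: totals[name] / n_datasets for name in names}


def win_tie_loss(scores, a, b):
    """Count datasets where a beats, ties with, or loses to b."""
    pairs = list(zip(scores[a], scores[b]))
    wins = sum(x > y for x, y in pairs)
    ties = sum(x == y for x, y in pairs)
    losses = sum(x < y for x, y in pairs)
    return wins, ties, losses


# Mean ranks depend on which algorithms are included in the comparison.
ranks_two = mean_ranks(scores, ["A", "B"])
ranks_four = mean_ranks(scores, ["A", "B", "C", "D"])
print("A vs B only:", ranks_two)
print("with C and D:", ranks_four)

# The pairwise record of A against B uses only their own scores.
print("A vs B win/tie/loss:", win_tie_loss(scores, "A", "B"))
```

With only A and B in the comparison, A has the better (lower) mean rank; once C and D are included, B does, even though A still outperforms B on three of the five datasets. A pairwise statistic is computed from the two algorithms' own results, so it cannot be shifted by adding or removing other comparates.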


