The Benchmark Lottery

07/14/2021
by Mostafa Dehghani, et al.

Empirical machine learning (ML) relies heavily on benchmarks to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of a "benchmark lottery" to describe the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors other than fundamental algorithmic superiority may lead to a method being perceived as superior. On multiple benchmark setups that are prevalent in the ML community, we show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks, which highlights the fragility of current paradigms and the potentially fallacious interpretations derived from benchmarking ML methods. Given that every benchmark makes a statement about what it perceives to be important, we argue that this might lead to biased progress in the community. We discuss the implications of the observed phenomena and provide recommendations for mitigating them, using multiple machine learning domains and communities as use cases, including natural language processing, computer vision, information retrieval, recommender systems, and reinforcement learning.
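The claim that the top-ranked method can change with the choice of benchmark tasks can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the method names, the task names, the scores, and the use of a plain mean as the aggregate are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical per-task scores (made-up numbers, NOT results from the paper).
scores = {
    "method_a": {"task_1": 95, "task_2": 60, "task_3": 70, "task_4": 52},
    "method_b": {"task_1": 70, "task_2": 88, "task_3": 65, "task_4": 75},
    "method_c": {"task_1": 75, "task_2": 70, "task_3": 85, "task_4": 60},
}


def rank_methods(scores, tasks):
    """Rank methods by mean score over the given tasks, best first."""
    means = {
        method: sum(per_task[t] for t in tasks) / len(tasks)
        for method, per_task in scores.items()
    }
    return sorted(means, key=means.get, reverse=True)


def winners_by_subset(scores, subset_size):
    """Map every task subset of the given size to its top-ranked method."""
    all_tasks = sorted(next(iter(scores.values())))
    return {
        subset: rank_methods(scores, subset)[0]
        for subset in combinations(all_tasks, subset_size)
    }


# Each two-task "benchmark" is one possible draw of the lottery.
winners = winners_by_subset(scores, 2)
for subset, winner in winners.items():
    print(subset, "->", winner)
```

With these made-up numbers, each of the three methods comes out on top for at least one two-task subset, even though the underlying scores never change. Only the task selection does.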
