Mapping global dynamics of benchmark creation and saturation in artificial intelligence

03/09/2022
by Adriano Barbosa-Silva, et al.

Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies have raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation, and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curated data for 1688 benchmarks covering the domains of computer vision and natural language processing in their entirety, and show that a large fraction of benchmarks quickly trended towards near-saturation, that many benchmarks failed to find widespread utilization, and that benchmark performance gains for different AI tasks were prone to unforeseen bursts. We conclude that future work should focus on large-scale community collaboration and on mapping benchmark performance gains to the real-world utility and impact of AI.
