DataPerf: Benchmarks for Data-Centric AI Development

07/20/2022
by Mark Mazumder, et al.

Machine learning (ML) research has generally focused on models, while the most prominent datasets have been used for common ML tasks without regard for their breadth, difficulty, or faithfulness to the underlying problem. Neglecting the fundamental importance of datasets has caused major problems: data cascades in real-world applications and saturation of dataset-driven criteria for model quality, both of which hinder research growth. To address this, we present DataPerf, a benchmark package for evaluating ML datasets and algorithms for working with datasets. We intend it to enable the "data ratchet," in which training sets aid in evaluating test sets on the same problems, and vice versa. Such a feedback-driven strategy will generate a virtuous loop that accelerates development of data-centric AI. The MLCommons Association will maintain DataPerf.

research
12/13/2021

What can Data-Centric AI Learn from Data and ML Engineering?

Data-centric AI is a new and exciting research topic in the AI community...
research
10/25/2021

Bridging the gap to real-world for network intrusion detection systems with data-centric approach

Most research using machine learning (ML) for network intrusion detectio...
research
11/09/2022

DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems

While there have been a number of remarkable breakthroughs in machine le...
research
09/05/2020

Examining Machine Learning for 5G and Beyond through an Adversarial Lens

Spurred by the recent advances in deep learning to harness rich informat...
research
12/03/2022

Applications of AI in Astronomy

We provide a brief, and inevitably incomplete overview of the use of Mac...
research
11/20/2021

Combining Data-driven Supervision with Human-in-the-loop Feedback for Entity Resolution

The distribution gap between training datasets and data encountered in p...
research
08/26/2020

Bandit Data-driven Optimization: AI for Social Good and Beyond

The use of machine learning (ML) systems in real-world applications enta...
