Data Excellence for AI: Why Should You Care

11/19/2021
by Lora Aroyo, et al.

The efficacy of machine learning (ML) models depends on both algorithms and data. Training data defines what we want our models to learn, and testing data provides the means by which their empirical progress is measured. Benchmark datasets define the entire world within which models exist and operate, yet research continues to focus on critiquing and improving the algorithmic aspect of the models rather than critiquing and improving the data with which our models operate. If "data is the new oil," we are still missing work on the refineries by which the data itself could be optimized for more effective use.

research
12/21/2022

NADBenchmarks – a compilation of Benchmark Datasets for Machine Learning Tasks related to Natural Disasters

Climate change has increased the intensity, frequency, and duration of e...
research
12/15/2021

Fix your Models by Fixing your Datasets

The quality of underlying training data is very crucial for building per...
research
05/28/2021

Changing the World by Changing the Data

NLP community is currently investing a lot more research and resources i...
research
07/14/2020

Bringing the People Back In: Contesting Benchmark Machine Learning Datasets

In response to algorithmic unfairness embedded in sociotechnical systems...
research
07/21/2022

Detecting and Preventing Shortcut Learning for Fair Medical AI using Shortcut Testing (ShorT)

Machine learning (ML) holds great promise for improving healthcare, but ...
research
06/01/2022

On the Choice of Data for Efficient Training and Validation of End-to-End Driving Models

The emergence of data-driven machine learning (ML) has facilitated signi...
research
06/10/2023

Interpretable Differencing of Machine Learning Models

Understanding the differences between machine learning (ML) models is of...
