Towards a Shared Rubric for Dataset Annotation

12/07/2021
by Andrew Marc Greene, et al.

When arranging for third-party data annotation, it can be hard to compare how well the competing providers apply best practices to create high-quality datasets. This leads to a "race to the bottom," where competition based solely on price makes it hard for vendors to charge for high-quality annotation. We propose a voluntary rubric which can be used (a) as a scorecard to compare vendors' offerings, (b) to communicate our expectations of the vendors more clearly and consistently than today, (c) to justify the expense of choosing someone other than the lowest bidder, and (d) to encourage annotation providers to improve their practices.
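The rubric itself is described in the full paper; as a rough illustration of use case (a), the sketch below scores two hypothetical vendors against a small set of weighted criteria. The criteria names, weights, vendor ratings, and 0-5 scale are placeholders for illustration only, not the rubric proposed by the authors.

```python
# Minimal sketch of using a rubric as a vendor scorecard.
# Criteria, weights, and ratings below are illustrative placeholders,
# not the rubric proposed in the paper.

RUBRIC = {
    "annotator training":      0.25,
    "quality-control process": 0.30,
    "documentation provided":  0.20,
    "fair labor practices":    0.25,
}

def scorecard(ratings: dict[str, float]) -> float:
    """Weighted sum of per-criterion ratings (each rated 0-5)."""
    return sum(weight * ratings.get(criterion, 0.0)
               for criterion, weight in RUBRIC.items())

vendors = {
    "Vendor A": {"annotator training": 4, "quality-control process": 5,
                 "documentation provided": 3, "fair labor practices": 4},
    "Vendor B": {"annotator training": 2, "quality-control process": 3,
                 "documentation provided": 2, "fair labor practices": 2},
}

for name, ratings in vendors.items():
    print(f"{name}: {scorecard(ratings):.2f} / 5.00")
```

A shared scorecard of this kind makes the comparison explicit: a lower bid from Vendor B can be weighed against its lower rubric score rather than being accepted on price alone.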

Related research

Best Practices for Managing Data Annotation Projects (09/24/2020)
Annotation is the labeling of data by human effort. Annotation is critic...

Whose AI Dream? In search of the aspiration in data annotation (03/21/2022)
This paper presents the practice of data annotation from the perspective ...

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future (06/05/2022)
Annotated data is an essential ingredient in natural language processing...

Analyzing Dataset Annotation Quality Management in the Wild (07/16/2023)
Data quality is crucial for training accurate, unbiased, and trustworthy...

FreeLabel: A Publicly Available Annotation Tool based on Freehand Traces (02/18/2019)
Large-scale annotation of image segmentation datasets is often prohibiti...

Towards Robust Handwritten Text Recognition with On-the-fly User Participation (12/17/2022)
Long-term OCR services aim to provide high-quality output to their users...

Can We Trust Race Prediction? (07/17/2023)
In the absence of sensitive race and ethnicity data, researchers, regula...
