Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty

08/04/2021
by Quanze Chen, et al.

Human ratings have become a crucial resource for training and evaluating machine learning systems. However, traditional elicitation methods for absolute and comparative rating suffer from issues with consistency and often do not distinguish between uncertainty due to disagreement between annotators and ambiguity inherent to the item being rated. In this work, we present Goldilocks, a novel crowd rating elicitation technique for collecting calibrated scalar annotations that also distinguishes inherent ambiguity from inter-annotator disagreement. We introduce two main ideas: grounding absolute rating scales with examples and using a two-step bounding process to establish a range for an item's placement. We test our designs in three domains: judging toxicity of online comments, estimating satiety of food depicted in images, and estimating age based on portraits. We show that (1) Goldilocks can improve consistency in domains where interpretation of the scale is not universal, and that (2) representing items with ranges lets us simultaneously capture different sources of uncertainty leading to better estimates of pairwise relationship distributions.
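The abstract does not say how the ranges are aggregated, so the following is only an illustrative sketch and not the authors' method. It assumes that each annotator's two-step bounding produces a lower and an upper bound on a shared scale, such as an age in years. The average width of the ranges is then read as a rough proxy for inherent ambiguity, and the spread of range midpoints across annotators as a proxy for disagreement. A pairwise relationship P(A < B) is estimated by Monte Carlo, treating each range as a uniform distribution. The names `RangeLabel`, `uncertainty_summary` and `prob_less`, and the uniform-within-range assumption, are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
import random


@dataclass(frozen=True)
class RangeLabel:
    """Hypothetical record: one annotator's bounded range for one item."""
    lower: float
    upper: float

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower bound must not exceed upper bound")

    @property
    def width(self) -> float:
        return self.upper - self.lower

    @property
    def midpoint(self) -> float:
        return (self.lower + self.upper) / 2


def uncertainty_summary(labels):
    """Illustrative split of uncertainty (assumed heuristics, not from the paper).

    ambiguity    = mean width of each annotator's range (within-annotator spread)
    disagreement = population std. dev. of range midpoints (across annotators)
    """
    ambiguity = mean(label.width for label in labels)
    disagreement = pstdev([label.midpoint for label in labels])
    return ambiguity, disagreement


def prob_less(labels_a, labels_b, samples=2000, seed=0):
    """Monte Carlo estimate of P(A < B).

    Each draw picks one annotator's range per item at random, then samples a
    value uniformly inside each range (an assumption made for illustration).
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(samples):
        la = rng.choice(labels_a)
        lb = rng.choice(labels_b)
        x = rng.uniform(la.lower, la.upper)
        y = rng.uniform(lb.lower, lb.upper)
        hits += x < y
    return hits / samples


# Usage: two portraits rated for age by several annotators.
younger = [RangeLabel(20, 30), RangeLabel(22, 28), RangeLabel(25, 35)]
older = [RangeLabel(40, 50), RangeLabel(38, 45)]
print(uncertainty_summary(younger))
print(prob_less(younger, older))  # every sampled age for `younger` is below 38
```

Keeping the ranges, instead of collapsing each label to a single point, is what lets one score reflect disagreement and another reflect ambiguity. In the sketch, two annotators who give the same wide range raise ambiguity but not disagreement.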

research
09/05/2012

Conquering the rating bound problem in neighborhood-based collaborative filtering: a function recovery approach

As an important tool for information filtering in the era of socialized ...
research
12/05/2017

Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation

Rating scales are a widely used method for data annotation; however, the...
research
05/02/2023

Judgment Sieve: Reducing Uncertainty in Group Judgments through Interventions Targeting Ambiguity versus Disagreement

When groups of people are tasked with making a judgment, the issue of un...
research
02/08/2021

A psychometric modeling approach to fuzzy rating data

Modeling fuzziness and imprecision in human rating data is a crucial pro...
research
11/24/2015

Pairwise Comparisons Rating Scale Paradox

This study demonstrates that incorrect data are entered into a pairwise ...
research
04/12/2019

A Crowdsourced Frame Disambiguation Corpus with Ambiguity

We present a resource for the task of FrameNet semantic frame disambigua...
