ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution

10/13/2022
by   Ankita Gupta, et al.

Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definitions of coreference and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers in curating a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only the cases that are treated similarly across these datasets. Surprisingly, we find that reasonable-quality annotations were already achievable (>90% agreement between the crowd and expert annotations) even without extensive training. On carefully analyzing the remaining disagreements, we identify linguistic cases that our annotators unanimously agree upon but that lack a unified treatment in existing datasets (e.g., generic pronouns, appositives). We propose that the research community revisit these phenomena when curating future unified annotation guidelines.


