A Crowdsourced Frame Disambiguation Corpus with Ambiguity

04/12/2019
by   Anca Dumitrache, et al.

We present a resource for the task of FrameNet (FN) semantic frame disambiguation of over 5,000 word-sentence pairs from the Wikipedia corpus. The annotations were collected using a novel crowdsourcing approach with multiple workers per sentence to capture inter-annotator disagreement. In contrast to the typical approach of attributing the best single frame to each word, we provide a list of frames with disagreement-based scores that express the confidence with which each frame applies to the word. This is based on the idea that inter-annotator disagreement is at least partly caused by ambiguity that is inherent to the text and frames. We have found many examples where the semantics of individual frames overlap sufficiently to make them acceptable alternatives for interpreting a sentence. We have argued that ignoring this ambiguity creates an overly arbitrary target for training and evaluating natural language processing systems - if humans cannot agree, why would we expect the correct answer from a machine to be any different? To process this data, we also used an expanded lemma set provided by the Framester system, which merges FN with WordNet to enhance coverage. Our dataset includes annotations of 1,000 word-sentence pairs whose lemmas are not part of FN. Finally, we present metrics for evaluating frame disambiguation systems that account for ambiguity.
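
To make the idea of a list of scored frames per word concrete, here is a minimal sketch of how several workers' choices for one word-sentence pair could be turned into per-frame scores. It is an illustration under stated assumptions, not the scoring used to build the corpus: the abstract does not specify the formula, so the sketch uses the plain fraction of workers who selected each frame. The function name `frame_scores` and the example annotations are hypothetical.

```python
from collections import Counter


def frame_scores(worker_choices):
    """Score each frame by the fraction of workers who selected it.

    worker_choices: one collection of frame names per worker, all for the
    same word-sentence pair. A worker may select several frames.
    Assumption: an unweighted vote fraction stands in for the paper's
    disagreement-based score, whose exact definition is not given here.
    """
    n_workers = len(worker_choices)
    if n_workers == 0:
        return {}
    # set() ensures a worker who lists a frame twice is counted once.
    counts = Counter(
        frame for choices in worker_choices for frame in set(choices)
    )
    return {frame: count / n_workers for frame, count in counts.items()}


# Hypothetical annotations from four workers for a single target word.
annotations = [
    {"Commerce_buy"},
    {"Commerce_buy", "Getting"},
    {"Getting"},
    {"Commerce_buy"},
]
scores = frame_scores(annotations)

# List the frames from most to least supported.
for frame, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{frame}: {score:.2f}")
```

Both frames keep a non-zero score instead of the minority frame being discarded, which is the contrast with single-best-frame labelling that the abstract draws.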

research
05/01/2018

Capturing Ambiguity in Crowdsourcing Frame Disambiguation

FrameNet is a computational linguistics resource composed of semantic fr...
research
02/01/2023

AmbiCoref: Evaluating Human and Model Sensitivity to Ambiguous Coreference

Given a sentence "Abby told Brittney that she upset Courtney", one would...
research
05/12/2022

Noun2Verb: Probabilistic frame semantics for word class conversion

Humans can flexibly extend word usages across different grammatical clas...
research
08/18/2018

CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement

Typically crowdsourcing-based approaches to gather annotated data use in...
research
08/09/2018

Arithmetic Word Problem Solver using Frame Identification

Automatic Word problem solving has always posed a great challenge for th...
research
01/17/2022

On the Context-Free Ambiguity of Emoji: A Data-Driven Study of 1,289 Emojis

Emojis come with prepacked semantics making them great candidates to cre...
research
08/04/2021

Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty

Human ratings have become a crucial resource for training and evaluating...
