Annotation Uncertainty in the Context of Grammatical Change

05/15/2021
by Marie-Luis Merten, et al.

This paper elaborates on the notion of uncertainty in the annotation of large text corpora, focusing on (but not limited to) historical languages. Such uncertainty may stem from inherent properties of the language, such as linguistic ambiguity and overlapping categories of linguistic description, but it can also be caused by a lack of annotation expertise. By examining annotation uncertainty in more detail, we identify its sources and deepen our understanding of the nature and types of uncertainty encountered in everyday annotation practice. We also discuss some practical implications of our theoretical findings. Finally, this article attempts to reconcile the perspectives of the two main scientific disciplines involved in corpus projects, linguistics and computer science, in order to develop a unified view and highlight potential synergies between them.
