Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

11/22/2020
by Simon Gabay, et al.

With the growth of large corpora covering various periods, it is becoming crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variation. In this paper, we describe the production of a linguistic tagger for (early) modern French (16th-18th c.), both methodologically (by proposing annotation principles) and technically (by creating the required training data and the relevant models), taking existing standards for contemporary and, especially, medieval French into account as far as possible.

09/28/2019

Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data

We present our effort to create a large Multi-Layered representational r...
06/11/2020

Provenance for Linguistic Corpora Through Nanopublications

Research in Computational Linguistics is dependent on text corpora for t...
02/18/2016

Overview of Annotation Creation: Processes & Tools

Creating linguistic annotations requires more than just a reliable annot...
08/12/2020

The Annotation Guideline of LST20 Corpus

This report presents the annotation guideline for LST20, a large-scale c...
11/24/2021

For the Purpose of Curry: A UD Treebank for Ashokan Prakrit

We present the first linguistically annotated treebank of Ashokan Prakri...
05/15/2021

Annotation Uncertainty in the Context of Grammatical Change

This paper elaborates on the notion of uncertainty in the context of ann...
02/11/2020

Training with Streaming Annotation

In this paper, we address a practical scenario where training data is re...