Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

11/22/2020
by Simon Gabay, et al.

With the development of large corpora covering various periods, it has become crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological features) in order to make the resulting data more interoperable despite diachronic variation. In this paper, we describe the production of a linguistic tagger for (early) modern French (16th–18th c.), both methodologically, by proposing annotation principles, and technically, by creating the required training data and the relevant models. Throughout, we take existing standards for contemporary and, especially, medieval French into account as far as possible.
