Manuscripts in Time and Space: Experiments in Scriptometrics on an Old French Corpus

01/30/2018
by Jean-Baptiste Camps, et al.

Witnesses of medieval literary texts, preserved in manuscript, are layered objects, being almost exclusively copies of copies. This results in multiple, hard-to-distinguish linguistic strata (the author's scripta interacting with the scriptae of the various scribes) in a context where the literary written language is already a dialectal hybrid. Moreover, no single linguistic phenomenon makes it possible to distinguish between different scriptae; only a combination of multiple characteristics is likely to be significant [9]. But which ones? The most common approach is to search for these features in a set of previously selected texts that are assumed to be representative of a given scripta. This can induce circularity: texts are used to select features that, in turn, characterise them as belonging to a linguistic area. To counter this issue, this paper offers an unsupervised, corpus-based approach in which clustering methods are applied to an Old French corpus to identify its main divisions and groups. Ultimately, scriptometric profiles are built for each of them.
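
To make the general idea concrete, here is a minimal sketch of unsupervised grouping followed by per-group profiles. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the features, distance measure, or clustering algorithm used, so the witness names, feature labels, frequencies, Manhattan distance, and average linkage below are all hypothetical choices.

```python
from itertools import combinations

# Hypothetical toy data: relative frequency of three spelling variants
# in four witnesses. Names, features, and values are invented.
FEATURES = ["ei~oi", "ch~c", "-z~-s"]
WITNESSES = {
    "ms_A": [0.90, 0.10, 0.80],
    "ms_B": [0.85, 0.15, 0.75],
    "ms_C": [0.20, 0.90, 0.30],
    "ms_D": [0.25, 0.80, 0.20],
}

def manhattan(u, v):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    dists = [manhattan(data[a], data[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)

def agglomerate(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [(name,) for name in sorted(data)]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return sorted(clusters)

def profile(cluster, data):
    """Mean frequency of each feature over the witnesses of a cluster."""
    n = len(cluster)
    return [sum(data[w][i] for w in cluster) / n for i in range(len(FEATURES))]

# No text is labelled in advance: groups emerge from the data alone,
# and a profile is then computed for each group.
groups = agglomerate(WITNESSES, n_clusters=2)
profiles = {g: profile(g, WITNESSES) for g in groups}

for g in groups:
    print(g, dict(zip(FEATURES, profiles[g])))
```

The point of the sketch is the order of operations: groups are formed first, from all features at once, and only afterwards are the features that characterise each group read off, which avoids selecting features from texts already assumed to represent a given scripta.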


