Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

05/15/2020
by Jean-Baptiste Camps, et al.

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks and of a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the models prove robust in out-of-domain tests, i.e. on texts ranging up to 20th-century novels.

