ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis

07/03/2023
by Javier de la Rosa, et al.

The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated, as scansion and rhyme systems exist only for individual languages, making comparative studies very challenging and time-consuming. In this work, we present Alberti, the first multilingual pre-trained large language model for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English, and German. In both cases, Alberti outperforms multilingual BERT and other transformer-based models of similar sizes, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.
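The DSP step described above continues BERT-style masked-language-model training on verse data: a fraction of tokens in each line is hidden and the model learns to predict them. As a minimal sketch of that objective (simplified: tokens are split on whitespace, and masked positions are always replaced with the mask token, omitting BERT's 80/10/10 replacement split), the example verse and function names below are illustrative, not taken from the paper:

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=1):
    """Mask a fraction of tokens for masked-language-model pre-training.

    Returns the masked sequence and, per position, the original token the
    model must predict (or None where no prediction is scored).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # target: recover the original token
        else:
            masked.append(tok)
            labels.append(None)  # position not scored in the loss
    return masked, labels

# A Garcilaso de la Vega hendecasyllable as a toy training line.
verse = "En tanto que de rosa y azucena".split()
masked, labels = mask_tokens(verse)
```

In the actual pipeline a subword tokenizer and a full training loop would replace this toy function; the sketch only shows what signal the 12 million verses provide during DSP.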


