Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features

02/17/2021
by Thomas Haider, et al.

A prerequisite for the computational study of literature is the availability of properly digitized texts, ideally with reliable metadata and ground-truth annotation. Poetry corpora exist for a number of languages, but larger collections lack consistency and are encoded in various standards, while annotated corpora are typically constrained to a particular genre and/or were designed for the analysis of certain linguistic features (like rhyme). In this work, we provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models that enable robust large-scale analysis. We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches. In a multi-task setup, certain beneficial task relations illustrate the interdependence of poetic features: a model learns foot boundaries better when jointly predicting syllable stress; aesthetic emotions and verse measures benefit from each other; and caesuras are quite dependent on syntax and also integral to shaping the overall measure of the line.


