Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features

by Thomas Haider, et al.

A prerequisite for the computational study of literature is the availability of properly digitized texts, ideally with reliable metadata and ground-truth annotation. Poetry corpora exist for a number of languages, but larger collections lack consistency and are encoded in various standards, while annotated corpora are typically constrained to a particular genre and/or were designed for the analysis of certain linguistic features (like rhyme). In this work, we provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models that enable robust large-scale analysis. We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches. In a multi-task setup, particular beneficial task relations illustrate the inter-dependence of poetic features: a model learns foot boundaries better when jointly predicting syllable stress; aesthetic emotions and verse measures benefit from each other; and caesuras are quite dependent on syntax and also integral to shaping the overall measure of the line.




Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling

This work presents a new resource for borrowing identification and analy...

Multi-task Learning Based Neural Bridging Reference Resolution

We propose a multi task learning-based neural model for bridging referen...

A New Recurrent Neural CRF for Learning Non-linear Edge Features

Conditional Random Field (CRF) and recurrent neural models have achieved...

Creating and Managing a large annotated parallel corpora of Indian languages

This paper presents the challenges in creating and managing large parall...

MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora

Multi-word expressions (MWEs) are a hot topic in research in natural lan...

PyPlutchik: visualising and comparing emotion-annotated corpora

The increasing availability of textual corpora and data fetched from soc...

Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection

The common practice in coreference resolution is to identify and evaluat...