Improving Automatic Quotation Attribution in Literary Novels

Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data, which poses a challenge for in-the-wild inference. Here, we approach quotation attribution as a set of four interconnected sub-tasks: character identification, coreference resolution, quotation identification, and speaker attribution. We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels (the Project Dialogism Novel Corpus). We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.


