The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts

We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts. PDNC contains annotations for 35,978 quotations across 22 full-length novels, and is by an order of magnitude the largest corpus of its kind. Each quotation is annotated for the speaker, addressees, type of quotation, referring expression, and character mentions within the quotation text. The annotated attributes allow for a comprehensive evaluation of models of quotation attribution and coreference for literary texts.
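The per-quotation attributes listed above can be pictured as a single record type. The sketch below is only an illustration under stated assumptions: the class name, field names, the `attribution_accuracy` helper, and the sample records are all hypothetical and do not reflect the actual PDNC release format or the authors' evaluation protocol.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuotationAnnotation:
    """Hypothetical record mirroring the attributes named in the abstract."""
    text: str                  # the quotation text itself
    speaker: str               # character who utters the quotation
    addressees: List[str]      # characters the quotation is addressed to
    quote_type: str            # type of quotation (label set is an assumption)
    referring_expression: str  # expression used to refer to the speaker
    mentions: List[str] = field(default_factory=list)  # characters mentioned inside the quotation


def attribution_accuracy(gold: List[QuotationAnnotation], predicted: List[str]) -> float:
    """Fraction of quotations whose predicted speaker matches the gold speaker."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g.speaker == p)
    return correct / len(gold)


# Invented sample records, for illustration only (not taken from the corpus).
sample = [
    QuotationAnnotation("Where is Bob?", "Alice", ["Carol"], "typeA", "said Alice", ["Bob"]),
    QuotationAnnotation("He left an hour ago.", "Carol", ["Alice"], "typeB", "", []),
]
accuracy = attribution_accuracy(sample, ["Alice", "Alice"])
```

Keeping the speaker, addressees, and in-quotation mentions in one record is a design choice made here so that attribution and coreference predictions could be scored against the same unit; a real loader would need to follow whatever layout the released corpus uses.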
