Chapter Captor: Text Segmentation in Novels

11/09/2020
by Charuta Pethe, et al.

Books are typically segmented into chapters and sections, representing coherent subnarratives and topics. We investigate the task of predicting chapter boundaries, as a proxy for the general task of segmenting long texts. We build a Project Gutenberg chapter segmentation data set of 9,126 English novels, using a hybrid approach combining neural inference and rule matching to recognize chapter title headers in books, achieving an F1-score of 0.77 on this task. Using this annotated data as ground truth after removing structural cues, we present cut-based and neural methods for chapter segmentation, achieving an F1-score of 0.453 on the challenging task of exact break prediction over book-length documents. Finally, we reveal interesting historical trends in the chapter structure of novels.
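
To make the "exact break prediction" metric concrete, here is a minimal sketch of how an F1-score over exact-match boundaries can be computed. It is not the authors' evaluation code. The function name `boundary_f1`, the representation of breaks as sentence indices, and the toy values are all assumptions made for illustration.

```python
def boundary_f1(predicted, gold):
    """Precision, recall and F1 where a predicted break only counts
    if it lands on exactly the same position as a gold break.

    Positions are assumed (for this sketch) to be sentence indices.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (made-up positions, not data from the paper).
gold_breaks = [120, 480, 910, 1350]
predicted_breaks = [120, 475, 910, 1600]

p, r, f1 = boundary_f1(predicted_breaks, gold_breaks)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints "precision=0.50 recall=0.50 f1=0.50"
```

The prediction at 475 misses the gold break at 480 by only five positions yet earns no credit, which illustrates why exact matching over book-length documents is a demanding criterion.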
