Explorations in an English Poetry Corpus: A Neurocognitive Poetics Perspective

01/06/2018
by Arthur M. Jacobs, et al.

This paper describes a corpus of about 3000 English literary texts, totalling about 250 million words, drawn from Project Gutenberg. The texts span a range of fiction and non-fiction genres and were written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative Narrative Analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around 2 million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author, and various text-analytic metrics, such as lexical diversity and sentiment, for George Eliot's poem 'How Lisa Loved the King' and James Joyce's 'Chamber Music'. The GEPC is particularly suited for research in Digital Humanities, Natural Language Processing, or Neurocognitive Poetics, for example as a training and test corpus or for stimulus development and control.
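
To make "lexical diversity" concrete, here is a minimal Python sketch of one common measure, the type-token ratio (distinct word forms divided by total words). This is an illustration only: the abstract does not say which diversity metric the paper uses, the function name `type_token_ratio` and the simple regex tokenizer are assumptions, and the sample string is invented rather than taken from the GEPC.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms divided by total words; 0.0 for empty text.

    Assumption: a crude tokenizer that lowercases the text and keeps
    runs of letters and apostrophes. A real study would likely use a
    proper tokenizer and a length-corrected diversity measure.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Invented sample line, not from the corpus: 8 tokens, 4 distinct types.
sample = "the rose is a rose is a rose"
print(type_token_ratio(sample))
```

The raw ratio falls as texts get longer, so comparing poems of very different lengths with it can mislead; that is one reason to treat this as a sketch rather than a recommended metric.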


research
11/26/2018

LSICC: A Large Scale Informal Chinese Corpus

Deep learning based natural language processing model is proven powerful...
research
11/12/2021

Dataset of Philippine Presidents Speeches from 1935 to 2016

The dataset was collected to examine and identify possible key topics wi...
research
12/03/2014

Mary Astell's words in A Serious Proposal to the Ladies (part I), a lexicographic inquiry with NooJ

In the following article we elected to study with NooJ the lexis of a 17...
research
12/20/2016

Inferring the location of authors from words in their texts

For the purposes of computational dialectology or other geographically b...
research
12/31/2018

Pull out all the stops: Textual analysis via punctuation sequences

Whether enjoying the lucid prose of a favorite author or slogging throug...
research
10/21/2020

Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set

The Gutenberg Literary English Corpus (GLEC) provides a rich source of t...
research
10/08/2021

Development of an Extractive Title Generation System Using Titles of Papers of Top Conferences for Intermediate English Students

The formulation of good academic paper titles in English is challenging ...
