From stage to page: language independent bootstrap measures of distinctiveness in fictional speech

01/13/2023
by Artjoms Šela, et al.

Stylometry is mostly applied to authorial style. Recently, researchers have begun investigating the style of characters, finding that the variation remains within authorial bounds. We address the stylistic distinctiveness of characters in drama. Our primary contribution is methodological; we introduce and evaluate two non-parametric methods to produce a summary statistic for character distinctiveness that can be usefully applied and compared across languages and times. Our first method is based on bootstrap distances between 3-gram probability distributions, the second (reminiscent of 'unmasking' techniques) on word keyness curves. Both methods are validated and explored by applying them to a reasonably large corpus (a subset of DraCor): we analyse 3301 characters drawn from 2324 works, covering five centuries and four languages (French, German, Russian, and the works of Shakespeare). Both methods appear useful; the 3-gram method is statistically more powerful but the word keyness method offers rich interpretability. Both methods are able to capture phonological differences such as accent or dialect, as well as broad differences in topic and lexical richness. Based on exploratory analysis, we find that smaller characters tend to be more distinctive, and that women are cross-linguistically more distinctive than men, with this latter finding carefully interrogated using multiple regression. This greater distinctiveness stems from a historical tendency for female characters to be restricted to an 'internal narrative domain' covering mainly direct discourse and family/romantic themes. It is hoped that direct, comparable statistical measures will form a basis for more sophisticated future studies, and advances in theory.
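The first method described above, bootstrap distances between 3-gram probability distributions, can be illustrated with a minimal sketch. Several things here are assumptions rather than details taken from the abstract: the 3-grams are treated as character 3-grams, the distance is Jensen-Shannon divergence, the resampling unit is a single speech line, and the score is the mean character-versus-others distance minus a others-versus-others baseline. All function names are hypothetical, and the paper's actual procedure may differ on each of these points.

```python
import math
import random
from collections import Counter


def char_trigrams(text):
    """Return the overlapping character 3-grams of a string."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def trigram_distribution(lines):
    """Relative frequencies of character 3-grams over a list of speech lines."""
    counts = Counter()
    for line in lines:
        counts.update(char_trigrams(line.lower()))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), bounded in [0, 1].

    Assumption: the abstract does not name the distance it uses.
    """
    div = 0.0
    for g in set(p) | set(q):
        pg, qg = p.get(g, 0.0), q.get(g, 0.0)
        m = 0.5 * (pg + qg)
        if pg > 0:
            div += 0.5 * pg * math.log2(pg / m)
        if qg > 0:
            div += 0.5 * qg * math.log2(qg / m)
    return div


def bootstrap_distinctiveness(character_lines, other_lines, n_boot=200, seed=0):
    """Mean bootstrap distance of a character from the rest, minus a baseline.

    Each iteration resamples lines with replacement. The baseline compares two
    resamples of the other characters' speech, so a score near zero means the
    character is no further from the rest than the rest is from itself.
    """
    rng = random.Random(seed)
    k = len(character_lines)
    between, baseline = [], []
    for _ in range(n_boot):
        char_sample = rng.choices(character_lines, k=k)
        other_a = rng.choices(other_lines, k=k)
        other_b = rng.choices(other_lines, k=k)
        dist_a = trigram_distribution(other_a)
        between.append(js_divergence(trigram_distribution(char_sample), dist_a))
        baseline.append(js_divergence(dist_a, trigram_distribution(other_b)))
    return sum(between) / n_boot - sum(baseline) / n_boot
```

Sampling the comparison text at the same size as the character's own speech is one simple way to keep scores comparable between large and small roles; it is a design choice of this sketch, not a claim about the paper.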

research
10/24/2020

Revisiting Neural Language Modelling with Syllables

Language modelling is regularly analysed at word, subword or character u...
research
09/29/2016

A comparative study of complexity of handwritten Bharati characters with that of major Indian scripts

We present Bharati, a simple, novel script that can represent the charac...
research
09/02/2017

Patterns versus Characters in Subword-aware Neural Language Modeling

Words in some natural languages can have a composite structure. Elements...
research
01/24/2019

Squared English Word: A Method of Generating Glyph to Use Super Characters for Sentiment Analysis

The Super Characters method addresses sentiment analysis problems by fir...
research
02/16/2023

Tragic and Comical Networks. Clustering Dramatic Genres According to Structural Properties

There is a growing tradition in the joint field of network studies and d...
research
02/17/2018

Global-scale phylogenetic linguistic inference from lexical resources

Automatic phylogenetic inference plays an increasingly important role in...
research
09/16/2022

Quantifying Discourse Support for Omitted Pronouns

Pro-drop is commonly seen in many languages, but its discourse motivatio...
