Robust Quantification of Gender Disparity in Pre-Modern English Literature using Natural Language Processing

04/12/2022
by Akarsh Nagaraj, et al.

Research has continued to shed light on the extent and significance of gender disparity in social, cultural and economic spheres. More recently, computational tools from the Natural Language Processing (NLP) literature have been proposed for measuring such disparity using relatively extensive datasets and empirically rigorous methodologies. In this paper, we contribute to this line of research by studying gender disparity, at scale, in copyright-expired literary texts published in the pre-modern period (defined in this work as the period ranging from the mid-nineteenth through the mid-twentieth century). One challenge in using such tools is ensuring quality control and, by extension, trustworthy statistical analysis. Another is relying on materials and methods that are publicly available and have been established for some time, both so that they can be used and vetted in the future and to add confidence to the methodology itself. We present our approach to addressing these challenges and, using multiple measures, demonstrate a significant discrepancy between the prevalence of female characters and male characters in pre-modern literature. The evidence suggests that the discrepancy declines when the author is female. The discrepancy appears relatively stable across the decades of this century-long period. Finally, we aim to carefully describe both the limitations and ethical caveats associated with this study, and others like it.

