Measuring Intersectional Biases in Historical Documents

05/21/2023
by Nadav Borenstein, et al.

Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners, as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in archaic language. In this paper, we investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries). Our analyses are performed along the axes of gender, race, and their intersection. We examine these biases by conducting a temporal study in which we measure the development of lexical associations using distributional semantics models and word embeddings. Further, we evaluate the effectiveness of techniques designed to process OCR-generated data and assess their stability when trained on and applied to the noisy historical newspapers. We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset. We provide evidence that gender and racial biases are interdependent, and that their intersection triggers distinct effects. These findings align with the theory of intersectionality, which stresses that biases affecting people with multiple marginalised identities compound to more than the sum of their constituents.
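
To make the idea of tracking lexical associations over time more concrete, the following is a minimal sketch and not the method used in the paper. It assumes one embedding table per time period and scores a target word by its mean cosine similarity to one attribute set minus its mean similarity to another. The function names, the period labels, the word lists, and the two-dimensional vectors are all hypothetical toy values chosen for illustration; they are not data or results from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, emb):
    """Mean similarity of `word` to attribute set A minus that to set B.

    A positive score means `word` sits closer to set A in this embedding
    space; a negative score means it sits closer to set B.
    """
    sim_a = sum(cosine(emb[word], emb[a]) for a in attrs_a) / len(attrs_a)
    sim_b = sum(cosine(emb[word], emb[b]) for b in attrs_b) / len(attrs_b)
    return sim_a - sim_b


def association_by_period(embeddings_by_period, word, attrs_a, attrs_b):
    """Compute the association score separately for each time period."""
    return {
        period: association(word, attrs_a, attrs_b, emb)
        for period, emb in embeddings_by_period.items()
    }


# Toy, made-up vectors (assumption): real embeddings would be trained
# separately on the newspapers of each period.
embeddings_by_period = {
    "early": {"woman": [1.0, 0.0], "home": [1.0, 0.0], "trade": [0.0, 1.0]},
    "late": {"woman": [1.0, 1.0], "home": [1.0, 0.0], "trade": [0.0, 1.0]},
}

scores = association_by_period(embeddings_by_period, "woman", ["home"], ["trade"])
for period, score in scores.items():
    print(period, round(score, 3))
```

In a sketch like this, an intersectional analysis would repeat the comparison for target terms that combine several identity markers and check whether their scores differ from what the single-axis scores alone would suggest.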

