Generating Gender Augmented Data for NLP

07/13/2021
by Nishtha Jain, et al.

Gender bias is frequent in NLP-based applications and is especially pronounced in gender-inflected languages. Bias can arise from associations of certain adjectives and animate nouns with the natural gender of referents, and also from unbalanced grammatical gender frequencies of inflected words. This type of bias becomes more evident when generating conversational utterances in which gender is not specified within the sentence, because most current NLP applications still operate on sentence-level context. As a step towards more inclusive NLP, this paper proposes an automatic and generalisable rewriting approach for short conversational sentences. The rewriting method can be applied to sentences that, without extra-sentential context, have multiple equivalent alternatives in terms of gender. It can be used both to create gender-balanced outputs and to create gender-balanced training data. The proposed approach is based on a neural machine translation (NMT) system trained to 'translate' from one gender alternative to another. Both automatic and manual analyses of the approach show promising results for the automatic generation of gender alternatives for conversational sentences in Spanish.
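To make the data-augmentation use concrete, here is a minimal Python sketch of adding gender alternatives to a corpus. It is an illustration only, not the authors' implementation: the paper's rewriter is a trained NMT model, whereas `toy_rewrite`, `TOY_MASC_TO_FEM`, and `augment` are hypothetical names, and the small word lexicon merely stands in for that model.

```python
from typing import Callable, Iterable, List

# ASSUMPTION: a toy masculine-to-feminine lexicon standing in for the
# trained NMT rewriter described in the abstract. Not from the paper.
TOY_MASC_TO_FEM = {
    "cansado": "cansada",
    "listo": "lista",
    "contento": "contenta",
}


def toy_rewrite(sentence: str) -> str:
    """Swap known masculine forms for feminine ones, token by token."""
    return " ".join(TOY_MASC_TO_FEM.get(tok, tok) for tok in sentence.split())


def augment(corpus: Iterable[str], rewrite: Callable[[str], str]) -> List[str]:
    """Return each sentence followed by its gender alternative, if one differs."""
    out: List[str] = []
    for sentence in corpus:
        out.append(sentence)
        alternative = rewrite(sentence)
        # Sentences with no gender-marked form have no distinct alternative.
        if alternative != sentence:
            out.append(alternative)
    return out


corpus = ["estoy cansado", "hace frío"]
augmented = augment(corpus, toy_rewrite)
```

Passing the rewriter as a function means a real model could replace the toy lexicon without changing `augment`. A sentence with no gender-marked form, such as "hace frío", is kept once and not duplicated.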

research
10/11/2020

Neural Machine Translation Doesn't Translate Gender Coreference Right Unless You Make It

Neural Machine Translation (NMT) has been shown to struggle with grammat...
research
04/09/2020

Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem

Training data for NLP tasks often exhibits gender bias in that fewer sen...
research
09/13/2021

NeuTral Rewriter: A Rule-Based and Neural Approach to Automatic Rewriting into Gender-Neutral Alternatives

Recent years have seen an increasing need for gender-neutral and inclusi...
research
05/28/2019

On Measuring Gender Bias in Translation of Gender-neutral Pronouns

Ethics regarding social bias has recently thrown striking issues in natu...
research
04/16/2021

Investigating Failures of Automatic Translation in the Case of Unambiguous Gender

Transformer based models are the modern work horses for neural machine t...
research
06/11/2019

Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology

Gender stereotypes are manifest in most of the world's languages and are...
research
12/10/2019

GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies

We introduce GeBioToolkit, a tool for extracting multilingual parallel c...
