"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation

05/17/2023
by   Anaelia Ovalle, et al.

Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. Given the recent popularity and adoption of language generation technologies, the potential to further marginalize this population only grows. Although a multitude of NLP fairness literature focuses on illuminating and addressing gender biases, assessing gender harms for TGNB identities requires understanding how such identities uniquely interact with societal gender norms and how they differ from gender binary-centric perspectives. Such measurement frameworks inherently require centering TGNB voices to help guide the alignment between gender-inclusive NLP and whom they are intended to serve. Towards this goal, we ground our work in the TGNB community and existing interdisciplinary literature to assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation (OLG). This social knowledge serves as a guide for evaluating popular large language models (LLMs) on two key aspects: (1) misgendering and (2) harmful responses to gender disclosure. To do this, we introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community. We discover a dominance of binary gender norms reflected by the models; LLMs least misgendered subjects in generated text when triggered by prompts whose subjects used binary pronouns. Meanwhile, misgendering was most prevalent when triggering generation with singular they and neopronouns. When prompted with gender disclosures, TGNB disclosure generated the most stigmatizing language and scored most toxic, on average. Our findings warrant further research on how TGNB harms manifest in LLMs and serve as a broader case study toward concretely grounding the design of gender-inclusive AI in community voices and interdisciplinary literature.
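
The misgendering measurement described above can be illustrated with a minimal sketch. This is not the paper's evaluation code: the pronoun table, the function names, and the rule "any pronoun from a different family counts as misgendering" are all assumptions made for illustration. It also assumes every pronoun in the generated text refers to the prompt's subject, which real text often violates.

```python
import re

# Hypothetical pronoun families (an illustrative subset, not the paper's list).
PRONOUN_FAMILIES = {
    "he": {"he", "him", "his", "himself"},
    "she": {"she", "her", "hers", "herself"},
    "they": {"they", "them", "their", "theirs", "themself", "themselves"},
    "xe": {"xe", "xem", "xyr", "xyrs", "xemself"},
}


def pronoun_families_in(text):
    """Return the set of pronoun families that appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {
        family
        for family, forms in PRONOUN_FAMILIES.items()
        if tokens & forms
    }


def is_misgendered(expected_family, generated_text):
    """Flag a generation that uses any pronoun family other than the expected one.

    Assumption: all pronouns in the text refer to the prompt's subject.
    """
    found = pronoun_families_in(generated_text)
    return bool(found - {expected_family})


def misgendering_rate(samples):
    """Fraction of (expected_family, generated_text) pairs flagged as misgendered."""
    flags = [is_misgendered(family, text) for family, text in samples]
    return sum(flags) / len(flags) if flags else 0.0
```

Grouping the resulting rate by the pronoun family used in the prompt would allow the kind of comparison the abstract reports between binary pronouns, singular they, and neopronouns. A faithful evaluation would also need coreference handling, which this sketch omits.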
