Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models

05/29/2023
by Myra Cheng, et al.

To recognize and mitigate harms from large language models (LLMs), we need to understand the prevalence and nuances of stereotypes in LLM outputs. Toward this end, we present Marked Personas, a prompt-based method to measure stereotypes in LLMs for intersectional demographic groups without any lexicon or data labeling. Grounded in the sociolinguistic concept of markedness (which characterizes explicitly linguistically marked categories versus unmarked defaults), our proposed method is twofold: 1) prompting an LLM to generate personas, i.e., natural language descriptions, of the target demographic group alongside personas of unmarked, default groups; 2) identifying the words that significantly distinguish personas of the target group from corresponding unmarked ones. We find that the portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts. The words distinguishing personas of marked (non-white, non-male) groups reflect patterns of othering and exoticizing these demographics. An intersectional lens further reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women. These representational harms have concerning implications for downstream applications like story generation.
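Step 2 of the method, identifying words that significantly distinguish marked-group personas from unmarked ones, can be approximated with a weighted log-odds statistic over the two persona corpora. The sketch below uses the Dirichlet-smoothed log-odds z-score of Monroe et al.'s "Fightin' Words"; whether this matches the paper's exact statistic, and the whitespace tokenizer and `alpha` prior used here, are assumptions for illustration.

```python
from collections import Counter
import math

def distinguishing_words(marked_texts, unmarked_texts, alpha=0.01):
    """Rank words by how strongly they separate marked-group personas
    from unmarked ones: Dirichlet-smoothed log-odds ratio, normalized
    by its estimated variance to give a z-score per word."""
    tokenize = lambda texts: Counter(w for t in texts for w in t.lower().split())
    y1, y2 = tokenize(marked_texts), tokenize(unmarked_texts)
    vocab = set(y1) | set(y2)
    n1, n2 = sum(y1.values()), sum(y2.values())
    a0 = alpha * len(vocab)  # total pseudocount mass of the prior
    scores = {}
    for w in vocab:
        # Smoothed log-odds of w in each corpus
        l1 = math.log((y1[w] + alpha) / (n1 + a0 - y1[w] - alpha))
        l2 = math.log((y2[w] + alpha) / (n2 + a0 - y2[w] - alpha))
        # z-score: positive means w is characteristic of the marked personas
        var = 1.0 / (y1[w] + alpha) + 1.0 / (y2[w] + alpha)
        scores[w] = (l1 - l2) / math.sqrt(var)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Words with large positive z-scores are those over-represented in personas of the marked group; in the paper's setting, these are the candidate stereotype-bearing terms inspected for patterns like othering and exoticization.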

