Building Socio-culturally Inclusive Stereotype Resources with Community Engagement

07/20/2023
by Sunipa Dev, et al.

With the rapid development and deployment of generative language models in global settings, there is an urgent need to scale our measurements of harm as well, not just in the number and types of harms covered, but also in how well they account for local cultural contexts, including marginalized identities and the social biases they experience. Current evaluation paradigms are limited in their ability to address this, as they are not representative of diverse, locally situated yet global socio-cultural perspectives. It is imperative that our evaluation resources be enhanced and calibrated by including people and experiences from different cultures and societies worldwide, in order to prevent gross underestimations or skews in measurements of harm. In this work, we demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping. We devise a community-engaged effort to build a resource that contains stereotypes for axes of disparity that are uniquely present in India. The resulting resource increases the number of stereotypes known for and in the Indian context by over 1000 across many unique identities. We also demonstrate the utility and effectiveness of such expanded resources for evaluations of language models. CONTENT WARNING: This paper contains examples of stereotypes that may be offensive.


