
The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

10/18/2022
by Nikil Roashan Selvam, et al.

How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random-sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender and BiasNLI) we observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
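
To make the setup concrete, the following is a minimal, hypothetical sketch (not the authors' code) of the idea described in the abstract: construct several alternative versions of a templated benchmark via an innocuous modification (here, random-sampling of templates) and observe how much a bias score moves. The templates, occupation list, mock model, and scoring function are all illustrative placeholders, not artifacts from Winogender or BiasNLI.

```python
import random
import statistics

# Hypothetical templated benchmark (illustrative only; not from the paper).
TEMPLATES = [
    "The {occupation} said that {pronoun} would arrive soon.",
    "The {occupation} explained that {pronoun} had finished the report.",
    "Everyone thanked the {occupation} because {pronoun} stayed late.",
    "The {occupation} argued that {pronoun} deserved a raise.",
]
OCCUPATIONS = ["nurse", "engineer", "teacher", "surgeon", "clerk", "janitor"]


def mock_model_prefers_stereotype(occupation: str, template: str) -> bool:
    """Placeholder for a real model query. It is sensitive to surface wording,
    mimicking how shallow template changes can flip predictions."""
    return (len(occupation) + len(template)) % 3 != 0


def bias_score(instances) -> float:
    """Toy bias metric: fraction of instances on which the 'model' picks the
    stereotypical completion."""
    hits = sum(mock_model_prefers_stereotype(occ, tmpl) for occ, tmpl in instances)
    return hits / len(instances)


def alternative_construction(seed: int, k: int = 3):
    """One alternative benchmark construction: an innocuous random sample of
    templates, crossed with the same occupation list."""
    rng = random.Random(seed)
    chosen = rng.sample(TEMPLATES, k)
    return [(occ, tmpl) for occ in OCCUPATIONS for tmpl in chosen]


# Measure how the bias score varies across equally "valid" constructions.
scores = [bias_score(alternative_construction(seed)) for seed in range(20)]
print(f"bias scores across 20 constructions: min={min(scores):.2f}, "
      f"max={max(scores):.2f}, stdev={statistics.stdev(scores):.3f}")
```

In the paper itself, the modifications include paraphrasing and random-sampling applied to Winogender and BiasNLI, and the bias scores are those defined by each benchmark; the point of the sketch is only that shallow construction choices alone can shift the measured score.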

03/10/2022
Speciesist Language and Nonhuman Animal Bias in English Masked Language Models
Various existing studies have analyzed what social biases are inherited ...

02/15/2022
Multiparameter Bernoulli Factories
We consider the problem of computing with many coins of unknown bias. We...

12/20/2022
Trustworthy Social Bias Measurement
How do we design measures of social bias that we trust? While prior work...

09/28/2022
Racial Bias in the Beautyverse
This short paper proposes a preliminary and yet insightful investigation...

10/14/2022
A Survey of Parameters Associated with the Quality of Benchmarks in NLP
Several benchmarks have been built with heavy investment in resources to...

10/02/2019
Quantifying Voter Biases in Online Platforms: An Instrumental Variable Approach
In content-based online platforms, use of aggregate user feedback (say, ...

10/09/2022
Quantifying Social Biases Using Templates is Unreliable
Recently, there has been an increase in efforts to understand how large ...