Bipol: Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets

01/28/2023
by Tosin Adewumi, et al.

We evaluate five English NLP benchmark datasets (available on the SuperGLUE leaderboard) for bias along multiple axes. The datasets are Boolean Questions (BoolQ), CommitmentBank (CB), Winograd Schema Challenge (WSC), Winogender diagnostic (AXg), and Recognising Textual Entailment (RTE). Bias can be harmful, and it is known to be common in the data that ML models learn from. To mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to quantify and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large, labelled Swedish bias-detection dataset of about 2 million samples, translated from the English version. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We train a state-of-the-art (SotA) model on the new dataset for bias detection. We make the code, model, and new dataset publicly available.


research
04/08/2023

Bipol: A Novel Multi-Axes Bias Evaluation Metric with Explainability for NLP

We introduce bipol, a new metric with explainability, for estimating soc...
research
04/01/2021

An Investigation of Critical Issues in Bias Mitigation Techniques

A critical problem in deep learning is that systems learn inappropriate ...
research
07/31/2023

KoBBQ: Korean Bias Benchmark for Question Answering

The BBQ (Bias Benchmark for Question Answering) dataset enables the eval...
research
01/21/2022

Gender Bias in Text: Labeled Datasets and Lexicons

Language has a profound impact on our thoughts, perceptions, and concept...
research
09/27/2022

mRobust04: A Multilingual Version of the TREC Robust 2004 Benchmark

Robust 2004 is an information retrieval benchmark whose large number of ...
research
04/25/2023

Introducing MBIB – the first Media Bias Identification Benchmark Task and Dataset Collection

Although media bias detection is a complex multi-task problem, there is,...
research
12/21/2022

NADBenchmarks – a compilation of Benchmark Datasets for Machine Learning Tasks related to Natural Disasters

Climate change has increased the intensity, frequency, and duration of e...
