Challenges in Annotating Datasets to Quantify Bias in Under-represented Society

09/11/2023
by Vithya Yogarajan, et al.

Recent advances in artificial intelligence, including the development of highly sophisticated large language models (LLMs), have proven beneficial in many real-world applications. However, evidence of inherent bias encoded in these LLMs has raised concerns about equity. In response, there has been an increase in research dealing with bias, including studies focusing on quantifying bias and developing debiasing techniques. Benchmark bias datasets have also been developed for binary gender classification and ethical/racial considerations, focusing predominantly on American demographics. However, there is minimal research on understanding and quantifying bias related to under-represented societies. Motivated by the lack of annotated datasets for quantifying bias in under-represented societies, we endeavoured to create benchmark datasets for the New Zealand (NZ) population. We faced many challenges in this process, despite the availability of three annotators. This research outlines the manual annotation process, provides an overview of the challenges we encountered and the lessons learnt, and presents recommendations for future research.
