CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

06/28/2023
by   Yufei Huang, et al.

Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process consists of 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguated context generation, and manual review & recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models can, to a certain extent, heed instructions and avoid generating certain types of morally harmful output, by way of "moral self-correction". Our dataset and results are publicly available at https://github.com/YFHuangxxxx/CBBQ, offering debiasing research opportunities to a wider community.
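As a rough illustration of how testing instances can be derived automatically from hand-written templates, the following Python sketch fills every combination of slot values into a template string. It is a minimal sketch under stated assumptions, not the dataset's actual generation code: the function name `expand_template`, the placeholder names, and the example template and slot values are all hypothetical.

```python
from itertools import product


def expand_template(template: str, slots: dict[str, list[str]]) -> list[str]:
    """Return one filled-in string per combination of slot values."""
    names = list(slots)
    instances = []
    # Cartesian product over the value lists, in the order the slots were given.
    for values in product(*(slots[name] for name in names)):
        instances.append(template.format(**dict(zip(names, values))))
    return instances


# Hypothetical template and slot values, for illustration only;
# they are not taken from the dataset.
template = "{person_a} and {person_b} applied for the same job. Who is less qualified?"
slots = {
    "person_a": ["A candidate from group X", "A candidate from group Y"],
    "person_b": ["a candidate from group Z"],
}

questions = expand_template(template, slots)
for question in questions:
    print(question)
```

Because the number of instances is the product of the slot sizes, a few thousand templates with a handful of values per slot can plausibly yield a question set on the order of 100K.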

