
An Empirical Investigation of Learning from Biased Toxicity Labels

by Neel Nanda, et al.

Collecting annotations from human raters often results in a trade-off between the quantity of labels one wishes to gather and the quality of those labels. As such, it is often only possible to gather a small number of high-quality labels. In this paper, we study how different training strategies can leverage a small dataset of human-annotated labels and a large but noisy dataset of synthetically generated labels (which exhibit bias against identity groups) for predicting the toxicity of online comments. We evaluate the accuracy and fairness properties of these approaches, and the trade-offs between the two. While initial training on all of the data followed by fine-tuning on the clean data produces models with the highest AUC, no single strategy performs best across all fairness metrics.
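The best-AUC strategy the abstract describes — train on the full noisy-plus-clean pool first, then fine-tune on the small clean set — can be illustrated with a toy sketch. This is not the paper's model or data; it is a minimal one-feature logistic regression on synthetic points, where the large dataset has 30% of its labels flipped to stand in for noisy machine-generated labels:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd(w, b, data, lr, epochs):
    """One-feature logistic regression trained by plain SGD."""
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

random.seed(0)

# Large "synthetic-label" dataset: the true rule is y = 1 iff x > 0,
# but 30% of labels are flipped to mimic noisy generated labels.
noisy = []
for _ in range(2000):
    x = random.uniform(-2.0, 2.0)
    y = 1 if x > 0 else 0
    if random.random() < 0.3:
        y = 1 - y
    noisy.append((x, y))

# Small "human-annotated" dataset with correct labels.
clean = [(x, 1 if x > 0 else 0)
         for x in (random.uniform(-2.0, 2.0) for _ in range(50))]

# Strategy from the abstract: initial training on all of the data,
# then fine-tuning on the clean data only (at a lower learning rate).
w, b = sgd(0.0, 0.0, noisy + clean, lr=0.1, epochs=3)
w, b = sgd(w, b, clean, lr=0.05, epochs=10)

# Evaluate against the true labelling rule on held-out points.
test_xs = [random.uniform(-2.0, 2.0) for _ in range(200)]
acc = sum((sigmoid(w * x + b) > 0.5) == (x > 0) for x in test_xs) / len(test_xs)
print(f"fine-tuned accuracy: {acc:.2f}")
```

Because the label noise here is symmetric, pretraining on the noisy pool still recovers roughly the right decision boundary, and the clean-data fine-tuning pass refines it; the paper's point is that this accuracy win does not translate uniformly into wins on every fairness metric.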



