Like trainer, like bot? Inheritance of bias in algorithmic content moderation

by Reuben Binns, et al.

The internet has become a central medium through which 'networked publics' express their opinions and engage in debate. Offensive comments and personal attacks can inhibit participation in these spaces. Automated content moderation aims to overcome this problem using machine learning classifiers trained on large corpora of texts manually annotated for offence. While such systems could help encourage more civil debate, they must navigate inherently normatively contestable boundaries, and are subject to the idiosyncratic norms of the human raters who provide the training data. An important objective for platforms implementing such measures might be to ensure that they are not unduly biased towards or against particular norms of offence. This paper provides some exploratory methods by which the normative biases of algorithmic content moderation systems can be measured, by way of a case study using an existing dataset of comments labelled for offence. We train classifiers on comments labelled by different demographic subsets (men and women) to understand how differences in conceptions of offence between these groups might affect the performance of the resulting models on various test sets. We conclude by discussing some of the ethical choices facing the implementers of algorithmic moderation systems, given various desired levels of diversity of viewpoints amongst discussion participants.
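The experimental idea in the abstract (train one classifier per annotator group, then score each on every group's test labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny naive Bayes model, the function names (`train`, `predict`, `accuracy`, `cross_evaluate`), the group names, and the toy comments and labels are all hypothetical and chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def train(examples):
    """Fit a tiny naive Bayes model from (text, label) pairs, label in {0, 1}."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return 1 (offensive) or 0 (not offensive) for a single comment."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        # Laplace-smoothed log prior plus log likelihood of each known word.
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the given label."""
    return sum(predict(model, t) == y for t, y in examples) / len(examples)


def cross_evaluate(train_sets, test_sets):
    """Train one model per annotator group and score it on every test set."""
    models = {group: train(ex) for group, ex in train_sets.items()}
    return {
        (tr, te): accuracy(models[tr], test_sets[te])
        for tr in models
        for te in test_sets
    }


# Hypothetical toy data: the same comments, labelled by two annotator groups
# that disagree on one borderline comment.
comments = [
    "you are stupid",
    "thanks for sharing",
    "what a silly idea",
    "great point well made",
]
labels_a = [1, 0, 1, 0]  # hypothetical group A labels
labels_b = [1, 0, 0, 0]  # hypothetical group B labels

data = {
    "group_a": list(zip(comments, labels_a)),
    "group_b": list(zip(comments, labels_b)),
}
# For brevity the toy example reuses the training data as test data.
results = cross_evaluate(data, data)
for (trained_on, tested_on), acc in sorted(results.items()):
    print(f"trained on {trained_on}, tested on {tested_on}: {acc:.2f}")
```

In a real study the test sets would be held out from training. Comparing the off-diagonal scores (trained on one group, tested on the other) with the diagonal ones is one simple way to see how far the groups' norms of offence diverge.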
