Towards A Reliable Ground-Truth For Biased Language Detection

12/14/2021
by Timo Spinde, et al.

Reference texts such as encyclopedias and news articles can manifest biased language when objective reporting is replaced by subjective writing. Existing methods to detect bias mostly rely on annotated data to train machine learning models. However, low annotator agreement and poor comparability are substantial drawbacks of available media bias corpora. To evaluate data collection options, we collect and compare labels obtained from two popular crowdsourcing platforms. Our results demonstrate the existing crowdsourcing approaches' lack of data quality, underlining the need for a trained-expert framework to gather a more reliable dataset. By creating such a framework and gathering a first dataset, we are able to improve inter-annotator agreement from Krippendorff's α = 0.144 (crowdsourcing labels) to α = 0.419 (expert labels). We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems. We will continue to extend our dataset in the future.
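The agreement numbers in the abstract are Krippendorff's α, which compares observed disagreement among annotators to the disagreement expected by chance: α = 1 − D_o/D_e. As a rough illustration of how such a value is computed for nominal labels, here is a minimal sketch based on the standard coincidence-matrix formulation; the function name and the list-of-lists input format are assumptions for this example, not part of the paper:

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels assigned to
    one item by its annotators (missing annotations simply omitted).
    """
    # Coincidence matrix: every ordered pair of labels from different
    # annotators within a unit contributes weight 1/(m_u - 1).
    o = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit with one rating carries no pairing information
        for i, c in enumerate(labels):
            for j, k in enumerate(labels):
                if i != j:
                    o[(c, k)] += 1.0 / (m - 1)
    # Marginal label frequencies and total number of pairable values.
    n_c = Counter()
    for (c, _), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    # Observed disagreement: mass off the diagonal of the coincidence matrix.
    d_o = sum(w for (c, k), w in o.items() if c != k)
    # Expected disagreement under chance pairing of the same label pool.
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 - d_o / d_e
```

For example, two annotators who agree on two of three items yield α ≈ 0.44, while perfect agreement yields α = 1.0; values such as the abstract's 0.144 indicate agreement only slightly above chance.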


Related research

- 05/20/2021 · MBIC – A Media Bias Annotation Dataset Including Annotator Characteristics: Many people consider news articles to be a reliable source of informatio...
- 05/19/2016 · False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking: With the rapid growth of crowdsourcing platforms it has become easy and ...
- 03/24/2020 · A Pitfall of Learning from User-generated Data: In-depth Analysis of Subjective Class Problem: Research in the supervised learning algorithms field implicitly assumes ...
- 01/17/2022 · PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection: In this paper we introduce PerPaDa, a Persian paraphrase dataset that is...
- 10/12/2020 · The Extraordinary Failure of Complement Coercion Crowdsourcing: Crowdsourcing has eased and scaled up the collection of linguistic annot...
- 06/24/2023 · Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach: Existing machine learning models have proven to fail when it comes to th...
- 10/28/2021 · IMDB-WIKI-SbS: An Evaluation Dataset for Crowdsourced Pairwise Comparisons: Today, comprehensive evaluation of large-scale machine learning models i...
