Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition

02/24/2020
by Xiaolei Huang, et al.

Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes. In this work, we assemble and publish a multilingual Twitter corpus for the task of hate speech detection with four inferred author demographic factors: age, country, gender and race/ethnicity. The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish. We evaluate the inferred demographic labels with a crowdsourcing platform, Figure Eight. To examine factors that can cause biases, we conduct an empirical analysis of demographic predictability on the English corpus. We measure the performance of four popular document classifiers and evaluate the fairness and bias of these baseline classifiers on the author-level demographic attributes.
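
As a rough illustration of what evaluating a classifier's fairness on an author-level demographic attribute can look like, the sketch below compares each group's false positive rate with the overall rate and sums the absolute gaps. The abstract does not state which fairness metric the paper uses, so this metric choice, the function names, and the toy labels are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN). Returns None when there are no negative examples."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else None


def fpr_gap_by_group(y_true, y_pred, groups):
    """Return (overall FPR, per-group FPR, summed absolute gap to the overall FPR).

    `groups` holds one demographic label per document (hypothetical values
    such as "a" / "b"); groups without negative examples are skipped in the sum.
    """
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)

    overall = false_positive_rate(y_true, y_pred)
    per_group = {g: false_positive_rate(ts, ps) for g, (ts, ps) in by_group.items()}
    gap = sum(abs(r - overall) for r in per_group.values() if r is not None)
    return overall, per_group, gap


# Toy data (invented): 1 = hate speech, 0 = not hate speech.
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b"]

overall, per_group, gap = fpr_gap_by_group(y_true, y_pred, groups)
print(overall, per_group, gap)
```

A gap of zero would mean every group is wrongly flagged at the same rate as the population as a whole. Larger values indicate that the classifier's errors fall unevenly across groups.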
