The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics

06/20/2023
by Matthias Orlikowski, et al.

Many NLP tasks exhibit human label variation, where different annotators give different labels to the same texts. This variation is known to depend, at least in part, on the sociodemographics of annotators. Recent research aims to model individual annotator behaviour rather than predict aggregated labels, and we would expect sociodemographic information to be useful for these models. On the other hand, the ecological fallacy states that aggregate group behaviour, such as the behaviour of the average female annotator, does not necessarily explain individual behaviour. To account for sociodemographics in models of individual annotator behaviour, we introduce group-specific layers to multi-annotator models. In a series of experiments on toxic content detection, we find that explicitly accounting for sociodemographic attributes in this way does not significantly improve model performance. This result shows that individual annotation behaviour depends on much more than just sociodemographics.
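
The abstract does not spell out the architecture, so what follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each text arrives as a precomputed embedding from a shared encoder, that every annotator belongs to exactly one group of a single sociodemographic attribute, and that a group-specific layer sits between the shared representation and per-annotator classification heads. The class `GroupAwareMultiAnnotatorModel`, its helper functions, and the annotator and group names are hypothetical; the weights are random and untrained, and only the forward pass is shown.

```python
import math
import random


def linear(weights, bias, x):
    # Dense layer: weights is a list of rows (out_dim x in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, z) for z in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(out_dim, in_dim, rng):
    # Small random weights, zero bias (hypothetical initialisation).
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class GroupAwareMultiAnnotatorModel:
    """Hypothetical sketch: shared input -> group layer -> annotator head."""

    def __init__(self, annotator_groups, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.annotator_groups = dict(annotator_groups)
        groups = sorted(set(self.annotator_groups.values()))
        # One layer per sociodemographic group, shared by all its members.
        self.group_layers = {g: init_layer(hidden_dim, in_dim, rng) for g in groups}
        # One binary (toxic / not toxic) head per individual annotator.
        self.annotator_heads = {a: init_layer(1, hidden_dim, rng) for a in self.annotator_groups}

    def predict(self, x, annotator):
        # Route the text embedding through the annotator's group layer,
        # then through that annotator's own head.
        group = self.annotator_groups[annotator]
        h = relu(linear(*self.group_layers[group], x))
        (logit,) = linear(*self.annotator_heads[annotator], h)
        return sigmoid(logit)  # probability this annotator labels the text toxic


annotators = {"ann_1": "group_a", "ann_2": "group_a", "ann_3": "group_b"}
model = GroupAwareMultiAnnotatorModel(annotators, in_dim=4, hidden_dim=3)
x = [0.2, -0.1, 0.5, 0.3]  # stand-in for an encoder embedding of one text
for name in annotators:
    print(name, round(model.predict(x, name), 3))
```

Here `ann_1` and `ann_2` share the `group_a` layer but keep separate heads, so the model can represent a group-level tendency and each annotator's deviation from it. Training (for example, a per-annotator binary cross-entropy loss) is omitted from this sketch.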

Related research

Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups using a Single Model across Cages (06/05/2023)
Behavioural experiments often happen in specialised arenas, but this may...

Understanding and Predicting Human Label Variation in Natural Language Inference through Explanation (04/24/2023)
Human label variation (Plank 2022), or annotation disagreement, exists i...

A dynamic state-based model of crowds (09/03/2023)
We consider the problem of categorizing and describing the dynamic prope...

How (Not) to Use Sociodemographic Information for Subjective NLP Tasks (09/13/2023)
Annotators' sociodemographic backgrounds (i.e., the individual compositi...

You Are What You Annotate: Towards Better Models through Annotator Representations (05/24/2023)
Annotator disagreement is ubiquitous in natural language processing (NLP...

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation (11/04/2022)
Human variation in labeling is often considered noise. Annotation projec...

Peer-group Behaviour Analytics of Windows Authentications Events Using Hierarchical Bayesian Modelling (09/20/2022)
Cyber-security analysts face an increasingly large number of alerts rece...