Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning

Human-annotated data plays a critical role in the fairness of AI systems, including those that make life-altering decisions or moderate human-created web and social media content. Conventionally, annotator disagreements are resolved before any learning takes place. However, researchers increasingly recognize annotator disagreement as pervasive and meaningful. They also question how well a system performs when annotators disagree, particularly when minority views are disregarded, especially those of groups that may already be underrepresented in the annotator population. In this paper, we introduce CrowdOpinion (accepted for publication at ACL 2023), an unsupervised learning-based approach that uses language features and label distributions to pool similar items into larger samples of label distributions. We experiment with four generative clustering methods and one density-based clustering method, applied to five linear combinations of label distributions and features. We use five publicly available benchmark datasets (with varying levels of annotator disagreement) from social media (Twitter, Gab, and Reddit). We also experiment in the wild using a dataset from Facebook, where annotations come from the platform itself: users reacting to posts. We evaluate CrowdOpinion both as a label distribution prediction task, using KL-divergence, and as a single-label problem, using accuracy measures.
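The pooling idea can be sketched in a few lines of Python with NumPy. This is a minimal, hypothetical illustration under stated assumptions, not the CrowdOpinion implementation. The helper names (`combine`, `kmeans`, `pool`, `mean_kl`, `accuracy`) are invented for this sketch. The weighting scheme (label distributions scaled by `w`, features by `1 - w`, then concatenated) is an assumed reading of "linear combinations of label distributions and features". Plain k-means stands in for the paper's four generative and one density-based clustering methods, pooling is assumed to sum raw annotation counts within a cluster, and the toy data is made up.

```python
import numpy as np

def combine(label_dists, features, w):
    # Hypothetical weighting: label distributions scaled by w,
    # language features by (1 - w), then concatenated per item.
    return np.hstack([w * label_dists, (1.0 - w) * features])

def kmeans(X, k, n_iter=100, seed=0):
    # Plain k-means as a stand-in for the paper's clustering methods.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every item to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

def pool(label_counts, assign):
    # Sum raw annotation counts within each cluster and renormalize, so every
    # member shares the cluster's larger-sample label distribution.
    pooled = np.zeros(label_counts.shape, dtype=float)
    for c in np.unique(assign):
        members = assign == c
        totals = label_counts[members].sum(axis=0)
        pooled[members] = totals / totals.sum()
    return pooled

def mean_kl(p, q, eps=1e-12):
    # Average KL(p || q) over items; terms where p == 0 contribute nothing.
    q = np.clip(q, eps, None)
    safe_p = np.where(p > 0, p, 1.0)
    terms = np.where(p > 0, p * np.log(safe_p / q), 0.0)
    return float(terms.sum(axis=1).mean())

def accuracy(p, q):
    # Single-label view: does the most likely label agree?
    return float((p.argmax(axis=1) == q.argmax(axis=1)).mean())

# Toy data (invented): 6 items, 3 labels, 5 annotations each, 2-D features.
label_counts = np.array([[5, 0, 0], [4, 1, 0], [0, 5, 0],
                         [1, 4, 0], [0, 0, 5], [0, 1, 4]])
features = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0],
                     [1.1, 1.0], [2.0, 0.0], [2.0, 0.1]])
label_dists = label_counts / label_counts.sum(axis=1, keepdims=True)

assign = kmeans(combine(label_dists, features, w=0.5), k=3)
pooled = pool(label_counts, assign)
print("mean KL:", mean_kl(label_dists, pooled))
print("accuracy:", accuracy(label_dists, pooled))
```

Comparing pooled distributions against the original per-item ones here only illustrates the two metrics named in the abstract (KL-divergence for distributions, accuracy for the single-label view); the paper's actual evaluation protocol is not described in this abstract. Sweeping several values of `w` would be one hypothetical way to produce multiple linear combinations.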

research
08/28/2020

Posting Bot Detection on Blockchain-based Social Media Platform using Machine Learning Techniques

Steemit is a blockchain-based social media platform, where authors can g...
research
07/19/2016

Discriminating between similar languages in Twitter using label propagation

Identifying the language of social media messages is an important first ...
research
08/24/2019

Experiments in Social Media

Social media platforms like Facebook and Twitter permit experiments to b...
research
11/28/2017

Social Media, Money, and Politics: Campaign Finance in the 2016 US Congressional Cycle

With social media penetration deepening among both citizens and politica...
research
10/25/2022

CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization

Social media has increasingly played a key role in emergency response: f...
research
10/05/2017

Machine Learning Based Detection of Clickbait Posts in Social Media

Clickbait (headlines) make use of misleading titles that hide critical i...
research
03/16/2020

Neighborhood-based Pooling for Population-level Label Distribution Learning

Supervised machine learning often requires human-annotated data. While a...
