PANDORA Talks: Personality and Demographics on Reddit

04/09/2020
by Matej Gjurković, et al.

Personality and demographics are important variables in the social sciences, while in NLP they can aid interpretability and the removal of societal biases. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models (including the well-established Big 5 model) and demographics (age, gender, and location) for more than 10k users. We showcase the usefulness of this dataset in three experiments, in which we leverage the more readily available data from other personality models to predict the Big 5 traits, analyze gender classification biases arising from psycho-demographic variables, and carry out a confirmatory and exploratory analysis based on psychological theories. Finally, we present benchmark prediction models for all personality and demographic variables.


