Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian

11/11/2020
by Elisa Bassignana, et al.

We present a novel corpus for personality prediction in Italian, which contains more authors and covers a different genre than previously available resources. The corpus is built using distant supervision, assigning Myers-Briggs Type Indicator (MBTI) labels to YouTube comments, and lends itself to a variety of experiments. We report preliminary experiments on Personal-ITY, which can serve as a baseline for future work; they show that some types are easier to predict than others, and we discuss the perks of cross-dataset prediction.

research
11/11/2020

Matching Theory and Data with Personal-ITY: What a Corpus of Italian YouTube Comments Reveals About Personality

As a contribution to personality detection in languages other than Engli...
research
07/30/2018

YouTube AV 50K: an Annotated Corpus for Comments in Autonomous Vehicles

With one billion monthly viewers, and millions of users discussing and s...
research
06/27/2023

YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus

Machine learning for sign languages is bottlenecked by data. In this pap...
research
04/11/2021

NorDial: A Preliminary Corpus of Written Norwegian Dialect Use

Norway has a large amount of dialectal variation, as well as a general t...
research
10/31/2021

Classifying YouTube Comments Based on Sentiment and Type of Sentence

As a YouTube channel grows, each video can potentially collect enormous ...
research
11/20/2020

Are Chess Discussions Racist? An Adversarial Hate Speech Data Set

On June 28, 2020, while presenting a chess podcast on Grandmaster Hikaru...
research
10/12/2020

Vulgaris: Analysis of a Corpus for Middle-Age Varieties of Italian Language

Italian is a Romance language that has its roots in Vulgar Latin. The bi...
