
YouTube AV 50K: an Annotated Corpus for Comments in Autonomous Vehicles

by Tao Li, et al.
Purdue University

YouTube has one billion monthly viewers, and millions of its users discuss and share opinions, which makes the comments below YouTube videos a rich source of data for opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset, a freely available collection of more than 50,000 YouTube comments, with metadata, posted below videos related to autonomous vehicles (AVs). We describe its creation process, its content and data format, and discuss its possible uses. In particular, we evaluate the dataset through a case study of the first self-driving car fatality, and show how the dataset can be used to better understand public attitudes toward self-driving cars and public reactions to the accident. Future developments of the dataset are also discussed.




Cross-Partisan Discussions on YouTube: Conservatives Talk to Liberals but Liberals Don't Talk to Conservatives

We present the first large-scale measurement study of cross-partisan dis...

Mining User Comment Activity for Detecting Forum Spammers in YouTube

Research shows that comment spamming (comments which are unsolicited, un...

Classifying YouTube Comments Based on Sentiment and Type of Sentence

As a YouTube channel grows, each video can potentially collect enormous ...

Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian

We present a novel corpus for personality prediction in Italian, contain...

Matching Theory and Data with Personal-ITY: What a Corpus of Italian YouTube Comments Reveals About Personality

As a contribution to personality detection in languages other than Engli...

Are Chess Discussions Racist? An Adversarial Hate Speech Data Set

On June 28, 2020, while presenting a chess podcast on Grandmaster Hikaru...

The MeLa BitChute Dataset

In this paper we present a near-complete dataset of over 3M videos from ...