The Moral Foundations Reddit Corpus

08/10/2022
by Jackson Trager, et al.

Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment in textual data, but achieving better performance on such subjective tasks requires large sets of hand-annotated training data. Previous corpora annotated for moral sentiment have proven valuable and have generated new insights both within NLP and across the social sciences, but they have been limited to Twitter. To improve our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 Reddit comments curated from 12 distinct subreddits and hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. We use a range of methodologies, e.g., cross-domain classification and knowledge transfer, to provide baseline moral-sentiment classification results for this new corpus.

