Six Attributes of Unhealthy Conversation

10/14/2020
by Ilan Price, et al.

We present a new dataset of approximately 44,000 comments labelled by crowdworkers. Each comment is labelled as either 'healthy' or 'unhealthy', and also carries binary labels for the presence of six potentially 'unhealthy' sub-attributes: (1) hostile; (2) antagonistic, insulting, provocative or trolling; (3) dismissive; (4) condescending or patronising; (5) sarcastic; and/or (6) an unfair generalisation. Each label also has an associated confidence score. We argue that there is a need for datasets which enable research based on a broad notion of 'unhealthy online conversation'. We build this typology to encompass a substantial proportion of the individual comments which contribute to unhealthy online conversation. For some of these attributes, this is the first publicly available dataset of this scale. We explore the quality of the dataset, present some summary statistics and initial models to illustrate the utility of this data, and highlight limitations and directions for further research.

research
01/26/2022

Explainable Patterns for Distinction and Prediction of Moral Judgement on Reddit

The forum r/AmITheAsshole in Reddit hosts discussion on moral issues bas...
research
10/25/2018

Analyzing Assumptions in Conversation Disentanglement Research Through the Lens of a New Dataset and Model

Disentangling conversations mixed together in a single stream of message...
research
06/17/2020

Using Sentiment Information for Preemptive Detection of Toxic Comments in Online Conversations

The challenge of automatic detection of toxic comments online has been t...
research
02/04/2021

Bangla Text Dataset and Exploratory Analysis for Online Harassment Detection

Being the seventh most spoken language in the world, the use of the Bang...
research
08/09/2022

Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2

Image aesthetic quality assessment is popular during the last decade. Be...
research
06/07/2018

Is preprocessing of text really worth your time for online comment classification?

A large proportion of online comments present on public domains are cons...
research
05/26/2023

Dramatic Conversation Disentanglement

We present a new dataset for studying conversation disentanglement in mo...
