The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English

06/05/2019
by Nikola Ljubešić, et al.

In this paper, we present datasets of Facebook comment threads on mainstream media posts in Slovene and English, developed within the Slovene national project FRENK. The datasets cover two topics, migrants and LGBT, and are manually annotated for different types of socially unacceptable discourse (SUD). Their main advantages over existing datasets are identical sampling procedures, which produce comparable data across languages, and an annotation schema that takes into account six types of SUD and five targets at which SUD is directed. We describe the sampling and annotation procedures, and analyze the annotation distributions and inter-annotator agreements. We consider these datasets to be an important milestone in understanding and combating SUD for both languages.

