Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven

09/06/2021
by   Niclas Hildebrandt, et al.
0

This paper presents the contribution of the Data Science Kitchen at GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. The task aims at extending the identification of offensive language, by including additional subtasks that identify comments which should be prioritized for fact-checking by moderators and community managers. Our contribution focuses on a feature-engineering approach with a conventional classification backend. We combine semantic and writing style embeddings derived from pre-trained deep neural networks with additional numerical features, specifically designed for this task. Ensembles of Logistic Regression classifiers and Support Vector Machines are used to derive predictions for each subtask via a majority voting scheme. Our best submission achieved macro-averaged F1-scores of 66.8 toxic, engaging, and fact-claiming comments.

READ FULL TEXT
research
09/07/2021

FH-SWF SG at GermEval 2021: Using Transformer-Based Language Models to Identify Toxic, Engaging, Fact-Claiming Comments

In this paper we describe the methods we used for our submissions to the...
research
08/11/2023

Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models

The Forum for Information Retrieval (FIRE) started a shared task this ye...
research
10/03/2018

Machine Learning Suites for Online Toxicity Detection

To identify and classify toxic online commentary, the modern tools of da...
research
09/07/2021

FHAC at GermEval 2021: Identifying German toxic, engaging, and fact-claiming comments with ensemble learning

The availability of language representations learned by large pretrained...
research
07/30/2021

WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments

This paper addresses the identification of toxic, engaging, and fact-cla...
research
06/17/2019

Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish

This paper presents our contribution to PolEval 2019 Task 6: Hate speech...
research
06/14/2020

Application of Data Science to Discover Violence-Related Issues in Iraq

Data science has been satisfactorily used to discover social issues in s...

Please sign up or login with your details

Forgot password? Click here to reset