DeepAI AI Chat
Log In Sign Up

NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer

08/04/2020
by   Hwijeen Ahn, et al.
Carnegie Mellon University
Seoul National University
SOGANG UNIVERSITY
0

This paper describes our approach to the task of identifying offensive languages in a multilingual setting. We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds and cross-lingual transfer with data selection. Leveraging the semi-supervised dataset resulted in performance improvements compared to the baseline trained solely with the manually-annotated dataset. We propose a new metric, Translation Embedding Distance, to measure the transferability of instances for cross-lingual data selection. We also introduce various preprocessing steps tailored for social media text along with methods to fine-tune the pre-trained multilingual BERT (mBERT) for offensive language identification. Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.

READ FULL TEXT

page 1

page 2

page 3

page 4

05/15/2023

Measuring Cross-Lingual Transferability of Multilingual Transformers on Sentence Classification

Recent studies have exhibited remarkable capabilities of pre-trained mul...
11/27/2016

Semi Supervised Preposition-Sense Disambiguation using Multilingual Data

Prepositions are very common and very ambiguous, and understanding their...
04/23/2020

Characterising User Content on a Multi-lingual Social Network

Social media has been on the vanguard of political information diffusion...
09/14/2022

Parameter-Efficient Finetuning for Robust Continual Multilingual Learning

NLU systems deployed in the real world are expected to be regularly upda...
08/31/2021

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

Transliteration is very common on social media, but transliterated text ...
03/31/2020

Multilingual Stance Detection: The Catalonia Independence Corpus

Stance detection aims to determine the attitude of a given text with res...