NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer

08/04/2020
by Hwijeen Ahn, et al.

This paper describes our approach to identifying offensive language in a multilingual setting. We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds, and cross-lingual transfer with data selection. Leveraging the semi-supervised dataset improved performance over the baseline trained solely on the manually annotated dataset. We propose a new metric, Translation Embedding Distance, to measure the transferability of instances for cross-lingual data selection. We also introduce various preprocessing steps tailored for social media text, along with methods to fine-tune the pre-trained multilingual BERT (mBERT) for offensive language identification. Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
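As a rough illustration of what selecting semi-supervised labels with a threshold can look like, here is a minimal sketch. It assumes each unlabeled instance carries a confidence score in [0, 1] that it is offensive. The function name `select_by_threshold`, the score format, and the cutoff values 0.3 and 0.7 are hypothetical and are not taken from the paper, which does not state its thresholds in this abstract.

```python
from typing import List, Tuple


def select_by_threshold(
    scored: List[Tuple[str, float]],
    lower: float = 0.3,  # assumed cutoff, not from the paper
    upper: float = 0.7,  # assumed cutoff, not from the paper
) -> List[Tuple[str, int]]:
    """Turn confidence scores into hard labels, dropping uncertain instances.

    Scores at or above `upper` become label 1 (offensive), scores at or
    below `lower` become label 0 (not offensive), and everything in
    between is discarded.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")

    selected = []
    for text, score in scored:
        if score >= upper:
            selected.append((text, 1))
        elif score <= lower:
            selected.append((text, 0))
        # Scores strictly between the cutoffs are treated as too noisy to keep.
    return selected


# Hypothetical scored instances for illustration only.
examples = [("tweet a", 0.9), ("tweet b", 0.5), ("tweet c", 0.1)]
augmented = select_by_threshold(examples)
```

Moving the two cutoffs closer together keeps more data at the cost of noisier labels, which is the trade-off that comparing different thresholds explores.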


