HLE-UPC at SemEval-2021 Task 5: Multi-Depth DistilBERT for Toxic Spans Detection

04/01/2021
by Rafel Palliser-Sans, et al.

This paper presents our submission to SemEval-2021 Task 5: Toxic Spans Detection. The purpose of this task is to detect the spans that make a text toxic, which is a complex task for several reasons. First, toxicity is intrinsically subjective. Second, toxicity does not always come from single words such as insults or offensive terms, but sometimes from whole expressions formed by words that may not be toxic individually. Following this idea of focusing on both single words and multi-word expressions, we study the impact of using a multi-depth DistilBERT model, which uses embeddings from different layers to estimate the final per-token toxicity. Our quantitative results show that using information from multiple depths boosts the performance of the model. Finally, we also analyze our best model qualitatively.
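
To make the multi-depth idea concrete, the following is a minimal sketch of one way per-token scores could be computed from several layers at once. It is not the authors' implementation: the abstract does not say how the layers are combined, so concatenating each token's vectors across layers and applying a single linear layer with a sigmoid is an assumption. The function name `multi_depth_token_scores` is hypothetical, and random numbers stand in for real DistilBERT hidden states.

```python
import math
import random


def multi_depth_token_scores(layer_embeddings, weights, bias):
    """Score each token from embeddings taken at several depths.

    layer_embeddings: list of layers; each layer is a list of token vectors.
    weights, bias: parameters of a single linear layer (assumed combination).
    Returns one value in (0, 1) per token.
    """
    num_tokens = len(layer_embeddings[0])
    scores = []
    for t in range(num_tokens):
        # Assumption: combine depths by concatenating the token's vector
        # from every selected layer into one feature vector.
        features = [x for layer in layer_embeddings for x in layer[t]]
        logit = bias + sum(w * x for w, x in zip(weights, features))
        # Sigmoid turns the logit into a per-token score.
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Stand-in data: 3 layers, 4 tokens, 2 dimensions per token vector.
random.seed(0)
num_layers, num_tokens, dim = 3, 4, 2
fake_hidden_states = [
    [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_tokens)]
    for _ in range(num_layers)
]
fake_weights = [random.uniform(-1, 1) for _ in range(num_layers * dim)]

token_scores = multi_depth_token_scores(fake_hidden_states, fake_weights, bias=0.0)
# Tokens whose score exceeds a chosen threshold would be marked as toxic.
toxic_token_indices = [i for i, s in enumerate(token_scores) if s > 0.5]
```

In a real setting, the stand-in vectors would be replaced by hidden states from the chosen model layers, and the weights would be learned during training rather than sampled at random.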
