IITK@Detox at SemEval-2021 Task 5: Semi-Supervised Learning and Dice Loss for Toxic Spans Detection

04/04/2021
by Archit Bansal, et al.

In this work, we present our approach and findings for SemEval-2021 Task 5: Toxic Spans Detection. The task is to identify the spans of a text to which its toxicity can be attributed. It is challenging mainly because of two constraints: a small training dataset and an imbalanced class distribution. Our paper investigates two techniques for tackling these challenges: semi-supervised learning and learning with Self-Adjusting Dice Loss. Our submitted system, ranked ninth on the leaderboard, was an ensemble of pre-trained Transformer language models, each trained with one of these two techniques.
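
To make the Self-Adjusting Dice Loss mentioned above more concrete, here is a minimal sketch for binary token labels (toxic vs. non-toxic). It assumes the commonly cited per-token form, 1 - (2(1-p)^alpha p y + gamma) / ((1-p)^alpha p + y + gamma). The abstract does not state the exact formula or hyperparameters the authors used, so the function name, `alpha`, `gamma`, and their default values are illustrative assumptions rather than the paper's implementation.

```python
from typing import Sequence


def self_adjusting_dice_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    alpha: float = 1.0,  # assumed default, not taken from the paper
    gamma: float = 1.0,  # smoothing term; assumed default
) -> float:
    """Mean self-adjusting dice loss over tokens (illustrative sketch).

    probs  -- predicted probability that each token is toxic
    labels -- gold labels, 1 for toxic tokens and 0 otherwise
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one token is required")

    total = 0.0
    for p, y in zip(probs, labels):
        # The (1 - p) ** alpha factor shrinks the contribution of tokens
        # the model already scores with high probability.
        weighted = ((1.0 - p) ** alpha) * p
        dsc = (2.0 * weighted * y + gamma) / (weighted + y + gamma)
        total += 1.0 - dsc
    return total / len(probs)


if __name__ == "__main__":
    # One non-toxic and one toxic token, both scored at 0.5.
    print(self_adjusting_dice_loss([0.5, 0.5], [0, 1]))
```

Because the loss is built from an overlap ratio rather than per-token accuracy, the abundant non-toxic tokens that the model already scores near zero contribute almost nothing, which is the usual motivation for using it on imbalanced data.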

research
06/06/2019

Iterative Self-Learning: Semi-Supervised Improvement to Dataset Volumes and Model Accuracy

A novel semi-supervised learning technique is introduced based on a simp...
research
07/17/2019

HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods

In this paper, we present a method called HODGEPODGE[1] for large-scale ...
research
05/16/2020

ApplicaAI at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS and Whether Self-Training Helps Them

This paper presents the winning system for the propaganda Technique Clas...
research
08/29/2014

Comment on "Ensemble Projection for Semi-supervised Image Classification"

In a series of papers by Dai and colleagues [1,2], a feature map (or ker...
research
01/06/2021

Exploring Semi-Supervised Learning for Predicting Listener Backchannels

Developing human-like conversational agents is a prime area in HCI resea...
research
10/28/2018

Semi-Supervised Translation with MMD Networks

This work aims to improve semi-supervised learning in a neural network a...
