Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning

09/21/2022
by Hannah Rose Kirk, et al.

Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm. However, most machine learning research has prioritized maximizing effectiveness (i.e., F1 or accuracy score) rather than data efficiency (i.e., minimizing the amount of data that is annotated). In this paper, we use simulated experiments over two datasets at varying percentages of abuse to demonstrate that transformers-based active learning is a promising approach to substantially raise efficiency whilst still maintaining high effectiveness, especially when abusive content is a smaller percentage of the dataset. This approach requires a fraction of labeled data to reach performance equivalent to training over the full dataset.
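The approach the abstract describes is pool-based active learning: iteratively training a classifier on a small labeled set and querying annotations only for the examples the model is least certain about. As a minimal sketch of that general loop (not the authors' exact setup), with a toy nearest-centroid scorer standing in for a fine-tuned transformer and synthetic 2-D points standing in for text embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: two Gaussian clusters standing in for "non-abusive" vs "abusive" examples.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Small stratified seed set; everything else is the unlabeled pool.
labeled = list(rng.choice(100, 5, replace=False)) + list(100 + rng.choice(100, 5, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

def predict_proba(X_train, y_train, X_query):
    # Nearest-centroid score squashed into a probability (stand-in for a transformer).
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d = np.linalg.norm(X_query - c0, axis=1) - np.linalg.norm(X_query - c1, axis=1)
    return 1.0 / (1.0 + np.exp(-d))

for _ in range(20):  # 20 query rounds, one annotation requested per round
    p = predict_proba(X[labeled], y[labeled], X[pool])
    pick = pool[int(np.argmin(np.abs(p - 0.5)))]  # most uncertain: closest to 0.5
    labeled.append(pick)  # "annotate" the queried example
    pool.remove(pick)

p_all = predict_proba(X[labeled], y[labeled], X)
accuracy = float(((p_all > 0.5).astype(int) == y).mean())
print(f"labeled {len(labeled)}/{len(X)} examples, accuracy {accuracy:.2f}")
```

On this easy synthetic pool, the model reaches high accuracy after labeling only 30 of the 200 examples, which is the efficiency argument in miniature: uncertainty sampling concentrates the annotation budget on informative items near the decision boundary.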


Related research:

- Uncertainty-based Query Strategies for Active Learning with Transformers (07/12/2021): Active learning is the iterative construction of a classification model ...
- Speeding Up BatchBALD: A k-BALD Family of Approximations for Active Learning (01/23/2023): Active learning is a powerful method for training machine learning model...
- Active Learning Approach to Optimization of Experimental Control (03/26/2020): In this work we present a general machine learning based scheme to optim...
- Sampling Approach Matters: Active Learning for Robotic Language Acquisition (11/16/2020): Ordering the selection of training data using active learning can lead t...
- Deep Active Learning with Crowdsourcing Data for Privacy Policy Classification (08/07/2020): Privacy policies are statements that notify users of the services' data ...
- LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning (06/16/2023): Labeled data are critical to modern machine learning applications, but o...
- Char-RNN and Active Learning for Hashtag Segmentation (11/08/2019): We explore the abilities of character recurrent neural network (char-RNN...
