Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort

12/15/2021
by Franziska Weeber, et al.

Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality annotated datasets with only a few manual annotations, thus strongly reducing annotation cost and effort. For this purpose, we combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories in the given text documents. To highlight the potential of our research direction, we evaluate the approach on the task of identifying frames in news articles. Our preliminary results show that employing AL strongly reduces the number of annotations needed to correctly classify even these complex and subtle frames. On the framing dataset, the AL approach needs only 16.3% of the annotations to reach the same performance as a model trained on the full dataset.
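The abstract does not say which query strategy the tool uses, so the following is only a minimal sketch of one common AL step, least-confidence uncertainty sampling, under stated assumptions. The function names (`least_confidence`, `select_queries`) and the probability values are hypothetical and chosen for illustration. In a real setup, the probabilities would come from the pre-trained language model's classifier over the unlabeled documents.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_queries(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool items the model is least sure about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: least_confidence(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical class probabilities for four unlabeled documents (made-up numbers,
# not results from the paper).
pool_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
]

# Indices of the two documents a human annotator would be asked to label next.
queries = select_queries(pool_probs, k=2)
```

In each AL round, the selected documents would be labeled manually, added to the training set, and the model retrained before the next selection. Spending annotation effort only on the examples the model finds hardest is how AL can reach a given performance with a fraction of the labels.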


