Cartography Active Learning

09/09/2021
by Mike Zhang, et al.

We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). On popular text classification tasks, we compare our method to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive with other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics using the data maps. Our results further show that CAL yields a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
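To make the notion of training dynamics concrete, the following is a minimal sketch of the per-instance statistics that data maps are built from (Swayamdipta et al., 2020): confidence (mean probability assigned to the gold label across epochs), variability (standard deviation of that probability), and correctness (fraction of epochs in which the instance is predicted correctly). It is not the CAL selection procedure itself, which the abstract does not specify. The function name `data_map_statistics`, the toy probabilities, and the use of a 0.5 threshold to decide correctness (which only makes sense for binary classification) are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def data_map_statistics(gold_probs, threshold=0.5):
    """Compute data-map statistics from per-epoch gold-label probabilities.

    gold_probs[e][i] is the probability the model assigned to the gold
    label of instance i after training epoch e.
    threshold is an ASSUMPTION: an instance counts as correctly predicted
    in an epoch if its gold-label probability exceeds it (binary case).
    """
    # Transpose so that each row holds one instance's trajectory over epochs.
    trajectories = [list(t) for t in zip(*gold_probs)]
    confidence = [fmean(t) for t in trajectories]
    variability = [pstdev(t) for t in trajectories]
    correctness = [
        fmean([1.0 if p > threshold else 0.0 for p in t]) for t in trajectories
    ]
    return confidence, variability, correctness


# Toy example (made-up numbers): 3 epochs, 3 instances.
gold_probs = [
    [0.90, 0.20, 0.50],  # epoch 1
    [0.95, 0.10, 0.90],  # epoch 2
    [0.98, 0.15, 0.10],  # epoch 3
]
confidence, variability, correctness = data_map_statistics(gold_probs)

# Instance with the largest variability, i.e. the one whose gold-label
# probability fluctuates most during training.
most_variable = max(range(len(variability)), key=variability.__getitem__)
```

In data-map terms, the first instance (high confidence, low variability) would fall in the easy-to-learn region, the second (low confidence, low variability) in the hard-to-learn region, and the third (high variability) in the ambiguous region.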

research
01/26/2019

The Use of Unlabeled Data versus Labeled Data for Stopping Active Learning for Text Classification

Annotation of training data is the major bottleneck in the creation of t...
research
02/01/2022

Federated Active Learning (F-AL): an Efficient Annotation Strategy for Federated Learning

Federated learning (FL) has been intensively investigated in terms of co...
research
12/07/2020

Active Learning Methods for Efficient Hybrid Biophysical Variable Retrieval

Kernel-based machine learning regression algorithms (MLRAs) are potentia...
research
01/30/2020

Fase-AL – Adaptation of Fast Adaptive Stacking of Ensembles for Supporting Active Learning

Classification algorithms to mine data stream have been extensively stud...
research
07/18/2023

Mining of Single-Class by Active Learning for Semantic Segmentation

Several Active Learning (AL) policies require retraining a target model ...
research
09/28/2022

Active Transfer Prototypical Network: An Efficient Labeling Algorithm for Time-Series Data

The paucity of labeled data is a typical challenge in the automotive ind...
research
04/12/2021

Active learning for medical code assignment

Machine Learning (ML) is widely used to automatically extract meaningful...
