REAL: A Representative Error-Driven Approach for Active Learning

07/03/2023
by Cheng Chen, et al.

Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool and acquire their labels for subsequent model training. To this end, AL methods typically measure the informativeness of unlabeled instances by uncertainty and diversity. However, they do not consider erroneous instances together with the error density of their neighborhoods, even though such instances have great potential to improve model performance. To address this limitation, we propose REAL, a novel approach that selects data instances with Representative Errors for Active Learning. REAL identifies minority predictions within a cluster as pseudo errors and allocates an adaptive sampling budget to each cluster based on its estimated error density. Extensive experiments on five text classification datasets demonstrate that REAL consistently outperforms the best-performing baselines in accuracy and macro-F1 across a wide range of hyperparameter settings. Our analysis also shows that REAL selects the most representative pseudo errors, whose distribution matches that of the ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
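The selection step described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the REAL implementation from the linked repository. Cluster assignments (for example, from k-means over text embeddings) and model predictions are taken as given. A pseudo error is any instance whose predicted label differs from its cluster's majority prediction. Each cluster's share of the budget is assumed to be proportional to its pseudo-error count (estimated error density times cluster size). The function names and the optional `scores` hook are hypothetical.

```python
from collections import Counter

import numpy as np


def pseudo_errors(cluster_ids, preds):
    """Mark minority predictions in each cluster as pseudo errors.

    Assumption: an instance is a pseudo error if its predicted label differs
    from the most common predicted label in its cluster (ties go to the label
    seen first).
    """
    cluster_ids = np.asarray(cluster_ids)
    preds = np.asarray(preds)
    mask = np.zeros(len(preds), dtype=bool)
    for c in np.unique(cluster_ids):
        idx = np.flatnonzero(cluster_ids == c)
        majority, _ = Counter(preds[idx].tolist()).most_common(1)[0]
        mask[idx] = preds[idx] != majority
    return mask


def allocate_budget(cluster_ids, mask, budget):
    """Split the budget across clusters in proportion to pseudo-error counts.

    Assumption: proportional allocation with integer largest-remainder rounding,
    so allocations sum to min(budget, total pseudo errors) and never exceed a
    cluster's pseudo-error count.
    """
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    counts = np.array([int(mask[cluster_ids == c].sum()) for c in clusters])
    total = int(counts.sum())
    if total == 0:
        return {int(c): 0 for c in clusters}
    budget = min(budget, total)
    alloc = budget * counts // total      # floor of each proportional share
    remainders = budget * counts % total  # exact fractional parts, scaled by total
    leftover = budget - int(alloc.sum())
    for i in np.argsort(-remainders, kind="stable")[:leftover]:
        alloc[i] += 1                     # hand leftover slots to largest remainders
    return {int(c): int(a) for c, a in zip(clusters, alloc)}


def select_representative_errors(cluster_ids, preds, budget, scores=None):
    """Return sorted indices of pseudo errors chosen for labeling.

    Within a cluster, candidates are taken in index order, or by descending
    `scores` if given (a hypothetical hook, e.g. an uncertainty score).
    """
    cluster_ids = np.asarray(cluster_ids)
    mask = pseudo_errors(cluster_ids, preds)
    selected = []
    for c, k in allocate_budget(cluster_ids, mask, budget).items():
        cand = np.flatnonzero((cluster_ids == c) & mask)
        if scores is not None:
            cand = cand[np.argsort(-np.asarray(scores)[cand], kind="stable")]
        selected.extend(cand[:k].tolist())
    return sorted(selected)


cluster_ids = [0] * 6 + [1] * 4
preds = [0, 0, 0, 1, 1, 2, 1, 1, 1, 0]
print(select_representative_errors(cluster_ids, preds, budget=3))  # [3, 4, 9]
```

In this toy example, cluster 0 has three pseudo errors (indices 3, 4, 5) and cluster 1 has one (index 9), so a budget of 3 is split 2 and 1. Allocating by count rather than by density alone keeps very small clusters with a single disagreement from absorbing a disproportionate share of the budget; the choice among pseudo errors inside a cluster is left as a placeholder here.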

research
10/22/2021

A Simple Baseline for Low-Budget Active Learning

Active learning focuses on choosing a subset of unlabeled data to be lab...
research
07/22/2020

DEAL: Deep Evidential Active Learning for Image Classification

Convolutional Neural Networks (CNNs) have proven to be state-of-the-art ...
research
03/23/2023

Box-Level Active Detection

Active learning selects informative samples for annotation within budget...
research
11/25/2021

Active Learning at the ImageNet Scale

Active learning (AL) algorithms aim to identify an optimal subset of dat...
research
12/09/2020

Cost-Based Budget Active Learning for Deep Learning

Majorly classical Active Learning (AL) approach usually uses statistical...
research
02/11/2022

Predicting Out-of-Distribution Error with the Projection Norm

We propose a metric – Projection Norm – to predict a model's performance...
research
08/13/2020

Contextual Diversity for Active Learning

Requirement of large annotated datasets restrict the use of deep convolu...
