On the Limitations of Simulating Active Learning

05/21/2023
by Katerina Margatina, et al.

Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects informative unlabeled data for human annotation, aiming to improve over random sampling. However, performing AL experiments with human annotations on the fly is laborious and expensive, and thus unrealistic for academic research. An easy fix is to simulate AL by treating an already labeled, publicly available dataset as the pool of unlabeled data. In this position paper, we first survey recent literature and highlight the challenges at each step of the AL loop. We then unveil neglected caveats in the experimental setup that can significantly affect the quality of AL research. We continue with an exploration of how the simulation setting can govern empirical findings, arguing that it might be one of the answers to the ever-posed question “why do active learning algorithms sometimes fail to outperform random sampling?”. We argue that evaluating AL algorithms on available labeled datasets might provide a lower bound on their effectiveness on real data. We believe it is essential to collectively shape the best practices for AL research, particularly as engineering advancements in LLMs push the research focus towards data-driven approaches (e.g., data efficiency, alignment, fairness). In light of this, we have developed guidelines for future work. Our aim is to draw the community's attention to these limitations, in the hope of finding ways to address them.
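To make the simulation setting concrete, the following is a minimal sketch of a simulated pool-based AL loop, not the authors' experimental setup. Everything in it is an assumption made for illustration: the synthetic two-class data standing in for a public labeled dataset, the nearest-centroid "model", the margin-based uncertainty score, and all function names and parameters. The key point it shows is that "annotation" in a simulation is only a lookup of a gold label that was hidden from the model.

```python
import random

def make_pool(n=200, seed=0):
    """Synthetic labeled data standing in for a public dataset (assumption)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        pool.append(((rng.gauss(center, 1.5), rng.gauss(center, 1.5)), label))
    return pool

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def fit_centroids(labeled):
    """Toy model: one mean feature vector per class seen so far."""
    sums = {}
    for (x, y), label in labeled:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    return min(centroids, key=lambda c: sq_dist(point, centroids[c]))

def margin(centroids, point):
    """Gap between the two closest centroids; a small gap means uncertain."""
    d = sorted(sq_dist(point, c) for c in centroids.values())
    return d[1] - d[0] if len(d) > 1 else 0.0

def simulate_al(pool, test, strategy, seed_size=4, batch=4, rounds=5, seed=0):
    """Run a simulated AL loop; return (labeled set size, test accuracy) per round."""
    rng = random.Random(seed)
    indices = list(range(len(pool)))
    rng.shuffle(indices)
    labeled_idx = indices[:seed_size]      # small random seed set
    unlabeled_idx = indices[seed_size:]    # gold labels here are hidden from the model
    history = []
    for _ in range(rounds):
        model = fit_centroids([pool[i] for i in labeled_idx])
        acc = sum(predict(model, x) == y for x, y in test) / len(test)
        history.append((len(labeled_idx), acc))
        if strategy == "random":
            picked = rng.sample(unlabeled_idx, batch)
        else:  # "uncertainty": query the points with the smallest margin
            ranked = sorted(unlabeled_idx, key=lambda i: margin(model, pool[i][0]))
            picked = ranked[:batch]
        # Simulated annotation: reveal the stored gold labels of the picked points.
        labeled_idx = labeled_idx + picked
        picked_set = set(picked)
        unlabeled_idx = [i for i in unlabeled_idx if i not in picked_set]
    return history

if __name__ == "__main__":
    pool, test = make_pool(200, seed=1), make_pool(100, seed=2)
    for strategy in ("random", "uncertainty"):
        print(strategy, simulate_al(pool, test, strategy))
```

Comparing the two learning curves that this prints is the standard simulated evaluation. On a toy problem like this one, the comparison says little about how either strategy would behave with real annotators and real unlabeled data.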


