Re-Benchmarking Pool-Based Active Learning for Binary Classification

06/15/2023
by   Po-Yi Lu, et al.

Active learning is a paradigm that significantly enhances the performance of machine learning models when acquiring labeled data is expensive. While several benchmarks exist for evaluating active learning strategies, their findings are not fully consistent with one another. This discrepancy motivates us to develop a transparent and reproducible benchmark for the community. Our efforts result in an open-source implementation (https://github.com/ariapoy/active-learning-benchmark) that is reliable and extensible for future research. By conducting thorough re-benchmarking experiments, we have not only rectified misconfigurations in existing benchmarks but also shed light on the under-explored issue of model compatibility, which directly causes the observed discrepancy. Resolving the discrepancy reaffirms that the uncertainty sampling strategy of active learning remains an effective and preferred choice for most datasets. Our experience highlights the importance of dedicating research effort to re-benchmarking existing benchmarks in order to produce more credible results and gain deeper insights.
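
The abstract names uncertainty sampling but does not spell out the query rule. As a rough illustration only, the sketch below assumes a least-confidence score for a binary classifier, where the pool examples whose predicted positive-class probability lies closest to 0.5 are queried first. The function names and the `batch_size` parameter are hypothetical and are not taken from the linked repository.

```python
from typing import List, Sequence


def uncertainty_scores(pos_probs: Sequence[float]) -> List[float]:
    """Binary least-confidence score: 1 - max(p, 1 - p), largest at p = 0.5.

    Assumption: pos_probs holds a model's predicted positive-class
    probabilities for each example in the unlabeled pool.
    """
    return [1.0 - max(p, 1.0 - p) for p in pos_probs]


def query_most_uncertain(pos_probs: Sequence[float], batch_size: int = 1) -> List[int]:
    """Return the pool indices of the batch_size most uncertain examples."""
    scores = uncertainty_scores(pos_probs)
    # Sort pool indices by descending uncertainty; ties keep pool order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]
```

For example, `query_most_uncertain([0.9, 0.55, 0.1, 0.48], 2)` returns `[3, 1]`, since 0.48 and 0.55 are the probabilities nearest 0.5. In a pool-based loop, the selected examples would be labeled, moved into the training set, and the model refit before the next query.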


research
06/16/2023

LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning

Labeled data are critical to modern machine learning applications, but o...
research
12/02/2019

Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

We propose using active learning based techniques to further improve the...
research
01/23/2023

Speeding Up BatchBALD: A k-BALD Family of Approximations for Active Learning

Active learning is a powerful method for training machine learning model...
research
11/24/2022

PyTAIL: Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data

Online data streams make training machine learning models hard because o...
research
06/08/2017

Nuclear Discrepancy for Active Learning

Active learning algorithms propose which unlabeled objects should be que...
research
03/10/2022

A Benchmark for Active Learning of Variability-Intensive Systems

Behavioral models are the key enablers for behavioral analysis of Softwa...
research
03/13/2020

Data-driven surrogate modelling and benchmarking for process equipment

A suite of computational fluid dynamics (CFD) simulations geared towards...
