Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

07/04/2022
by   Xueying Zhan, et al.

Pool-based Active Learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from an oracle or annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is challenging because AL sampling strategies naturally conflict with OOD sample detection: AL selects data that are hard for the current base classifier to classify (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., higher entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples with a fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem and apply Pareto optimization to two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy), and (2) the confidence of not being an OOD sample. Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
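The conflict between the two objectives can be illustrated with a small sketch. This is not the POAL algorithm itself, which searches over fixed-size subsets with a Monte-Carlo scheme; it only filters individual samples down to those that are non-dominated on (entropy, ID confidence). The function names `predictive_entropy` and `pareto_front`, the toy probabilities, and the `id_conf` scores are all hypothetical and chosen for illustration; in practice the ID confidence would come from some OOD detector.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)

def pareto_front(scores):
    """Return indices of non-dominated rows; every column is 'higher is better'."""
    keep = []
    for i in range(scores.shape[0]):
        # Row j dominates row i if it is >= on all objectives and > on at least one.
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep

# Toy pool of four unlabeled samples (assumed values, not from the paper).
probs = np.array([
    [0.34, 0.33, 0.33],  # near-uniform prediction: very high entropy
    [0.90, 0.05, 0.05],  # confident prediction: low entropy
    [0.50, 0.40, 0.10],  # moderately uncertain
    [0.36, 0.32, 0.32],  # high entropy, but slightly lower than sample 0
])
# Hypothetical "confidence of not being OOD" for each sample.
id_conf = np.array([0.2, 0.9, 0.7, 0.1])

# Objective 1: AL informativeness (entropy). Objective 2: ID confidence.
scores = np.column_stack([predictive_entropy(probs), id_conf])
front = pareto_front(scores)
print(front)
```

In this toy pool, sample 3 is dropped because sample 0 has both higher entropy and higher ID confidence, while the remaining three samples each represent a different trade-off between the two objectives.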


research
03/25/2023

Deep Active Learning with Contrastive Learning Under Realistic Data Pool Assumptions

Active learning aims to identify the most informative data from an unlab...
research
10/16/2020

ALdataset: a benchmark for pool-based active learning

Active learning (AL) is a subfield of machine learning (ML) in which a l...
research
07/11/2023

OpenAL: An Efficient Deep Active Learning Framework for Open-Set Pathology Image Classification

Active learning (AL) is an effective approach to select the most informa...
research
03/25/2022

A Comparative Survey of Deep Active Learning

Active Learning (AL) is a set of techniques for reducing labeling cost b...
research
10/23/2021

Confidence-Aware Active Feedback for Efficient Instance Search

Relevance feedback is widely used in instance search (INS) tasks to furt...
research
06/05/2021

Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

Given restrictions on the availability of data, active learning is the p...
research
05/21/2023

On the Limitations of Simulating Active Learning

Active learning (AL) is a human-and-model-in-the-loop paradigm that iter...
