Active learning for online training in imbalanced data streams under cold start

07/16/2021
by Ricardo Barata, et al.

Labeled data is essential in modern systems that rely on Machine Learning (ML) for predictive modelling. Such systems may suffer from the cold-start problem: supervised models work well but, initially, there are no labels, and labels are costly or slow to obtain. The problem is even worse in imbalanced data scenarios. Online financial fraud detection is an example where labeling is either i) expensive, or ii) subject to long delays when it relies on victims filing complaints. The latter may not be viable if a model has to be in place immediately, so an option is to ask analysts to label events while minimizing the number of annotations to control costs. We propose an Active Learning (AL) annotation system for datasets with orders of magnitude of class imbalance, in a cold-start streaming scenario. We present a computationally efficient Outlier-based Discriminative AL approach (ODAL) and design a novel 3-stage sequence of AL labeling policies in which ODAL is used as warm-up. We then perform empirical studies on four real-world datasets with various magnitudes of class imbalance. The results show that our method reaches a high-performance model more quickly than standard AL policies. Its observed gains over random sampling can reach 80%, and it can be competitive with policies with an unlimited annotation budget or additional historical data (with 1/10 to 1/50 of the labels).
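To make the idea of a staged labeling policy more concrete, here is a minimal sketch in plain Python. It is not the authors' ODAL implementation, and the abstract does not specify the stages or when to switch between them. Everything here is an assumption for illustration: the function name `select_for_labeling`, the `min_warmup` threshold, the stage order (random sampling, then an outlier-based warm-up, then uncertainty sampling), and the stand-in outlier score (distance from the centroid of the events labeled so far).

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def select_for_labeling(unlabeled, labeled, labels, score_fn=None,
                        k=1, min_warmup=5, rng=None):
    """Return indices into `unlabeled` to send to analysts (hypothetical helper).

    unlabeled: feature vectors awaiting a label
    labeled:   feature vectors already annotated
    labels:    0/1 labels aligned with `labeled`
    score_fn:  optional model score in [0, 1]; assumed to exist only in stage 3
    """
    rng = rng or random.Random(0)
    k = min(k, len(unlabeled))

    # Stage 1 (assumed): with almost no labels, sample at random.
    if len(labeled) < min_warmup:
        return rng.sample(range(len(unlabeled)), k)

    # Stage 2 (assumed warm-up): prefer events least like what is already
    # labeled. Distance to the labeled centroid is a stand-in outlier score.
    if score_fn is None or len(set(labels)) < 2:
        c = centroid(labeled)
        ranked = sorted(range(len(unlabeled)),
                        key=lambda i: dist(unlabeled[i], c), reverse=True)
        return ranked[:k]

    # Stage 3 (assumed): once both classes are seen and a model exists,
    # use uncertainty sampling (scores closest to 0.5 first).
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: abs(score_fn(unlabeled[i]) - 0.5))
    return ranked[:k]
```

In a streaming setting, a function like this would be called on each incoming batch, with the selected events sent to analysts and their labels appended to `labeled` and `labels` before the next batch arrives.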

research
01/25/2022

Cold Start Active Learning Strategies in the Context of Imbalanced Classification

We present novel active learning strategies dedicated to providing a sol...
research
04/26/2021

Unsupervised Instance Selection with Low-Label, Supervised Learning for Outlier Detection

The laborious process of labeling data often bottlenecks projects that a...
research
04/01/2015

A New Vision of Collaborative Active Learning

Active learning (AL) is a learning paradigm where an active learner has ...
research
10/03/2022

Nonstationary data stream classification with online active learning and siamese neural networks

We have witnessed in recent years an ever-growing volume of information ...
research
01/03/2019

Active Learning with TensorBoard Projector

An ML-based system for interactive labeling of image datasets is contrib...
research
05/24/2023

Active Learning for Natural Language Generation

The field of text generation suffers from a severe shortage of labeled d...
research
11/18/2019

Online Adaptive Asymmetric Active Learning with Limited Budgets

Online Active Learning (OAL) aims to manage unlabeled datastream by sele...
