Data augmentation on-the-fly and active learning in data stream classification

10/13/2022
by Kleanthis Malialis, et al.

There is an emerging need for predictive models to be trained on-the-fly, since in numerous machine learning applications data arrive in an online fashion. A critical challenge is the limited availability of ground truth information (e.g., labels in classification tasks) as new data are observed one-by-one online; another significant challenge is class imbalance. This work introduces the novel Augmented Queues method, which addresses this dual problem by synergistically combining online active learning, data augmentation, and a multi-queue memory that maintains a separate, balanced queue for each class. We perform an extensive experimental study using image and time-series augmentations, in which we examine the roles of the active learning budget, memory size, imbalance level, and neural network type. We demonstrate two major advantages of Augmented Queues. First, it does not reserve additional memory space, as synthetic data are generated only at training time. Second, learning models have access to more labelled data without the need to increase the active learning budget and/or the original memory size. Learning on-the-fly poses major challenges which typically hinder the deployment of learning models. Augmented Queues significantly improves performance in terms of learning quality and speed. Our code is made publicly available.
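The interplay of the three components named in the abstract can be illustrated with a minimal sketch. It is not the authors' published code: the class name `AugmentedQueuesSketch`, the confidence-threshold query rule, the budget accounting, and the Gaussian jitter standing in for the image and time-series augmentations are all assumptions made here for illustration. The sketch only shows the two properties the abstract claims: each class keeps its own fixed-size queue, and synthetic instances are created when a training set is assembled rather than stored.

```python
import random
from collections import deque


class AugmentedQueuesSketch:
    """Hypothetical per-class queue memory with training-time augmentation."""

    def __init__(self, classes, queue_size, budget, n_augment, seed=0):
        # One bounded queue per class: the oldest instance is evicted when full,
        # so every class can hold at most queue_size labelled instances.
        self.queues = {c: deque(maxlen=queue_size) for c in classes}
        self.budget = budget        # assumed: max fraction of instances to label
        self.n_augment = n_augment  # assumed: synthetic copies per stored instance
        self.seen = 0
        self.queried = 0
        self.rng = random.Random(seed)

    def should_query(self, confidence, threshold=0.9):
        """Assumed query rule: ask for a label if the model is uncertain
        and the labelling spend so far is below the budget."""
        self.seen += 1
        spend = self.queried / self.seen
        if confidence < threshold and spend < self.budget:
            self.queried += 1
            return True
        return False

    def store(self, x, y):
        """Add a labelled instance to the queue of its class."""
        self.queues[y].append(list(x))

    def training_set(self, noise=0.05):
        """Return stored instances plus jittered copies.
        The copies are built here and never written back to the queues."""
        batch = []
        for label, queue in self.queues.items():
            for x in queue:
                batch.append((x, label))
                for _ in range(self.n_augment):
                    jittered = [v + self.rng.gauss(0.0, noise) for v in x]
                    batch.append((jittered, label))
        return batch
```

In a streaming loop, one would call `should_query` with the model's confidence for each arriving instance, call `store` when a label is obtained, and train on `training_set()`. The memory footprint stays at the number of classes times `queue_size`, while the model sees `1 + n_augment` times as many labelled examples per update.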

