Mining Drifting Data Streams on a Budget: Combining Active Learning with Self-Labeling

12/21/2021
by Łukasz Korycki, et al.

Mining data streams poses a number of challenges, including the continuous and non-stationary nature of the data, the massive volume of information to be processed, and the constraints placed on computational resources. While a number of supervised solutions to this problem have been proposed in the literature, most of them assume that access to the ground truth (in the form of class labels) is unlimited and that such information can be used instantly when updating the learning system. This is far from realistic, as one must consider the underlying cost of acquiring labels. Solutions that reduce the need for ground truth in streaming scenarios are therefore required. In this paper, we propose a novel framework for mining drifting data streams on a budget by combining information coming from active learning and self-labeling. We introduce several strategies that take advantage of both intelligent instance selection and semi-supervised procedures, while taking into account the potential presence of concept drift. Such a hybrid approach allows for efficient exploration and exploitation of streaming data structures within realistic labeling budgets. Since our framework works as a wrapper, it may be applied with different learning algorithms. An experimental study, carried out on a diverse set of real-world data streams with various types of concept drift, demonstrates the usefulness of the proposed strategies when dealing with highly limited access to class labels. The presented hybrid approach is especially suitable when one cannot increase the labeling budget or replace an inefficient classifier. We deliver a set of recommendations regarding areas of applicability for our strategies.
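To make the general idea concrete, the following is a minimal sketch of how a budgeted wrapper might combine the two label sources: it asks an oracle for a label when the model is uncertain and the budget allows, and otherwise falls back to its own confident prediction. This is not one of the paper's strategies. The class `CentroidClassifier`, the function `run_hybrid`, the thresholds, and the budget rule are all hypothetical choices made for illustration, and the sketch ignores concept drift handling entirely.

```python
class CentroidClassifier:
    """Hypothetical stand-in for any incremental learner (nearest centroid)."""

    def __init__(self):
        self.sums = {}    # per-class running sum of feature vectors
        self.counts = {}  # per-class number of training instances

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict_proba(self, x):
        # Turn distances to class centroids into normalized scores.
        scores = {}
        for y, s in self.sums.items():
            centroid = [v / self.counts[y] for v in s]
            dist = sum((a - b) ** 2 for a, b in zip(x, centroid)) ** 0.5
            scores[y] = 1.0 / (1.0 + dist)
        total = sum(scores.values())
        return {y: v / total for y, v in scores.items()} if total else {}


def run_hybrid(stream, budget=0.1, query_threshold=0.6, self_label_threshold=0.9):
    """Process (x, true_label) pairs; true_label is read only when queried."""
    model = CentroidClassifier()
    seen = queried = self_labeled = 0
    for x, true_label in stream:
        seen += 1
        proba = model.predict_proba(x)
        if len(proba) >= 2:
            pred, conf = max(proba.items(), key=lambda kv: kv[1])
        else:
            # With fewer than two known classes, treat the model as unsure.
            pred, conf = None, 0.0

        # Assumed budget rule: fraction of queried instances so far.
        within_budget = queried / seen < budget
        if within_budget and conf < query_threshold:
            # Active learning: pay for the true label of an uncertain instance.
            model.learn(x, true_label)
            queried += 1
        elif conf >= self_label_threshold:
            # Self-labeling: trust a confident prediction at no labeling cost.
            model.learn(x, pred)
            self_labeled += 1
        # Otherwise the instance is discarded without updating the model.
    return {"seen": seen, "queried": queried, "self_labeled": self_labeled}
```

Calling `run_hybrid(stream, budget=0.1)` on an iterable of `(features, label)` pairs returns counts of how many instances were queried and how many were self-labeled. Because the loop only touches the model through `learn` and `predict_proba`, any incremental classifier exposing those two methods could be swapped in, which mirrors the wrapper idea described in the abstract.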


research
02/08/2023

Combining self-labeling and demand based active learning for non-stationary data streams

Learning from non-stationary data streams is a research direction that g...
research
09/20/2020

Instance exploitation for learning temporary concepts from sparsely labeled drifting data streams

Continual learning from streaming data sources becomes more and more pop...
research
04/14/2022

Stream-based Active Learning with Verification Latency in Non-stationary Environments

Data stream classification is an important problem in the field of machi...
research
05/14/2014

Active Mining of Parallel Video Streams

The practicality of a video surveillance system is adversely limited by ...
research
12/19/2021

Active Weighted Aging Ensemble for Drifted Data Stream Classification

One of the significant problems of streaming data classification is the ...
research
04/20/2018

Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection: Assessment and Visualization

Credit card fraud detection is a very challenging problem because of the...
research
12/30/2022

Learning from Data Streams: An Overview and Update

The literature on machine learning in the context of data streams is vas...
