A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

04/07/2022
by Gabriel Aguiar, et al.

Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures for evaluating these algorithms. This work presents a taxonomy of algorithms for imbalanced data streams and proposes a standardized, exhaustive, and informative experimental testbed to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data stream algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, and real-world and semi-synthetic datasets in binary and multi-class scenarios. This makes it the largest experimental study conducted so far in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios, and we provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental testbed is fully reproducible and easy to extend with new methods. In this way, we propose the first standardized approach to conducting experiments on imbalanced data streams, one that other researchers can use for trustworthy and fair evaluation of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.
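To make "dynamic class imbalance ratio" and imbalance-aware evaluation concrete, here is a minimal sketch in plain Python. It is not the authors' framework and does not reproduce their testbed. Every name in it (`imbalanced_stream`, `OnlineNearestCentroid`, `prequential_gmean`, `g_mean`) is hypothetical. The Gaussian class centres, the 20% to 2% minority schedule, and the 500-instance window are illustrative assumptions. The sketch uses test-then-train (prequential) evaluation with a windowed G-mean, the geometric mean of per-class recalls, which is a common choice for imbalanced streams because it collapses toward zero when the minority class is ignored.

```python
import math
import random
from collections import deque


def g_mean(pairs):
    """Geometric mean of per-class recalls over (y_true, y_pred) pairs.

    Assumption: classes absent from the window are skipped; empty input -> 0.0.
    """
    hits, totals = {}, {}
    for y_true, y_pred in pairs:
        totals[y_true] = totals.get(y_true, 0) + 1
        hits[y_true] = hits.get(y_true, 0) + (y_true == y_pred)
    if not totals:
        return 0.0
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


def imbalanced_stream(n, minority_ratio, seed=0):
    """Hypothetical binary stream where class 1 is the minority.

    minority_ratio(t) returns P(y=1) at step t; a non-constant function
    models a dynamic imbalance ratio. Features are 2-D Gaussians (assumed).
    """
    rng = random.Random(seed)
    for t in range(n):
        y = 1 if rng.random() < minority_ratio(t) else 0
        centre = 1.5 if y == 1 else 0.0
        yield (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), y


class OnlineNearestCentroid:
    """Illustrative incremental nearest-centroid classifier (not from the paper)."""

    def __init__(self):
        self.means, self.counts = {}, {}

    def predict(self, x):
        if not self.means:
            return 0  # no data seen yet: default to the majority class
        return min(self.means, key=lambda c: math.dist(x, self.means[c]))

    def learn(self, x, y):
        n = self.counts.get(y, 0) + 1
        mean = self.means.get(y, (0.0,) * len(x))
        # Running mean update: m_n = m_{n-1} + (x - m_{n-1}) / n
        self.means[y] = tuple(m + (xi - m) / n for m, xi in zip(mean, x))
        self.counts[y] = n


def prequential_gmean(stream, model, window=500):
    """Test-then-train loop; returns G-mean over the last `window` instances."""
    recent = deque(maxlen=window)
    for x, y in stream:
        recent.append((y, model.predict(x)))  # test on the instance first
        model.learn(x, y)                      # then train on it
    return g_mean(recent)


def minority_ratio(t):
    # Assumed schedule: minority share drops from 20% to 2% halfway through.
    return 0.2 if t < 5000 else 0.02


score = prequential_gmean(imbalanced_stream(10_000, minority_ratio, seed=1),
                          OnlineNearestCentroid())
print(f"windowed G-mean: {score:.3f}")
```

Swapping `OnlineNearestCentroid` for a prior-sensitive learner, or for one that resamples the minority class, and comparing windowed G-mean before and after the ratio change is the kind of contrast such scenarios are designed to expose. Accuracy alone would hide it, since always predicting class 0 already scores about 98% after the switch.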

research
10/15/2022

The Influence of Multiple Classes on Learning Online Classifiers from Imbalanced and Concept Drifting Data Streams

This work is aimed at the experimental studying the influence of local d...
research
07/24/2021

Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions

Learning from imbalanced data is among the most challenging areas in con...
research
01/30/2021

Hellinger Distance Weighted Ensemble for Imbalanced Data Stream Classification

The imbalanced data classification remains a vital problem. The key is t...
research
08/29/2023

OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

How to get insights from relational data streams in a timely manner is a...
research
09/17/2023

Imbalanced Data Stream Classification using Dynamic Ensemble Selection

Modern streaming data categorization faces significant challenges from c...
research
04/09/2023

Class-Imbalanced Learning on Graphs: A Survey

The rapid advancement in data-driven research has increased the demand f...
research
09/15/2021

On-the-Fly Ensemble Pruning in Evolving Data Streams

Ensemble pruning is the process of selecting a subset of component classi...