The Influence of Multiple Classes on Learning Online Classifiers from Imbalanced and Concept Drifting Data Streams

10/15/2022
by Agnieszka Lipska, et al.

This work experimentally studies how local data characteristics and drifts affect the difficulty of learning various online classifiers from multi-class imbalanced data streams. First, we present a categorization of these data factors and drifts in the context of imbalanced streams; then we introduce generators of synthetic streams that model these factors and drifts. The results of many experiments with synthetically generated data streams show that overlapping between multiple minority classes (i.e., borderline examples) plays a much greater role than in streams with a single minority class. The presence of rare examples in the stream is the most difficult single factor. The local drift of splitting minority classes is the third influential factor. Unlike in binary streams, the specialized UOB and OOB classifiers (Undersampling and Oversampling Online Bagging) perform well enough even for high imbalance ratios. The most challenging scenarios for all classifiers are complex ones that combine drifts of the identified factors simultaneously; these degrade the evaluation measures more strongly for streams with several minority classes than for binary ones. This is an extended version of the short paper presented at the LIDTA'2022 workshop at ECML PKDD 2022.
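To illustrate how online bagging classifiers such as UOB and OOB can rebalance several classes at once, here is a minimal Python sketch of the resampling idea usually described for their multi-class form. It is not the authors' experimental code. Each incoming example of class c is presented to each base learner k ~ Poisson(λ) times, with λ = w_max / w_c for OOB and λ = w_min / w_c for UOB, where w_c is a time-decayed estimate of class c's share of the stream. Several details are assumptions for illustration only: the decay factor `theta=0.9`, the uniform initial estimates, and the helper names `DecayedClassSizes`, `poisson_lambda`, `poisson_sample` and `training_counts`. The base learners are omitted; the sketch only counts how many training updates each class would receive.

```python
import math
import random
from typing import Dict, List


class DecayedClassSizes:
    """Time-decayed class-share estimates: w_c <- theta * w_c + (1 - theta) * [y == c].

    Assumption: estimates start uniform so that no class has a zero size.
    """

    def __init__(self, classes: List[str], theta: float = 0.9) -> None:
        self.theta = theta
        self.w: Dict[str, float] = {c: 1.0 / len(classes) for c in classes}

    def update(self, y: str) -> None:
        # The estimates keep summing to 1 after every update.
        for c in self.w:
            hit = 1.0 if c == y else 0.0
            self.w[c] = self.theta * self.w[c] + (1.0 - self.theta) * hit


def poisson_lambda(sizes: DecayedClassSizes, y: str, mode: str) -> float:
    """Resampling rate for an example of class y.

    'oob': w_max / w_y >= 1, so smaller classes are oversampled.
    'uob': w_min / w_y <= 1, so larger classes are undersampled.
    """
    w_y = sizes.w[y]
    if mode == "oob":
        return max(sizes.w.values()) / w_y
    if mode == "uob":
        return min(sizes.w.values()) / w_y
    raise ValueError(f"unknown mode: {mode}")


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def training_counts(stream: List[str], classes: List[str], n_learners: int,
                    mode: str, seed: int = 0) -> Dict[str, int]:
    """Total (example, base learner) training updates per class over the stream."""
    rng = random.Random(seed)
    sizes = DecayedClassSizes(classes)
    totals = {c: 0 for c in classes}
    for y in stream:
        sizes.update(y)  # update class shares before computing the rate
        lam = poisson_lambda(sizes, y, mode)
        for _ in range(n_learners):
            # A real ensemble would update base learner i k times on this example.
            totals[y] += poisson_sample(lam, rng)
    return totals


if __name__ == "__main__":
    # Hypothetical stream: 90% class A, 5% each of minority classes B and C.
    stream = (["A"] * 18 + ["B", "C"]) * 50
    print("OOB:", training_counts(stream, ["A", "B", "C"], n_learners=10, mode="oob"))
    print("UOB:", training_counts(stream, ["A", "B", "C"], n_learners=10, mode="uob"))
```

On such a stream, OOB trains on each B or C example several times more often than on each A example. UOB instead shrinks the updates for A far below one per learner on average. Both methods push the per-class update totals toward balance, each by scaling against a different reference class size.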
