SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams

08/28/2023
by   Chun Wai Chiu, et al.

Many real-world data stream applications suffer not only from concept drift but also from class imbalance, yet very few existing studies have investigated this joint challenge. Data difficulty factors, which have been shown to be key challenges in class imbalanced data streams, are not taken into account by existing approaches when learning from such streams. In this work, we propose a drift adaptable oversampling strategy that synthesises minority class examples based on stream clustering. The motivation is that stream clustering methods continuously update themselves to reflect the characteristics of the current underlying concept, including data difficulty factors. This property can potentially be used to compress past information without explicitly caching data in memory. Based on the compressed information, synthetic examples can be created within the regions that recently generated new minority class examples. Experiments with artificial and real-world data streams show that the proposed approach handles concept drift involving different minority class decompositions better than existing approaches, especially when the data stream is severely class imbalanced and presents high proportions of safe and borderline minority class examples.
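The general idea in the abstract, summarising recent minority class examples in continuously updated clusters and drawing synthetic examples from those summaries, can be illustrated with a minimal sketch. This is not the SMOClust algorithm from the paper: the class names, the fixed absorption radius, the exponential decay, the age-based pruning, and the Gaussian sampling are all assumptions made here for illustration only.

```python
import math
import random


class MicroCluster:
    """Hypothetical summary of nearby minority examples (no raw data kept)."""

    def __init__(self, x, t):
        self.n = 1.0                      # faded example count
        self.ls = list(x)                 # faded per-feature linear sum
        self.ss = [v * v for v in x]      # faded per-feature squared sum
        self.last_update = t

    def centroid(self):
        return [s / self.n for s in self.ls]

    def std(self):
        # Per-feature standard deviation recovered from the sufficient statistics.
        return [math.sqrt(max(q / self.n - (s / self.n) ** 2, 0.0))
                for s, q in zip(self.ls, self.ss)]

    def absorb(self, x, t, decay):
        # Fade the old statistics so the summary tracks the current concept.
        fade = decay ** (t - self.last_update)
        self.n = self.n * fade + 1.0
        self.ls = [s * fade + v for s, v in zip(self.ls, x)]
        self.ss = [q * fade + v * v for q, v in zip(self.ss, x)]
        self.last_update = t


class MinorityClusterSampler:
    """Illustrative oversampler; all parameters are assumed, not from the paper."""

    def __init__(self, radius=1.0, decay=0.99, max_age=200, seed=0):
        self.radius = radius
        self.decay = decay
        self.max_age = max_age
        self.clusters = []
        self.rng = random.Random(seed)

    def _prune(self, t):
        # Forget clusters that have not seen a minority example recently.
        self.clusters = [c for c in self.clusters
                         if t - c.last_update <= self.max_age]

    def learn_minority(self, x, t):
        """Update the summaries with minority example x observed at time t."""
        self._prune(t)
        nearest = min(self.clusters,
                      key=lambda c: math.dist(c.centroid(), x),
                      default=None)
        if nearest is not None and math.dist(nearest.centroid(), x) <= self.radius:
            nearest.absorb(x, t, self.decay)
        else:
            self.clusters.append(MicroCluster(x, t))

    def synthesise(self, t):
        """Draw one synthetic minority example, favouring recently updated clusters."""
        self._prune(t)
        if not self.clusters:
            return None
        weights = [self.decay ** (t - c.last_update) for c in self.clusters]
        chosen = self.rng.choices(self.clusters, weights=weights, k=1)[0]
        return [self.rng.gauss(m, s)
                for m, s in zip(chosen.centroid(), chosen.std())]
```

In this sketch, a learner would call learn_minority for each arriving minority example and call synthesise whenever it wants an extra minority example to train on. Because stale clusters are pruned and recent ones are weighted more heavily, the synthetic examples follow the minority class to its new region after a drift, without any past examples being stored.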


research
10/15/2022

The Influence of Multiple Classes on Learning Online Classifiers from Imbalanced and Concept Drifting Data Streams

This work is aimed at the experimental studying the influence of local d...
research
07/16/2020

Data Stream Clustering: A Review

Number of connected devices is steadily increasing and these devices con...
research
12/30/2022

Learning from Data Streams: An Overview and Update

The literature on machine learning in the context of data streams is vas...
research
12/29/2020

Drift-Aware Multi-Memory Model for Imbalanced Data Streams

Online class imbalance learning deals with data streams that are affecte...
research
02/04/2022

Stop Oversampling for Class Imbalance Learning: A Critical Review

For the last two decades, oversampling has been employed to overcome the...
research
04/24/2017

Learning from Ontology Streams with Semantic Concept Drift

Data stream learning has been largely studied for extracting knowledge s...
research
09/27/2018

Queue-based Resampling for Online Class Imbalance Learning

Online class imbalance learning constitutes a new problem and an emergin...
