Data Stream Clustering: A Review

07/16/2020
by Alaettin Zubaroğlu, et al.

The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting interest despite its many challenges. Clustering is one of the most suitable methods for real-time data stream processing because it can be applied with less prior information about the data and does not need labeled instances. However, data stream clustering differs from traditional clustering in many respects and raises several challenging issues. Here, we describe the concepts and common characteristics of data streams, such as concept drift, data structures for data streams, time window models, and outlier detection. We comprehensively review recent data stream clustering algorithms and analyze them in terms of base clustering technique, computational complexity, and clustering accuracy. We compare these algorithms, list popular data stream repositories and datasets as well as stream processing tools and platforms, and discuss open problems in data stream clustering.

research
08/28/2023

SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams

Many real-world data stream applications not only suffer from concept dr...
research
07/26/2012

Achieving Approximate Soft Clustering in Data Streams

In recent years, data streaming has gained prominence due to advances in...
research
10/02/2017

Clustering Stream Data by Exploring the Evolution of Density Mountain

Stream clustering is a fundamental problem in many streaming data analys...
research
11/15/2019

Scalable and Reliable Multi-Dimensional Aggregation of Sensor Data Streams

Ever-increasing amounts of data and requirements to process them in real...
research
07/06/2020

Multi-tenant Pub/Sub Processing for Real-time Data Streams

Devices and sensors generate streams of data across a diversity of locat...
research
06/22/2021

A Clustering-based Framework for Classifying Data Streams

The non-stationary nature of data streams strongly challenges traditiona...
research
01/16/2018

Sequences, yet Functions: The Dual Nature of Data-Stream Processing

Data-stream processing has continuously risen in importance as the amoun...
