A Novel Incremental Clustering Technique with Concept Drift Detection

03/30/2020
by Mitchell D. Woodbright, et al.

Data are collected from many aspects of life, and they often arrive in chunks or batches. Traditional static clustering algorithms are not suited to such dynamic datasets. Re-running a conventional clustering technique over the combined dataset every time a new batch arrives can be slow and wasteful, and the ever-increasing size of the combined dataset can make it challenging to store in memory. As a result, various incremental clustering techniques have been proposed. These techniques need to update the current clustering solution efficiently whenever a new batch arrives, so that it adapts to the latest data. They also need to detect concept drift, which occurs when the clustering pattern of a new batch is significantly different from that of older batches. Sometimes the clustering pattern drifts temporarily in a single batch while the following batches do not exhibit the drift. Incremental clustering techniques therefore need to detect both temporary and sustained drift. In this paper, we propose an efficient incremental clustering algorithm called UIClust. It is designed to cluster streams of data chunks, even when there are temporary or sustained concept drifts. We evaluate UIClust by comparing it with a recently published, high-quality incremental clustering algorithm on real and synthetic datasets, using well-known clustering evaluation criteria: entropy, sum of squared errors (SSE), and execution time. Our results show that UIClust outperforms the existing technique in all our experiments.
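The abstract names entropy and SSE as evaluation criteria but does not define them. The following is a minimal sketch of their common textbook definitions, not necessarily the exact formulation used in the paper. SSE is taken as the sum of squared Euclidean distances from each point to its cluster centroid, and entropy as the size-weighted average of the class-label entropy within each cluster. The function names `sse` and `clustering_entropy` and the toy data are hypothetical, chosen only for illustration. Under these definitions, lower values are better for both measures.

```python
import math
from collections import Counter, defaultdict


def sse(points, assignments):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, cluster_id in zip(points, assignments):
        clusters[cluster_id].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = per-dimension mean of the cluster's members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def clustering_entropy(assignments, labels):
    """Size-weighted average of the class-label entropy (in bits) per cluster."""
    clusters = defaultdict(list)
    for cluster_id, label in zip(assignments, labels):
        clusters[cluster_id].append(label)

    n = len(labels)
    total = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        # Shannon entropy of the label distribution inside this cluster.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


# Hypothetical toy data: four 2-D points in two clusters, with ground-truth labels.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
assignments = [0, 0, 1, 1]
labels = ["a", "a", "b", "a"]

print(sse(points, assignments))                 # 4.0
print(clustering_entropy(assignments, labels))  # 0.5
```

In the toy example, cluster 0 contains only label "a" and contributes zero entropy, while cluster 1 mixes two labels equally and contributes one bit, weighted by half the data.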

