Online Cluster Validity Indices for Streaming Data

01/08/2018
by Masud Moshtaghi, et al.

Cluster analysis is used to explore structure in unlabeled data sets in a wide range of applications. An important part of cluster analysis is validating the quality of computationally obtained clusters. A large number of internal validity indices have been developed for this purpose in the offline setting. However, internal validation has not yet been extended to the online setting. A key challenge is to find an efficient incremental formulation of an index that captures both the cohesion and the separation of clusters over potentially infinite data streams. In this paper, we develop two online versions (with and without forgetting factors) of the Xie-Beni and Davies-Bouldin internal validity indices, analyze their characteristics using two streaming clustering algorithms (sk-means and online ellipsoidal clustering), and illustrate their use in monitoring evolving clusters in streaming data. We also show that incremental cluster validity indices are capable of sending a distress signal to online monitors when evolving clusters go awry. Our numerical examples indicate that the incremental Xie-Beni index with forgetting factor is superior to the other three indices tested.
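To make the idea of an incremental index with a forgetting factor concrete, here is a minimal Python sketch of a hard-partition (crisp) Xie-Beni index updated one sample at a time. It is an illustration under stated assumptions, not the authors' formulation: the class name `IncrementalXieBeni`, the MacQueen-style sequential k-means update standing in for sk-means, the choice to measure each sample's squared distance to its winning centroid after that centroid moves, and the exponential discounting of both the cohesion sum and the sample count by a factor `forgetting` in (0, 1] are all hypothetical choices made for this example. Cohesion is the (discounted) sum of squared sample-to-centroid distances; separation is the smallest squared distance between any two centroids. As in the offline Xie-Beni index, lower values indicate compact, well-separated clusters.

```python
import math
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IncrementalXieBeni:
    """Hypothetical streaming, hard-partition Xie-Beni index (sketch only).

    forgetting = 1.0 means no forgetting; values below 1.0 discount older
    samples geometrically (an assumed scheme, not taken from the paper).
    """

    def __init__(self, centroids: List[Vector], forgetting: float = 1.0):
        if len(centroids) < 2:
            raise ValueError("need at least two centroids to measure separation")
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting factor must lie in (0, 1]")
        self.centroids = [list(c) for c in centroids]
        self.counts = [0] * len(centroids)
        self.lam = forgetting
        self.compactness = 0.0  # discounted sum of squared distances
        self.weight = 0.0       # discounted sample count

    def update(self, x: Vector) -> float:
        # 1. Winner-take-all assignment to the nearest centroid.
        dists = [sq_dist(x, c) for c in self.centroids]
        i = min(range(len(dists)), key=dists.__getitem__)
        # 2. Sequential (MacQueen-style) k-means step: move the winner toward x.
        self.counts[i] += 1
        eta = 1.0 / self.counts[i]
        self.centroids[i] = [c + eta * (xj - c) for c, xj in zip(self.centroids[i], x)]
        # 3. Discounted cohesion: older contributions decay by lam per step.
        self.compactness = self.lam * self.compactness + sq_dist(x, self.centroids[i])
        self.weight = self.lam * self.weight + 1.0
        return self.value()

    def value(self) -> float:
        # Separation: smallest squared distance between any pair of centroids.
        k = len(self.centroids)
        sep = min(
            sq_dist(self.centroids[a], self.centroids[b])
            for a in range(k)
            for b in range(a + 1, k)
        )
        if sep == 0.0:
            return math.inf  # coincident centroids: separation has collapsed
        return self.compactness / (self.weight * sep)


if __name__ == "__main__":
    xb = IncrementalXieBeni([[0.0, 0.0], [10.0, 0.0]], forgetting=0.9)
    for point in ([0.0, 0.0], [2.0, 0.0], [9.0, 1.0], [11.0, -1.0]):
        print(round(xb.update(point), 5))
```

With `forgetting=1.0` every past sample counts equally; with a smaller value, older samples fade geometrically, so the index reflects the recent state of the stream. A sustained rise in the returned value is one simple way such an index could flag clusters that are drifting together or spreading out.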

research
05/11/2021

An internal validity index based on density-involved distance

It is crucial to evaluate the quality of clustering results in cluster a...
research
02/18/2019

Incremental Cluster Validity Indices for Hard Partitions: Extensions and Comparative Study

Validation is one of the most important aspects of clustering, but most ...
research
08/17/2021

Incremental cluster validity index-guided online learning for performance and robustness to presentation order

In streaming data applications incoming samples are processed and discar...
research
08/02/2022

Are Cluster Validity Measures (In)valid?

Internal cluster validity measures (such as the Calinski-Harabasz, Dunn,...
research
06/09/2017

Towards balanced clustering - part 1 (preliminaries)

The article contains a preliminary glance at balanced clustering problem...
research
11/11/2020

A Survey and Implementation of Performance Metrics for Self-Organized Maps

Self-Organizing Map algorithms have been used for almost 40 years across...