On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering

09/21/2016
by Kajsa Møllersen, et al.

Hybrid clustering combines partitional and hierarchical clustering for computational effectiveness and versatility in cluster shape. In such clustering, a dissimilarity measure plays a crucial role in the hierarchical merging. The dissimilarity measure has great impact on the final clustering, and data-independent properties are needed to choose the right dissimilarity measure for the problem at hand. Properties for distance-based dissimilarity measures have been studied for decades, but properties for density-based dissimilarity measures have so far received little attention. Here, we propose six data-independent properties to evaluate density-based dissimilarity measures associated with hybrid clustering, regarding equality, orthogonality, symmetry, outlier and noise observations, and light-tailed models for heavy-tailed clusters. The significance of the properties is investigated, and we study some well-known dissimilarity measures based on Shannon entropy, misclassification rate, Bhattacharyya distance and Kullback-Leibler divergence with respect to the proposed properties. As none of them satisfy all the proposed properties, we introduce a new dissimilarity measure based on the Kullback-Leibler information and show that it satisfies all proposed properties. The effect of the proposed properties is also illustrated on several real and simulated data sets.
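To make the symmetry and equality properties concrete, the sketch below evaluates two of the measures named above, Kullback-Leibler divergence and Bhattacharyya distance, on a pair of univariate Gaussian clusters using their standard closed forms. It is only an illustration under stated assumptions: the abstract does not give formal definitions of the properties, so reading "symmetry" as D(p, q) = D(q, p) and "equality" as D(p, p) = 0 is an assumption, the function names and parameter values are hypothetical, and this is not the new dissimilarity measure proposed in the paper.

```python
import math


def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL(p || q) for p = N(m1, s1^2) and q = N(m2, s2^2)."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def bhattacharyya_gauss(m1, s1, m2, s2):
    """Closed-form Bhattacharyya distance between two univariate Gaussians."""
    v = s1**2 + s2**2
    return (m1 - m2) ** 2 / (4 * v) + 0.5 * math.log(v / (2 * s1 * s2))


# Two hypothetical cluster models, given as (mean, standard deviation).
p = (0.0, 1.0)
q = (1.0, 2.0)

# Assumed reading of "equality": identical densities give zero dissimilarity.
print("KL(p, p) =", kl_gauss(*p, *p))
print("B(p, p)  =", bhattacharyya_gauss(*p, *p))

# Assumed reading of "symmetry": swapping the arguments leaves the value unchanged.
# Plain KL divergence fails this; Bhattacharyya distance does not.
print("KL(p, q) =", kl_gauss(*p, *q))
print("KL(q, p) =", kl_gauss(*q, *p))
print("B(p, q)  =", bhattacharyya_gauss(*p, *q))
print("B(q, p)  =", bhattacharyya_gauss(*q, *p))
```

Under these assumptions, the two KL values differ while the two Bhattacharyya values agree, which is the kind of data-independent behaviour the properties are meant to capture before any data set is involved.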

research
11/11/2010

Stability of Density-Based Clustering

High density clusters can be characterized by the connected components o...
research
04/15/2016

Delta divergence: A novel decision cognizant measure of classifier incongruence

Disagreement between two classifiers regarding the class membership of a...
research
08/01/2023

Mathematical Foundations of Data Cohesion

Data cohesion, a recently introduced measure inspired by social interact...
research
02/16/2013

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying da...
research
08/22/2021

The Exploitation of Distance Distributions for Clustering

Although distance measures are used in many machine learning algorithms,...
research
08/15/2015

Towards an Axiomatic Approach to Hierarchical Clustering of Measures

We propose some axioms for hierarchical clustering of probability measur...
research
12/14/2017

Rate of Change Analysis for Interestingness Measures

The use of Association Rule Mining techniques in diverse contexts and do...
