Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

06/30/2019
by Xiaoyu Qin, et al.

Isolation Kernel (also called Isolation Similarity), a recently proposed data-dependent similarity, has enabled SVM to achieve better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity and propose a nearest-neighbour method instead. We formally prove the characteristic of Isolation Similarity under the proposed method. We then study the impact of Isolation Similarity on density-based clustering. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly improved to surpass that of the more recent density-peak clustering algorithm DP. This is achieved simply by replacing the distance measure in DBSCAN with the proposed nearest-neighbour-induced Isolation Similarity, leaving the rest of the procedure unchanged. We formally define a new type of cluster called a mass-connected cluster. We show that DBSCAN, which detects density-connected clusters, becomes an algorithm that detects mass-connected clusters when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected while density-connected clusters cannot.
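
The abstract does not spell out how the nearest-neighbour method is built, so the following is only a rough sketch under a stated assumption: the similarity of two points is estimated as the fraction of `t` random partitionings in which both points share a cell, where each partitioning assigns every point to the nearest of `psi` points sampled from the data. The function name `nn_isolation_similarity` and the parameters `psi`, `t`, and `seed` are hypothetical names chosen for illustration, not taken from the paper.

```python
import random


def nn_isolation_similarity(points, psi=4, t=100, seed=0):
    """Estimate pairwise similarities from random nearest-neighbour partitionings.

    Assumption (not stated in the abstract): two points are similar in
    proportion to how often they fall into the same nearest-neighbour cell.
    """
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]

    def sq_dist(p, q):
        # Squared Euclidean distance between two coordinate tuples.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(t):
        # Sample psi data points to act as cell centres for this partitioning.
        centres = rng.sample(range(n), psi)
        # Each point is assigned to the cell of its nearest sampled centre.
        cell = [min(centres, key=lambda c: sq_dist(p, points[c])) for p in points]
        for i in range(n):
            for j in range(n):
                if cell[i] == cell[j]:
                    counts[i][j] += 1

    # Fraction of partitionings in which each pair shares a cell.
    return [[c / t for c in row] for row in counts]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    sim = nn_isolation_similarity(data, psi=2, t=200, seed=1)
    print(sim[0][1], sim[0][3])
```

To follow the replacement step described in the abstract, one option would be to pass `1 - similarity` as a precomputed dissimilarity matrix to an existing DBSCAN implementation, for example scikit-learn's `DBSCAN(metric="precomputed")`, leaving the rest of the procedure unchanged.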


