Clustering based on Point-Set Kernel

02/14/2020
by Kai Ming Ting, et al.

Measuring the similarity between two objects is the core operation that existing cluster analyses use to group similar objects into clusters. Cluster analysis has been applied in many areas, including image segmentation, social network analysis, and computational biology. This paper introduces a new similarity measure, called the point-set kernel, which computes the similarity between an object and a sample of objects generated from an unknown distribution. The proposed clustering procedure uses this measure to characterize both the typical point of every cluster and the cluster grown from that typical point. We show that the new clustering procedure is both effective and efficient, so that it can handle large-scale datasets. In contrast, existing clustering algorithms are either efficient or effective, and even the efficient ones have difficulty dealing with large-scale datasets without special hardware. We show that the proposed algorithm is more effective and runs orders of magnitude faster than state-of-the-art density-peak clustering and scalable kernel k-means clustering when applied to datasets of millions of data points on commonly used computing machines.
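
To make the idea of a point-to-set similarity concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the similarity between a point and a sample can be taken as the average of a point-to-point kernel over the sample, and it uses a Gaussian kernel purely as a stand-in, since the abstract does not say which underlying kernel is used. The names `gaussian_kernel`, `point_set_similarity`, `most_typical_point` and the parameter `gamma` are hypothetical.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Point-to-point Gaussian kernel (a stand-in; the abstract names no kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def point_set_similarity(x, sample, gamma=1.0):
    """Assumed point-set similarity: mean kernel value between x and the sample."""
    return sum(gaussian_kernel(x, y, gamma) for y in sample) / len(sample)


def most_typical_point(sample, gamma=1.0):
    """Return the index of the sample point most similar to the whole sample."""
    scores = [point_set_similarity(x, sample, gamma) for x in sample]
    best = max(range(len(sample)), key=scores.__getitem__)
    return best, scores


# Four points in a tight group plus one distant point.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

near = point_set_similarity((0.05, 0.05), sample)  # inside the tight group
far = point_set_similarity((5.0, 5.0), sample)     # the distant point
best_index, _ = most_typical_point(sample)

print(near > far)        # the central point is more similar to the sample
print(best_index != 4)   # the distant point is not the most typical one
```

In this sketch a point in the dense region scores higher against the sample than an isolated point, which is the sense in which such a measure could pick out a "typical point". The naive averaging here costs one kernel evaluation per sample point for every query, so it does not reflect the efficiency the abstract claims for the proposed method.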

research
10/03/2018

Real-time Clustering Algorithm Based on Predefined Level-of-Similarity

This paper proposes a centroid-based clustering algorithm which is capab...
research
12/31/2019

Scalable Hierarchical Clustering with Tree Grafting

We introduce Grinch, a new algorithm for large-scale, non-greedy hierarc...
research
09/25/2013

A Unified Framework for Representation-based Subspace Clustering of Out-of-sample and Large-scale Data

Under the framework of spectral clustering, the key of subspace clusteri...
research
11/28/2017

A fatal point concept and a low-sensitivity quantitative measure for traffic safety analytics

The variability of the clusters generated by clustering techniques in th...
research
07/22/2021

Pre-Clustering Point Clouds of Crop Fields Using Scalable Methods

In order to apply the recent successes of automated plant phenotyping an...
research
01/21/2021

Fast Clustering of Short Text Streams Using Efficient Cluster Indexing and Dynamic Similarity Thresholds

Short text stream clustering is an important but challenging task since ...
