Homophilic Clustering by Locally Asymmetric Geometry

07/05/2014
by Deli Zhao, et al.

Clustering is indispensable for data analysis in many scientific disciplines. Detecting clusters in heavy noise remains challenging, particularly for high-dimensional sparse data. Based on a graph-theoretic framework, the present paper proposes a novel algorithm to address this issue. The locally asymmetric geometries of neighborhoods between data points give rise to a directed similarity graph that models the structural connectivity of the data points. Propagating similarity on this directed graph simply through powers of its adjacency matrix leads to an interesting discovery: if the in-degrees are ordered by the corresponding sorted out-degrees, they self-organize into homophilic layers according to the different distributions of cluster densities. We call the resulting plot the Homophilic In-degree figure (the HI figure). With the HI figure, we can easily single out all cores of clusters, identify the boundary between cluster and noise, and visualize the intrinsic structures of clusters. Based on the in-degree homophily, we also develop a simple, efficient algorithm with linear space complexity to cluster noisy data. Extensive experiments on toy and real-world scientific data validate the effectiveness of our algorithms.
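
The ordering behind the HI figure can be illustrated with a small sketch. The abstract does not specify how the directed graph is built, so everything below is an assumption made for illustration: a k-nearest-neighbor graph with Gaussian edge weights stands in for the paper's locally asymmetric geometry, and the function names, `k`, `steps`, and the bandwidth choice are hypothetical. The dense matrix is used only for clarity and does not reflect the linear space complexity mentioned above.

```python
import numpy as np


def directed_knn_graph(X, k=5):
    """Weighted directed k-NN graph (an assumed stand-in, not the paper's graph).

    Edge i -> j exists when j is one of the k nearest neighbors of i.
    The relation is asymmetric: j may not have i among its own neighbors.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)  # exclude self-loops
    nbrs = np.argsort(sq, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    # Assumed bandwidth: mean squared distance over the selected edges.
    sigma2 = np.mean(sq[rows, cols])
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-sq[rows, cols] / sigma2)
    return A


def in_degrees_by_sorted_out_degrees(A, steps=2):
    """Propagate similarity with a matrix power, then order the in-degrees
    by ascending out-degree, as the abstract describes for the HI figure."""
    P = np.linalg.matrix_power(A, steps)
    out_deg = P.sum(axis=1)  # row sums: total weight leaving each node
    in_deg = P.sum(axis=0)   # column sums: total weight arriving at each node
    order = np.argsort(out_deg)
    return out_deg[order], in_deg[order]
```

Plotting the second returned array against its index gives a rough analogue of the HI figure for a given data set. Whether the layers appear will depend on the graph construction, which here is only a guess at the paper's method.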


research
12/29/2021

VDPC: Variational Density Peak Clustering Algorithm

The widely applied density peak clustering (DPC) algorithm makes an intu...
research
10/03/2018

Real-time Clustering Algorithm Based on Predefined Level-of-Similarity

This paper proposes a centroid-based clustering algorithm which is capab...
research
08/25/2012

Graph Degree Linkage: Agglomerative Clustering on a Directed Graph

This paper proposes a simple but effective graph-based agglomerative alg...
research
06/11/2020

Faster DBSCAN via subsampled similarity queries

DBSCAN is a popular density-based clustering algorithm. It computes the ...
research
09/16/2020

Robust Unsupervised Mining of Dense Sub-Graphs at Multiple Resolutions

Whereas in traditional partitional clustering, each data point belongs t...
research
02/16/2017

Reflexive Regular Equivalence for Bipartite Data

Bipartite data is common in data engineering and brings unique challenge...
research
06/27/2019

Curriculum Learning for Deep Generative Models with Clustering

Training generative models like generative adversarial networks (GANs) a...
