Robust Unsupervised Mining of Dense Sub-Graphs at Multiple Resolutions

09/16/2020
by Sheshank Shankar, et al.

Traditional partitional clustering assigns every data point to a cluster. In several applications, however, only some of the points form relatively homogeneous or “dense” groups, and points that do not belong to any cluster need to be ignored. Moreover, different clusters may emerge at different scales or density levels, which makes them difficult to identify with a single density threshold, especially if the non-clustering data must also be ignored. If the data is represented in a metric space, recent extensions of a classical approach called Hierarchical Mode Analysis (HMA) can identify clusters at multiple resolutions while ignoring “non-dense” areas. However, this approach does not apply when the relations between pairs of data points can only be represented as a (sparse) similarity or affinity graph. Motivated by two complex, real-life applications that require identifying dense subgraphs at multiple resolutions while ignoring nodes that are not well connected in the similarity graph, we introduce a novel algorithm called HIMAG (Hierarchical Incremental Mode Analysis for Graphs), which provides capabilities analogous to HMA-based methods but is applicable to graphs. We also provide a powerful multi-resolution visualization tool customized for the new algorithm. We present results on the two motivating real-world applications and on two standard benchmark social graph datasets to show the power of our approach and to compare it with some standard graph partitioning algorithms that were retrofitted to produce dense clusters by pruning non-dense data in a non-trivial manner. We are also open-sourcing the new dense graph datasets and tools to the community.
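To make the problem setting concrete, the following is a minimal sketch of what "dense subgraphs at multiple resolutions" can mean on a similarity graph. It is not the HIMAG algorithm, whose details the abstract does not give. It is a naive baseline that thresholds edge weights at several levels, drops nodes with too few strong neighbours, and reports the connected groups that remain. The function names, the `min_degree` parameter, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def dense_groups(edges, threshold, min_degree=2):
    """Return connected groups of well-connected nodes at one similarity threshold.

    Illustrative baseline only (hypothetical helper, not the HIMAG procedure).
    `edges` maps an undirected pair (u, v) to a similarity weight.
    """
    # Keep only edges at least as strong as the threshold.
    adj = defaultdict(set)
    for (u, v), weight in edges.items():
        if weight >= threshold:
            adj[u].add(v)
            adj[v].add(u)

    # Assumption: a node counts as "dense" if it has enough strong neighbours.
    dense = {node for node, nbrs in adj.items() if len(nbrs) >= min_degree}

    # Collect connected components restricted to the dense nodes.
    groups, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr in dense and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(sorted(component))
    return groups


def multi_resolution(edges, thresholds, min_degree=2):
    """Run the baseline at several thresholds (one per resolution)."""
    return {t: dense_groups(edges, t, min_degree) for t in thresholds}


# Toy similarity graph (made-up weights): a tight triangle, a looser
# triangle, a weak bridge between them, and a pendant node "g".
EDGES = {
    ("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.9,
    ("d", "e"): 0.5, ("e", "f"): 0.5, ("d", "f"): 0.5,
    ("c", "d"): 0.2,
    ("f", "g"): 0.5,
}

RESULT = multi_resolution(EDGES, [0.8, 0.4])
for level, groups in RESULT.items():
    print(level, groups)
```

In this toy graph only the tight triangle survives at the stricter threshold, the looser triangle appears as a second group at the lower threshold, and the pendant node `g` is never assigned to any group. This shows why a single threshold cannot recover both groups while still ignoring poorly connected nodes.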


research
07/09/2019

Hierarchical Clustering Supported by Reciprocal Nearest Neighbors

Clustering is a fundamental analysis tool aiming at classifying data poi...
research
12/29/2021

VDPC: Variational Density Peak Clustering Algorithm

The widely applied density peak clustering (DPC) algorithm makes an intu...
research
06/27/2019

Clustering by the way of atomic fission

Cluster analysis which focuses on the grouping and categorization of sim...
research
10/03/2018

Mining Contrasting Quasi-Clique Patterns

Mining dense quasi-cliques is a well-known clustering task with applicat...
research
10/05/2018

CDF Transform-Shift: An effective way to deal with inhomogeneous density datasets

Many distance-based algorithms exhibit bias towards dense clusters in in...
research
07/05/2014

Homophilic Clustering by Locally Asymmetric Geometry

Clustering is indispensable for data analysis in many scientific discipl...
