Robust Unsupervised Mining of Dense Sub-Graphs at Multiple Resolutions

by Sheshank Shankar, et al.

Traditional partitional clustering assigns every data point to a cluster. In many applications, however, only some of the points form relatively homogeneous or “dense” groups, and points that do not seem to belong to any cluster need to be ignored. Moreover, different clusters may emerge at different scales or density levels, which makes it difficult to identify them with a single density threshold, especially if we also want to ignore the non-clustering data. If the data is represented in a metric space, recent extensions of a classical approach called Hierarchical Mode Analysis (HMA) can identify clusters at multiple resolutions while ignoring “non-dense” areas. However, this approach does not apply when the relations between pairs of data points can only be represented as a (sparse) similarity or affinity graph. Motivated by two complex, real-life applications in which one needs to identify dense subgraphs at multiple resolutions while ignoring nodes that are not well connected in the similarity graph, we introduce a novel algorithm called HIMAG (Hierarchical Incremental Mode Analysis for Graphs) that provides capabilities analogous to HMA-based methods but is applicable to graphs. We also provide a multi-resolution visualization tool customized for the new algorithm. We present results on the two motivating real-world applications, as well as on two standard benchmark social graph datasets, to demonstrate our approach and compare it with some standard graph partitioning algorithms that were retrofitted to produce dense clusters by pruning non-dense data in a non-trivial manner. We are also open-sourcing the new dense graph datasets and tools to the community.
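To make the idea of dense subgraphs at multiple resolutions concrete, the following is a minimal illustrative sketch and not the HIMAG algorithm, whose details the abstract does not give. It assumes that a node's density can be estimated by its weighted degree (the sum of its incident similarity weights), keeps only nodes at or above a density threshold, and reports the connected components among them. The function names, the toy graph, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def dense_components(edges, density_threshold):
    """Connected components among nodes whose weighted degree meets the threshold.

    Nodes below the threshold are ignored entirely (treated as non-dense).
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    # Assumed density estimate: sum of incident similarity weights.
    density = {node: sum(nbrs.values()) for node, nbrs in adj.items()}
    dense = {node for node, d in density.items() if d >= density_threshold}

    seen, components = set(), []
    for start in sorted(dense):
        if start in seen:
            continue
        # Depth-first search restricted to dense nodes.
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in adj[node]:
                if nbr in dense and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(comp)
    return components


def multi_resolution(edges, thresholds):
    """Map each density threshold to the dense components found at that level."""
    return {t: dense_components(edges, t) for t in sorted(thresholds)}


# Toy similarity graph: a tight triangle (a, b, c), a looser triangle (d, e, f),
# a weak bridge between them, and a weakly attached node g.
EDGES = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 0.5), ("e", "f", 0.5), ("d", "f", 0.5),
    ("c", "d", 0.1), ("f", "g", 0.1),
]

if __name__ == "__main__":
    for level, comps in multi_resolution(EDGES, [0.5, 1.5]).items():
        print(level, [sorted(c) for c in comps])
```

In this toy graph, the lower threshold groups both triangles into one component and leaves out the weakly attached node g, while the higher threshold retains only the tight triangle. No single threshold gives both views, which is the difficulty the abstract describes.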



Hierarchical Clustering Supported by Reciprocal Nearest Neighbors

Clustering is a fundamental analysis tool aiming at classifying data poi...

Clustering by the way of atomic fission

Cluster analysis which focuses on the grouping and categorization of sim...

Mining Contrasting Quasi-Clique Patterns

Mining dense quasi-cliques is a well-known clustering task with applicat...

CDF Transform-Shift: An effective way to deal with inhomogeneous density datasets

Many distance-based algorithms exhibit bias towards dense clusters in in...

Homophilic Clustering by Locally Asymmetric Geometry

Clustering is indispensable for data analysis in many scientific discipl...

Faster DBSCAN via subsampled similarity queries

DBSCAN is a popular density-based clustering algorithm. It computes the ...

Geometric reconstructions of density based clusterings

DBSCAN* and HDBSCAN* are well established density based clustering algor...