Robust Unsupervised Mining of Dense Sub-Graphs at Multiple Resolutions

09/16/2020
by   Sheshank Shankar, et al.
0

Whereas in traditional partitional clustering, each data point belongs to a cluster, there are several applications where only some of the points form relatively homogenous or “dense” groups, and points that don’t seem to belong to any cluster need to be ignored. Moreover, different clusters may emerge at different scales or density levels. This makes it difficult to identify them using a single density threshold, especially if we also want to ignore the non-clustering data. If data is represented in a metric space, then recent extensions of a classical approach called Hierarchical Mode Analysis (HMA) are able to identify clusters at multiple resolutions, while ignoring “non-dense” areas. However, this approach does not apply when the relations between pairs of data points can only be represented as a (sparse) similarity or affinity graph. Motivated by two complex, real-life applications where one needs to identify dense subgraphs at multiple resolutions, while ignoring nodes that are not well connected in the similarity graph, we introduce a novel algorithm called HIMAG (Hierarchical Incremental Mode Analysis for Graphs) that provides capabilities analogous to HMA based methods but applicable to graphs. We also provide a powerful multi-resolution visualization tool customized for the new algorithm. We present results on the two motivating real-world applications as well as two standard benchmark social graph datasets, to show the power of our approach and compare it with some standard graph partitioning algorithms that were retrofitted to produce dense clusters by pruning non-dense data in a non-trivial manner. We are also open-sourcing the new dense graph datasets and tools to the community.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset