Scalable and Effective Conductance-based Graph Clustering

11/22/2022
by Longlong Lin et al.

Conductance-based graph clustering is a fundamental operator in numerous graph analysis applications. Despite its success, existing algorithms either struggle to obtain satisfactory clustering quality or require high time and space complexity to achieve provable clustering quality. To overcome these limitations, we devise PCon, a powerful peeling-based graph clustering framework. We show that many existing solutions can be reduced to our framework: they first define a score function for each vertex, then iteratively remove the vertex with the smallest score, and finally output the intermediate result with the smallest conductance seen during the peeling process. Based on this framework, we propose two novel algorithms, PCon_core and PCon_de, with linear time and space complexity, which can efficiently and effectively identify clusters in massive graphs with more than a few billion edges. Surprisingly, we prove that PCon_de identifies clusters with a near-constant approximation ratio, an important theoretical improvement over the well-known quadratic Cheeger bound. Empirical results on real-life and synthetic datasets show that our algorithms achieve a 5∼42× speedup with high clustering accuracy while using 1.4∼7.8× less memory than the baseline algorithms.
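The generic peeling loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' PCon_core or PCon_de: the score function here is hypothetically taken to be each vertex's degree inside the remaining subgraph, the graph is undirected and simple (no self-loops), and conductance is the standard φ(S) = cut(S, V∖S) / min(vol(S), vol(V∖S)), where vol sums original degrees. The cut and volume are updated incrementally, so each removal costs time proportional to the removed vertex's degree plus heap operations.

```python
import heapq
import math
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected simple graph


def conductance(graph: Graph, cluster: Set[int]) -> float:
    """Standard conductance: cut(S, V\\S) / min(vol(S), vol(V\\S))."""
    total_vol = sum(len(nbrs) for nbrs in graph.values())
    vol_s = sum(len(graph[v]) for v in cluster)
    cut = sum(1 for v in cluster for u in graph[v] if u not in cluster)
    denom = min(vol_s, total_vol - vol_s)
    return cut / denom if denom > 0 else math.inf


def peel_min_conductance(graph: Graph) -> Tuple[Set[int], float]:
    """Peel the vertex with the smallest score; keep the best-conductance prefix.

    Hypothetical score: a vertex's current degree inside the remaining set S.
    """
    deg = {v: len(nbrs) for v, nbrs in graph.items()}
    total_vol = sum(deg.values())
    inner = dict(deg)          # score: neighbours of v still inside S
    alive = set(graph)         # current vertex set S (starts as V)
    vol_s, cut = total_vol, 0  # vol(S) and cut(S, V\S), maintained incrementally
    removed = []
    best_phi, best_k = math.inf, 0

    heap = [(inner[v], v) for v in graph]
    heapq.heapify(heap)
    while len(alive) > 1:
        score, v = heapq.heappop(heap)
        if v not in alive or score != inner[v]:
            continue  # stale heap entry (lazy deletion)
        alive.remove(v)
        removed.append(v)
        # Edges v-S become cut edges; edges v-(V\S) stop being cut edges.
        cut += inner[v] - (deg[v] - inner[v])
        vol_s -= deg[v]
        for u in graph[v]:
            if u in alive:
                inner[u] -= 1
                heapq.heappush(heap, (inner[u], u))
        denom = min(vol_s, total_vol - vol_s)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, len(removed)

    # Reconstruct the best intermediate set from the removal order.
    best_set = set(graph) - set(removed[:best_k])
    return best_set, best_phi
```

For two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2–3, the sketch should return one of the triangles with conductance 1/7 (one cut edge over a volume of 7). The binary heap makes this O(m log n); a linear-time variant would typically replace it with a bucket queue keyed by integer scores.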
