Local Graph Clustering Beyond Cheeger's Inequality

04/30/2013
by   Zeyuan Allen-Zhu, et al.
Motivated by applications of large-scale graph clustering, we study random-walk-based local algorithms whose running times depend only on the size of the output cluster, rather than on the size of the entire graph. All previously known such algorithms guarantee an output conductance of Õ(√(ϕ(A))) when the target set A has conductance ϕ(A) ∈ [0,1]. In this paper, we improve this to Õ(min{√(ϕ(A)), ϕ(A)/√(Conn(A))}), where the internal connectivity parameter Conn(A) ∈ [0,1] is defined as the reciprocal of the mixing time of the random walk over the induced subgraph on A. For instance, using Conn(A) = Ω(λ(A) / log n), where λ(A) is the second eigenvalue of the Laplacian of the induced subgraph on A, our conductance guarantee can be as good as Õ(ϕ(A)/√(λ(A))). This builds an interesting connection to the recent advance of the so-called improved Cheeger's Inequality [KKL+13], which says that global spectral algorithms can provide a conductance guarantee of O(ϕ_opt/√(λ_3)) instead of O(√(ϕ_opt)). In addition, we provide a theoretical guarantee on the clustering accuracy (in terms of precision and recall) of the output set. We also prove that our analysis is tight, and perform an empirical evaluation on both synthetic and real data to support our theory. It is worth noting that our analysis outperforms prior work when the cluster is well-connected. In fact, the better connected the cluster is internally, the more significant the improvement (in terms of both conductance and accuracy) we can obtain. Our results shed light on why, in practice, some random-walk-based algorithms perform better than their previous theory predicts, and help guide future research on local clustering.
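To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes the standard definition of conductance for an unweighted, undirected graph (edges leaving A divided by the smaller of the two volumes), which the abstract does not spell out. The names `conductance` and `guarantee_scale`, and the toy graph, are hypothetical. `guarantee_scale` evaluates only the expression min{√ϕ, ϕ/√Conn} and ignores the logarithmic factors hidden by Õ.

```python
import math


def conductance(adj, A):
    """Conductance of vertex set A in an unweighted, undirected graph.

    adj maps each vertex to a list of its neighbours (assumed symmetric).
    Assumed standard definition: cut(A, V \\ A) / min(vol(A), vol(V \\ A)).
    """
    A = set(A)
    vol_A = sum(len(adj[u]) for u in A)
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    # Each edge leaving A is seen exactly once, from its endpoint inside A.
    cut = sum(1 for u in A for v in adj[u] if v not in A)
    denom = min(vol_A, vol_total - vol_A)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


def guarantee_scale(phi, conn):
    """Evaluate min{sqrt(phi), phi / sqrt(conn)}, dropping log factors."""
    if not 0 < conn <= 1:
        raise ValueError("conn must lie in (0, 1]")
    return min(math.sqrt(phi), phi / math.sqrt(conn))


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi = conductance(adj, {0, 1, 2})  # 1 cut edge, volume 7 on each side

# A well-connected cluster (large conn) benefits from the second term;
# a poorly connected one falls back to the sqrt(phi) bound.
well_connected = guarantee_scale(0.01, 0.25)
poorly_connected = guarantee_scale(0.01, 0.0001)
```

With ϕ = 0.01, the second term wins whenever Conn(A) > ϕ(A), which matches the abstract's remark that the improvement grows as the cluster becomes better connected internally.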


