Breaking the Small Cluster Barrier of Graph Clustering

by Nir Ailon, et al.

This paper investigates graph clustering in the planted cluster model in the presence of small clusters. Traditional results dictate that for an algorithm to provably recover the clusters correctly, all clusters must be sufficiently large (in particular, Ω̃(√n), where n is the number of nodes of the graph). We show that this is not really a restriction: by a more refined analysis of the trace-norm based recovery approach proposed in Jalali et al. (2011) and Chen et al. (2012), we prove that small clusters, under certain mild assumptions, do not hinder recovery of large ones. Based on this result, we further devise an iterative algorithm to recover almost all clusters via a "peeling strategy", i.e., recover large clusters first, leading to a reduced problem, and repeat this procedure. These results are extended to the partial observation setting, in which only a (chosen) part of the graph is observed. The peeling strategy gives rise to an active learning algorithm, in which edges adjacent to smaller clusters are queried more often as large clusters are learned (and removed). From a high level, this paper offers novel insights into high-dimensional statistics and learning structured data, by presenting a structured matrix learning problem for which a one-shot convex relaxation approach necessarily fails, but a carefully constructed sequence of convex relaxations does the job.
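The control flow of the peeling strategy can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the trace-norm recovery step is replaced by a hypothetical stand-in, `components_at_least`, which returns connected components of a noise-free graph that meet a size bar. The bar itself, √n on the remaining nodes, is an assumption that drops the constants and logarithmic factors hidden in Ω̃(√n). The names `peel` and `planted_graph` are also illustrative only.

```python
import math
import numpy as np


def components_at_least(adj, min_size):
    """Stand-in recovery step (NOT the paper's trace-norm program):
    return connected components of `adj` with at least `min_size` nodes."""
    n = adj.shape[0]
    seen = [False] * n
    found = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:  # depth-first search from `start`
            u = stack.pop()
            comp.append(u)
            for v in np.flatnonzero(adj[u]):
                v = int(v)
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        if len(comp) >= min_size:
            found.append(sorted(comp))
    return found


def peel(adj, recover=components_at_least):
    """Repeatedly recover the clusters that are large relative to the
    remaining graph, remove them, and retry on the reduced problem."""
    remaining = np.arange(adj.shape[0])
    clusters = []
    while remaining.size > 0:
        sub = adj[np.ix_(remaining, remaining)]
        threshold = math.sqrt(remaining.size)  # assumed size bar, see lead-in
        found = recover(sub, threshold)
        if not found:
            break  # nothing left that clears the bar
        recovered_local = []
        for comp in found:
            clusters.append([int(remaining[i]) for i in comp])
            recovered_local.extend(comp)
        keep = np.ones(remaining.size, dtype=bool)
        keep[recovered_local] = False
        remaining = remaining[keep]
    return clusters, [int(i) for i in remaining]


def planted_graph(sizes):
    """Noise-free planted clusters: each cluster is a clique, no cross edges."""
    n = sum(sizes)
    adj = np.zeros((n, n), dtype=bool)
    start = 0
    for s in sizes:
        adj[start:start + s, start:start + s] = True
        start += s
    np.fill_diagonal(adj, False)
    return adj


adj = planted_graph([16, 4, 3, 1, 1])
clusters, leftover = peel(adj)
print([len(c) for c in clusters], leftover)
```

In this toy instance with 25 nodes, a single pass with the bar at 5 returns only the cluster of size 16. Once that cluster is removed, the bar on the remaining 9 nodes falls to 3, so the clusters of size 4 and 3 clear it in the second round. The two singletons never clear the bar and are left over, which mirrors the "almost all clusters" wording of the abstract.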



On Distributed Listing of Cliques

We show an Õ(n^p/(p+2))-round algorithm in the model for listing of K_p ...

Recovery of a mixture of Gaussians by sum-of-norms clustering

Sum-of-norms clustering is a method for assigning n points in R^d to K c...

Exact Recovery of Mangled Clusters with Same-Cluster Queries

We study the problem of recovering distorted clusters in the semi-superv...

A mixed-integer linear programming approach for soft graph clustering

This paper proposes a Mixed-Integer Linear Programming approach for the ...

Clustering of Sparse and Approximately Sparse Graphs by Semidefinite Programming

As a model problem for clustering, we consider the densest k-disjoint-cl...

Clustering Without an Eigengap

We study graph clustering in the Stochastic Block Model (SBM) in the pre...

On Margin-Based Cluster Recovery with Oracle Queries

We study an active cluster recovery problem where, given a set of n poin...
