Robustness of spectral methods for community detection

11/14/2018 ∙ by Ludovic Stephan, et al. ∙ Irisa Inria

The present work is concerned with community detection. Specifically, we consider a random graph drawn according to the stochastic block model: its vertex set is partitioned into blocks, or communities, and edges are placed randomly and independently of each other, with probability depending only on the communities of their two endpoints. In this context, our aim is to recover the community labels better than by random guessing, based only on the observation of the graph. In the sparse case, where edge probabilities are in O(1/n), we introduce a new spectral method based on the distance matrix D^(ℓ), where D^(ℓ)_ij = 1 iff the graph distance between i and j, denoted d(i, j), is equal to ℓ. We show that when ℓ ∼ c log(n) for a carefully chosen c, the eigenvectors associated with the largest eigenvalues of D^(ℓ) provide enough information to perform non-trivial community recovery with high probability, provided we are above the so-called Kesten-Stigum threshold. This yields an efficient algorithm for community detection, since the matrix D^(ℓ) can be computed in O(n^(1+κ)) operations for a small constant κ. We then study the sensitivity of the eigendecomposition of D^(ℓ) when we allow an adversarial perturbation of the edges of the graph G. We show that when the considered perturbation does not affect more than O(n^ε) vertices for some small ε > 0, the highest eigenvalues and their corresponding eigenvectors incur negligible perturbations, which still allows us to perform efficient recovery.
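The central object above, D^(ℓ), can be built by running a breadth-first search from every vertex and stopping at depth ℓ. The following is a minimal sketch under our own assumptions (adjacency-list input, rows stored sparsely as sets of column indices, and a hypothetical function name `distance_matrix`); it is not the authors' implementation.

```python
from collections import deque


def distance_matrix(adj, ell):
    """Sparse rows of D^(ell): rows[i] is the set of j with d(i, j) == ell.

    adj is an adjacency list: adj[i] lists the neighbours of vertex i.
    Each breadth-first search stops expanding at depth ell, so its cost is
    bounded by the size of the ell-neighbourhood of the source vertex.
    """
    rows = []
    for source in range(len(adj)):
        dist = {source: 0}          # graph distance from source, for reached vertices
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == ell:      # vertices at depth ell are not expanded
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        rows.append({v for v, d in dist.items() if d == ell})
    return rows
```

For example, on the path with edges {0,1}, {1,2}, {2,3}, `distance_matrix([[1], [0, 2], [1, 3], [2]], 2)` returns `[{2}, {3}, {0}, {1}]`. Since D^(ℓ) is symmetric, its leading eigenvectors could then be obtained with a sparse symmetric eigensolver such as `scipy.sparse.linalg.eigsh`. Truncating each search at depth ℓ is what keeps the cost near-linear when the ℓ-neighbourhoods are small, which is consistent with the O(n^(1+κ)) bound stated above for ℓ ∼ c log(n).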




1 Introduction

1.1 Background

Community detection is the task of finding large groups of similar items inside a large relationship graph, where related items are expected to be more likely to be linked together (in the assortative case). The Stochastic Block Model (SBM) was introduced by Holland et al. [Hol83] to analyze the performance of algorithms for this task; it consists of a random graph whose edge probabilities depend only on the community membership of their endpoints. Since then, a large number of articles have been devoted to the study of this model; a survey of these results can be found in Abbe [Abb17], or in Fortunato [For10] for a more general view of community detection.
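The model just described is straightforward to sample. The sketch below is a minimal illustration, assuming the sparse scaling used throughout this paper: a pair of vertices of types i and j is joined with probability W[i][j]/n for a symmetric matrix W with positive coefficients. The function name `sample_sbm` and its parameters are our own hypothetical choices.

```python
import random


def sample_sbm(n, pi, W, seed=None):
    """Sample (sigma, edges) from a sparse stochastic block model.

    Each vertex v in range(n) gets a type sigma[v] drawn independently from
    the probability vector pi; each pair {u, v} then becomes an edge
    independently with probability W[sigma[u]][sigma[v]] / n (capped at 1).
    The O(n^2) double loop is for illustration only, not for large n.
    """
    rng = random.Random(seed)
    sigma = rng.choices(range(len(pi)), weights=pi, k=n)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = min(1.0, W[sigma[u]][sigma[v]] / n)
            if rng.random() < p:
                edges.append((u, v))
    return sigma, edges
```

With two balanced communities and W = [[5, 1], [1, 5]], a call such as `sample_sbm(1000, [0.5, 0.5], [[5, 1], [1, 5]], seed=0)` produces a graph with average degree around (5 + 1)/2 = 3, independent of n, which is the sparse regime discussed below.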

The sparse case, where edge probabilities are in O(1/n), is known to be much harder to study than denser models: a positive fraction of the vertices are isolated, which makes complete reconstruction impossible, so studies usually focus on partial recovery of the community structure. Insights on this topic often come from statistical physics; in the two-community case, Decelle et al. [Dec11] conjectured the existence of a threshold for reconstruction, which was then proved by Mossel et al. [Mos15] for the first part, and by Massoulié [Mas13] and Mossel et al. [Mos13] for the converse part. Similarly, in the general case, a method was first presented in Krzakala et al. [Krz13] and then proven to work by Bordenave et al. [Bor15] (up to a technical condition) and by Abbe and Sandon [Abb16].

Notably, in the sparse setting, the usual method relying on the eigenvectors of the adjacency matrix of G fails due to the lack of separation of the eigenvalues: high-degree vertices give rise to large eigenvalues whose eigenvectors are localized around them and carry no information about the communities. Consequently, a wide array of alternative spectral methods has been designed, relying on the spectrum of a matrix associated with G. More precisely, the eigenvectors associated with the highest eigenvalues of such a matrix often carry enough information about the community structure of G for partial reconstruction. Examples include the path expansion matrix used in [Mas13] and the non-backtracking matrix in [Krz13].
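To make one of these alternatives concrete, here is a minimal sketch of the non-backtracking matrix B, indexed by directed edges, with B[(u, v), (v, y)] = 1 whenever y ≠ u. It assumes a simple graph given as a list of undirected edges and stores B sparsely as a dictionary of sets; the name `non_backtracking` is hypothetical, and this is not the D^(ℓ) construction studied in this paper.

```python
def non_backtracking(edges):
    """Non-backtracking matrix B of a simple undirected graph, stored sparsely.

    Each undirected edge {u, v} gives two directed edges (u, v) and (v, u).
    B[(u, v)] is the set of directed edges (v, y) with y != u: a walk may
    continue from v to any neighbour except the vertex it just came from.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    neighbours = {}
    for u, v in directed:
        neighbours.setdefault(u, []).append(v)
    # every head v of a directed edge is also a tail, so neighbours[v] exists
    B = {(u, v): {(v, y) for y in neighbours[v] if y != u} for u, v in directed}
    return directed, B
```

On a triangle, `non_backtracking([(0, 1), (1, 2), (0, 2)])` gives six directed edges, each with exactly one successor; for instance `B[(0, 1)] == {(1, 2)}`. Forbidding immediate returns is what prevents high-degree vertices from dominating the spectrum in the way they do for the adjacency matrix.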

Other types of methods can also be used in this setting: for example, semidefinite programming (SDP) algorithms relax the problem into a convex optimization problem, which can then be approximately solved (see for example Montanari and Sen [Mon16]).

An important feature of real-life networks that is missing from the SBM is the existence of small-scale regions of higher density, which arise from phenomena unrelated to the community structure. For this reason, a common variant of the SBM adds small cliques to the generated random graph. Commonly used spectral methods, for example those relying on the non-backtracking matrix in [Bor15], are known to fail in this setting due to the appearance of localized eigenvectors that have no ties to the community structure but correspond to large eigenvalues; see Zhang [Zha16] for a comparison of those methods, as well as a proposed heuristic that deals with those localized vectors by lowering their associated eigenvalues. SDP methods are the most studied for this problem, due to their natural stability; in particular, Makarychev et al. [Mak16] give a reconstruction algorithm that is robust to the adversarial addition of edges, for an arbitrary number of communities, a result also shown independently by Moitra et al. [Moi16]. However, all the SDP methods mentioned here fail to reach the Kesten-Stigum (KS) threshold by at least a large constant, with only [Mon16] approaching it as the average degree increases.
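For two communities, the KS threshold can be checked directly from the model parameters. The sketch below assumes the standard setting of [Bor15]: type distribution π, symmetric matrix W with edge probability W[i][j]/n, and mean progeny matrix with entries W[i][j]π(j) (the expected number of type-j neighbours of a type-i vertex; this is the transpose of the matrix with entries π(i)W[i][j], so the eigenvalues coincide). Writing its eigenvalues μ_1 ≥ |μ_2|, the KS condition for the second eigenvalue is μ_2² > μ_1, which for π = (1/2, 1/2) and W = [[a, b], [b, a]] reduces to (a − b)² > 2(a + b). The function names are hypothetical.

```python
import math


def mean_progeny_eigenvalues(pi, W):
    """Eigenvalues (mu1, mu2) of the 2x2 mean progeny matrix, by decreasing |.|.

    M[i][j] = W[i][j] * pi[j]. Its discriminant equals
    (M00 - M11)^2 + 4 * M01 * M10 >= 0, so both eigenvalues are real.
    """
    M = [[W[i][j] * pi[j] for j in range(2)] for i in range(2)]
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(max(trace * trace - 4 * det, 0.0))  # guard against rounding
    mu = sorted([(trace + root) / 2, (trace - root) / 2], key=abs, reverse=True)
    return mu[0], mu[1]


def above_ks_threshold(pi, W):
    """Kesten-Stigum condition for the second eigenvalue: mu2^2 > mu1."""
    mu1, mu2 = mean_progeny_eigenvalues(pi, W)
    return mu2 * mu2 > mu1
```

With a = 5 and b = 1 the eigenvalues are 3 and 2, so 4 > 3 and the model is above the threshold; with a = 3 and b = 1 they are 2 and 1, and it is below. Sorting by absolute value also handles the disassortative case (a < b), where μ_2 is negative.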

1.2 Setting and main results

Stochastic block model

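Let r be a given integer, and π = (π(1), …, π(r)) a probability vector. We consider a random graph G = (V, E) constructed as follows. The vertex set V is taken to be [n] = {1, …, n}, and each vertex v ∈ V is assigned a type σ(v) ∈ [r] sampled independently from distribution π.

Given a symmetric matrix W ∈ ℝ^{r×r} with positive coefficients, two vertices u, v ∈ V are joined with an edge randomly and independently with probability

$$\mathbb{P}\big(\{u, v\} \in E\big) = \frac{W_{\sigma(u)\sigma(v)}}{n}.$$

Following [Bor15], we introduce Π = diag(π(1), …, π(r)) and define the mean progeny matrix M = ΠW; the eigenvalues of M are the same as those of the symmetric matrix Π^{1/2} W Π^{1/2}, and in particular are real. We denote them by

$$\mu_1 \geq |\mu_2| \geq \cdots \geq |\mu_r|.$$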

We shall make the following regularity assumptions: first,