Spectral Recovery of Binary Censored Block Models
Community detection is the problem of identifying community structure in graphs. The graph is often modeled as a sample from the Stochastic Block Model, in which each vertex belongs to a community and the probability that two vertices are connected by an edge depends on the communities of those vertices. In this paper, we consider a model of censored community detection with two communities, in which most of the data is missing: the connectivity status of only a small fraction of the potential edges is revealed. In this model, vertices in the same community are connected with probability p, while vertices in opposite communities are connected with probability q. The connectivity status of each pair of vertices {u,v} is revealed with probability α = t log(n)/n, where n is the number of vertices, independently across all pairs. We establish the information-theoretic threshold t_c(p,q), such that no algorithm succeeds in recovering the communities exactly when t < t_c(p,q). We show that when t > t_c(p,q), a simple spectral algorithm based on a weighted, signed adjacency matrix succeeds in recovering the communities exactly. While spectral algorithms are shown to have near-optimal performance in the symmetric case, we show that they may fail in the asymmetric case, where the connection probabilities inside the two communities are allowed to differ. In particular, we show the existence of a parameter regime where a simple two-phase algorithm succeeds but any algorithm based on thresholding a linear combination of the top two eigenvectors of the weighted, signed adjacency matrix fails.
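The sketch below illustrates the symmetric setting for intuition only. It samples the censored model described above and labels vertices by the signs of the leading eigenvector of a weighted, signed adjacency matrix. The abstract does not state the weights, so this sketch assumes a likelihood-motivated choice: +1 for a revealed edge, -y for a revealed non-edge with y = log((1-q)/(1-p)) / log(p/q), and 0 for an unrevealed pair. It also assumes p > q. All function names are hypothetical, and the sign-of-eigenvector estimator is a simple stand-in, not necessarily the paper's exact procedure.

```python
import numpy as np


def sample_censored_bm(n, p, q, t, rng):
    """Sample a symmetric two-community censored block model (hypothetical helper)."""
    sigma = rng.choice(np.array([-1, 1]), size=n)    # hidden labels in {-1, +1}
    alpha = t * np.log(n) / n                        # reveal probability alpha = t log(n)/n
    iu = np.triu_indices(n, k=1)                     # each unordered pair {u, v} once
    same = sigma[iu[0]] == sigma[iu[1]]
    prob = np.where(same, p, q)                      # edge probability given communities
    revealed = np.zeros((n, n), dtype=bool)
    edge = np.zeros((n, n), dtype=bool)
    revealed[iu] = rng.random(len(prob)) < alpha     # independent reveal per pair
    edge[iu] = rng.random(len(prob)) < prob          # independent edge per pair
    # Symmetrize: the status of {u, v} is shared by (u, v) and (v, u).
    return sigma, revealed | revealed.T, edge | edge.T


def signed_adjacency(revealed, edge, p, q):
    """Weighted, signed adjacency matrix under ASSUMED likelihood-ratio weights."""
    if not p > q:
        raise ValueError("this sketch assumes p > q")
    y = np.log((1 - q) / (1 - p)) / np.log(p / q)    # assumed weight for a revealed non-edge
    A = np.zeros(revealed.shape)
    A[revealed & edge] = 1.0                         # revealed edge
    A[revealed & ~edge] = -y                         # revealed non-edge
    return A                                         # unrevealed pairs stay 0


def spectral_labels(A):
    """Label vertices by the signs of the eigenvector of the largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(A)                   # eigenvalues in ascending order
    v = eigvecs[:, -1]
    return np.where(v >= 0, 1, -1)


def agreement(sigma_hat, sigma):
    """Fraction of correctly labeled vertices, up to a global sign flip."""
    return max(np.mean(sigma_hat == sigma), np.mean(sigma_hat == -sigma))


rng = np.random.default_rng(0)
sigma, revealed, edge = sample_censored_bm(n=600, p=0.7, q=0.2, t=20.0, rng=rng)
A = signed_adjacency(revealed, edge, p=0.7, q=0.2)
sigma_hat = spectral_labels(A)
print(f"agreement: {agreement(sigma_hat, sigma):.3f}")
```

With these assumed weights, the expected entry of A is proportional to KL(p||q) > 0 for same-community pairs and to -KL(q||p) < 0 for opposite-community pairs, which is why the leading eigenvector aligns with the hidden labels when revelations are dense enough. Consistent with the abstract, this sketch should not be read as covering the asymmetric case, where thresholding the top eigenvectors can fail.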