Massively Parallel Algorithms for the Stochastic Block Model

07/02/2023
by Zelin Li, et al.

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. We study the problem of exactly recovering the communities in a graph generated from the Stochastic Block Model (SBM) in the Massively Parallel Computation (MPC) model. Specifically, given $kn$ vertices that are partitioned into $k$ equal-sized clusters (i.e., each has size $n$), a graph on these $kn$ vertices is randomly generated such that each pair of vertices is connected with probability $p$ if they are in the same cluster and with probability $q$ if not, where $p > q > 0$. We give MPC algorithms for the SBM in the (very general) $s$-space MPC model, where each machine has memory $s = \Omega(\log n)$. Under the condition that $\frac{p-q}{\sqrt{p}} \geq \tilde{\Omega}\left(k^{1/2} n^{-1/2 + 1/(2(r-1))}\right)$ for any integer $r \in [3, O(\log n)]$, our first algorithm exactly recovers all the $k$ clusters in $O(kr \log_s n)$ rounds using $\tilde{O}(m)$ total space, or in $O(r \log_s n)$ rounds using $\tilde{O}(km)$ total space. If $\frac{p-q}{\sqrt{p}} \geq \tilde{\Omega}\left(k^{3/4} n^{-1/4}\right)$, our second algorithm achieves $O(\log_s n)$ rounds and $\tilde{O}(m)$ total space complexity. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the sublinear-space MPC model, where each machine has local memory $s = O(n^{\delta})$ for some constant $\delta > 0$, and only under a much stronger condition on $p$, $q$ and $k$. Our algorithms are based on collecting the $r$-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices. To implement the clustering algorithms in parallel, we present efficient approaches for implementing some basic graph operations in the $s$-space MPC model.
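The idea of comparing neighborhood statistics for each pair of vertices can be illustrated with a much simpler statistic. The Python sketch that follows is not the authors' MPC algorithm: as an assumption for illustration, it compares two vertices by their number of common neighbors (the number of 2-step walks between them) and thresholds at the midpoint of the expected same-cluster and different-cluster counts, which requires far denser graphs than the conditions stated above. All function names (generate_sbm, common_neighbor_threshold, cluster_by_common_neighbors, condition_rhs) and the convention that vertex v belongs to cluster v // n are hypothetical. condition_rhs evaluates the right-hand side of the first algorithm's condition with the polylogarithmic factors hidden by the tilde dropped, to show how the bound loosens toward sqrt(k/n) as r grows.

```python
import math
import random
from itertools import combinations


def generate_sbm(k, n, p, q, seed=0):
    """Sample an SBM graph on k*n vertices (hypothetical helper).

    Vertex v is placed in cluster v // n. Same-cluster pairs are joined
    with probability p, all other pairs with probability q.
    """
    rng = random.Random(seed)
    labels = [v // n for v in range(k * n)]
    adj = [set() for _ in range(k * n)]
    for u, w in combinations(range(k * n), 2):
        prob = p if labels[u] == labels[w] else q
        if rng.random() < prob:
            adj[u].add(w)
            adj[w].add(u)
    return adj, labels


def common_neighbor_threshold(k, n, p, q):
    """Midpoint between the expected common-neighbor counts of a
    same-cluster pair and of a different-cluster pair."""
    same = (n - 2) * p * p + (k - 1) * n * q * q
    diff = 2 * (n - 1) * p * q + (k - 2) * n * q * q
    return (same + diff) / 2


def cluster_by_common_neighbors(adj, k, n, p, q):
    """Greedy sequential clustering (illustrative only, not the MPC method).

    Each vertex joins the first cluster whose representative shares more
    than the threshold number of neighbors with it; otherwise it opens a
    new cluster and becomes that cluster's representative.
    """
    threshold = common_neighbor_threshold(k, n, p, q)
    reps, clusters = [], []
    for v in range(len(adj)):
        for i, rep in enumerate(reps):
            if len(adj[v] & adj[rep]) > threshold:
                clusters[i].append(v)
                break
        else:
            reps.append(v)
            clusters.append([v])
    return clusters


def condition_rhs(k, n, r):
    """k^(1/2) * n^(-1/2 + 1/(2(r-1))): the first algorithm's bound on
    (p - q) / sqrt(p), with polylogarithmic factors dropped."""
    return math.sqrt(k) * n ** (-0.5 + 1 / (2 * (r - 1)))
```

For example, with k = 2, n = 100, p = 0.9 and q = 0.1, the expected common-neighbor counts are about 80 for a same-cluster pair and about 18 for a different-cluster pair, each nearly eight standard deviations from the midpoint threshold, so cluster_by_common_neighbors should recover the two planted clusters with overwhelming probability. The sequential loop over representatives stands in for the pairwise comparisons that the paper's algorithms distribute across machines.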

Related research
09/26/2020

Sample-and-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model

Motivated by recent progress on symmetry breaking problems such as maxim...
02/19/2020

Parallel Algorithms for Small Subgraph Counting

Subgraph counting is a fundamental problem in analyzing massive graphs, ...
08/09/2021

Deterministic Massively Parallel Connectivity

We consider the problem of designing fundamental graph algorithms on the...
06/04/2021

Massively Parallel and Dynamic Algorithms for Minimum Size Clustering

In this paper, we study the r-gather problem, a natural formulation of m...
05/02/2019

Log Diameter Rounds Algorithms for 2-Vertex and 2-Edge Connectivity

Many modern parallel systems, such as MapReduce, Hadoop and Spark, can b...
04/13/2020

Exact recovery and sharp thresholds of Stochastic Ising Block Model

The stochastic block model (SBM) is a random graph model in which the ed...
07/15/2023

Fully Scalable MPC Algorithms for Clustering in High Dimension

We design new algorithms for k-clustering in high-dimensional Euclidean ...
