Exact Distributed Stochastic Block Partitioning

05/30/2023
by Frank Wanye, et al.

Stochastic block partitioning (SBP) is a community detection algorithm that is highly accurate even on graphs with a complex community structure, but its inherently serial nature hinders its adoption by the wider scientific community. To make it practical to analyze large real-world graphs with SBP, there is a growing need to parallelize and distribute the algorithm. The current state-of-the-art distributed SBP algorithm is a divide-and-conquer approach that defers communication between compute nodes until the end of inference. This breaks computational dependencies, which causes convergence issues as the number of compute nodes increases and when the graph is sufficiently sparse. In this paper, we introduce EDiSt, an exact distributed stochastic block partitioning algorithm. Under EDiSt, compute nodes periodically share community assignments during inference. Thanks to this additional communication, EDiSt scales out to a larger number of compute nodes than the divide-and-conquer algorithm without suffering from convergence issues, even on sparse graphs. We show that EDiSt provides speedups of up to 23.8X over the divide-and-conquer approach, and speedups of up to 38.0X over shared-memory parallel SBP when scaled out to 64 compute nodes.
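The contrast between sharing community assignments periodically and sharing them only at the end can be illustrated with a small single-process simulation. This is a rough sketch under stated assumptions, not the EDiSt implementation: the names `owner`, `local_sweep`, and `run` are hypothetical, vertex ownership is assumed to be round-robin, and a simple majority-label update stands in for SBP's actual inference step. Only the communication pattern is the point here.

```python
from collections import Counter


def owner(vertex, num_workers):
    # Assumption: round-robin vertex ownership (not taken from the paper).
    return vertex % num_workers


def local_sweep(adj, owned, view):
    """Update owned vertices in place using this worker's private, possibly stale view.

    Stand-in update rule: adopt the most common neighbor label, with ties going
    to the smallest label. Real SBP evaluates proposed moves differently.
    """
    changes = {}
    for v in owned:
        if not adj[v]:
            continue
        counts = Counter(view[u] for u in adj[v])
        best = min(counts, key=lambda label: (-counts[label], label))
        if best != view[v]:
            view[v] = best
            changes[v] = best
    return changes


def run(adj, num_workers, sweeps, sync_every):
    """Simulate workers that exchange community assignments every `sync_every` sweeps."""
    n = len(adj)
    shared = list(range(n))  # every vertex starts in its own community
    owned = [[v for v in range(n) if owner(v, num_workers) == w]
             for w in range(num_workers)]
    views = [shared.copy() for _ in range(num_workers)]
    pending = [{} for _ in range(num_workers)]

    def exchange():
        # Simulated all-gather: each worker publishes the changes it made to
        # its own vertices, then every worker refreshes its view.
        for w in range(num_workers):
            for v, label in pending[w].items():
                shared[v] = label
            pending[w].clear()
        return [shared.copy() for _ in range(num_workers)]

    for sweep in range(1, sweeps + 1):
        for w in range(num_workers):
            pending[w].update(local_sweep(adj, owned[w], views[w]))
        if sweep % sync_every == 0:
            views = exchange()
    exchange()  # publish anything still unshared at the end
    return shared
```

Setting `sync_every=1` mimics periodic sharing, while `sync_every=sweeps` mimics exchanging results only once inference is over. On a toy graph of two disjoint triangles with two workers and four sweeps, the first setting puts each triangle into a single community, whereas the second leaves each triangle split, because every worker keeps updating against stale labels for the vertices it does not own.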


