Distributed Community Detection for Large Scale Networks Using Stochastic Block Model

09/24/2020
by Shihao Wu, et al.

With the rapid development of information technology, large-scale network data have become ubiquitous. In this work, we develop a distributed spectral clustering algorithm for community detection in large-scale networks. To handle the problem, we place l pilot network nodes on the master server and distribute the remaining nodes across worker servers. A spectral clustering algorithm is first run on the master to select pseudo centers. The indexes of the pseudo centers are then broadcast to the workers, which complete the distributed community detection task using an SVD-type algorithm. The proposed distributed algorithm has three merits. First, the communication cost is low, since only the indexes of the pseudo centers are communicated. Second, no further iterative algorithm is needed on the workers, so the method does not suffer from problems such as initialization sensitivity and non-robustness. Third, both the computational complexity and the storage requirements are much lower than those of using the whole adjacency matrix. A Python package, DCD (www.github.com/Ikerlz/dcd), is developed to implement the distributed algorithm on a Spark system. Theoretical properties are provided with respect to the estimation accuracy and mis-clustering rates. Lastly, the advantages of the proposed methodology are illustrated by experiments on a variety of synthetic and empirical datasets.
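The master/worker flow described in the abstract can be illustrated with a small single-machine sketch in NumPy. This is not the DCD package or its API, and it is not necessarily the authors' exact procedure. All function names (`simulate_sbm`, `kmeans`, `master_select_pseudo_centers`, `worker_assign`, `run_demo`) and all parameter values are hypothetical. The sketch assumes that each worker stores the adjacency rows of its own nodes and of the pilot nodes, restricted to the pilot columns, and that a node is assigned to the pseudo center whose row is nearest in the space of the top K left singular vectors.

```python
import numpy as np

def simulate_sbm(n, K, p_in, p_out, rng):
    """Symmetric SBM adjacency matrix with balanced communities, no self-loops."""
    labels = np.arange(n) % K
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float), labels

def kmeans(X, K, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(K - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return assign, centers

def master_select_pseudo_centers(A_pilot, K):
    """Master step: spectral clustering on the pilot sub-network.

    Returns, for each cluster, the position (within the pilot set) of the
    pilot node closest to the cluster centroid. Assumed selection rule.
    """
    vals, vecs = np.linalg.eigh(A_pilot)
    U = vecs[:, np.argsort(np.abs(vals))[-K:]]  # K leading eigenvectors
    _, centers = kmeans(U, K)
    dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=0)

def worker_assign(B, center_pos, K):
    """Worker step: one SVD, then nearest pseudo center. No iteration."""
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    U = U[:, :K]
    centers = U[center_pos]  # embedded rows of the pseudo centers
    dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def run_demo(seed=0, n=900, l=300, K=3, n_workers=3):
    rng = np.random.default_rng(seed)
    A, labels = simulate_sbm(n, K, p_in=0.5, p_out=0.05, rng=rng)
    perm = rng.permutation(n)
    pilots, rest = perm[:l], perm[l:]

    # Master: only these K positions/indexes would be sent to the workers.
    center_pos = master_select_pseudo_centers(A[np.ix_(pilots, pilots)], K)
    pseudo_centers = pilots[center_pos]

    correct = 0
    # Workers run sequentially here; in a cluster they would run in parallel.
    for chunk in np.array_split(rest, n_workers):
        rows = np.concatenate([pilots, chunk])
        B = A[np.ix_(rows, pilots)]  # (l + chunk size) x l sub-matrix
        pred = worker_assign(B, center_pos, K)[l:]
        # Cluster k is identified with the true community of pseudo center k.
        correct += np.sum(labels[pseudo_centers][pred] == labels[chunk])
    return correct / len(rest), pseudo_centers

if __name__ == "__main__":
    accuracy, centers = run_demo()
    print("pseudo centers:", centers, "accuracy on non-pilot nodes:", accuracy)
```

Because every worker labels its nodes relative to the same pseudo centers, the cluster labels agree across workers without any extra alignment step, and each worker only ever factorizes a matrix with l columns rather than the full n x n adjacency matrix.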


research
01/20/2020

Randomized Spectral Clustering in Large-Scale Stochastic Block Models

Spectral clustering has been one of the widely used methods for communit...
research
04/18/2018

Entropic Spectral Learning in Large Scale Networks

We present a novel algorithm for learning the spectral density of large ...
research
10/19/2021

Subsampling Spectral Clustering for Large-Scale Social Networks

Online social network platforms such as Twitter and Sina Weibo have been...
research
04/25/2020

Randomized spectral co-clustering for large-scale directed networks

Directed networks are generally used to represent asymmetric relationshi...
research
12/08/2022

A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral Clustering

We develop a distributed Block Chebyshev-Davidson algorithm to solve lar...
research
11/20/2014

Clustering evolving data using kernel-based methods

In this thesis, we propose several modelling strategies to tackle evolvi...
research
05/05/2019

Fast communication-efficient spectral clustering over distributed data

The last decades have seen a surge of interests in distributed computing...
