Scalable Community Detection via Parallel Correlation Clustering

07/27/2021
by Jessica Shi, et al.

Graph clustering and community detection are central problems in modern data mining. The growing need to analyze billion-scale data calls for faster and more scalable algorithms for these problems, but such clustering algorithms trade off quality against speed. In this paper, we design scalable algorithms that achieve high quality when evaluated against ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly optimized implementations that scale to large data sets with billions of edges and obtain high-quality clusters with respect to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our implementations achieve orders-of-magnitude speedups over other correlation clustering baselines, and up to 28.44x speedups over our own sequential baselines, while maintaining or improving quality.
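The abstract names the LambdaCC objective without stating it. As a rough illustration only, the sketch below scores a clustering using one common way of writing that objective: sum, over every pair of vertices placed in the same cluster, the edge weight between them (zero if there is no edge) minus a resolution parameter lambda times the product of the two vertices' node weights. The function name `lambdacc_score`, its parameters, and the toy graph are assumptions made for this example and are not taken from the paper's implementation; with unit node weights this corresponds to a standard correlation-clustering-style form, while degree-based node weights give a modularity-like form.

```python
from collections import defaultdict


def lambdacc_score(edges, labels, lam, node_weight=None):
    """Score a clustering under an assumed LambdaCC-style objective.

    edges:       iterable of (u, v, w) undirected weighted edges, each listed once
    labels:      labels[i] is the cluster id of vertex i
    lam:         resolution parameter lambda
    node_weight: optional per-vertex weights k_i (defaults to 1 for every vertex)

    Returns the sum over same-cluster pairs i < j of (w_ij - lam * k_i * k_j).
    """
    n = len(labels)
    k = node_weight if node_weight is not None else [1.0] * n

    # Total weight of edges whose endpoints share a cluster (self-loops ignored).
    intra = 0.0
    for u, v, w in edges:
        if u != v and labels[u] == labels[v]:
            intra += w

    # For each cluster, sum_{i<j} k_i * k_j = ((sum k)^2 - sum k^2) / 2,
    # which avoids enumerating all vertex pairs explicitly.
    k_sum = defaultdict(float)
    k_sq = defaultdict(float)
    for i, c in enumerate(labels):
        k_sum[c] += k[i]
        k_sq[c] += k[i] * k[i]
    penalty = sum((k_sum[c] ** 2 - k_sq[c]) / 2.0 for c in k_sum)

    return intra - lam * penalty


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant vertex 3 attached to 2.
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
TRIANGLE_PLUS_SINGLETON = [0, 0, 0, 1]
ONE_CLUSTER = [0, 0, 0, 0]

if __name__ == "__main__":
    for lam in (0.1, 0.5):
        print(lam,
              lambdacc_score(EDGES, TRIANGLE_PLUS_SINGLETON, lam),
              lambdacc_score(EDGES, ONE_CLUSTER, lam))
```

On this toy graph, a small lambda scores the single large cluster higher, while a larger lambda favors splitting off the pendant vertex, which is the sense in which lambda acts as a resolution parameter.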


