Massively Parallel Correlation Clustering in Bounded Arboricity Graphs

02/23/2021
by   Mélanie Cambus, et al.

Identifying clusters of similar elements in a set is a common objective in data analysis. With the immense growth of data and physical limitations on single-processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the vertices correspond to the elements and each edge is either positive or negative, indicating whether its two endpoints are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges plus the number of negative intra-cluster edges. Consider an input graph G on n vertices such that the positive edges induce a λ-arboric graph. Our main result is an algorithm that achieves a 3-approximation in expectation and runs in 𝒪(log λ · log log n) MPC rounds in the sublinear memory regime. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA '18) on randomized greedy MIS and the algorithm of Ailon, Charikar, and Newman (STOC '05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with worst-case (1+ϵ)-approximation guarantees in the special case of forests, where λ=1.
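To make the objective concrete, the following is a minimal sequential sketch, not the paper's MPC algorithm. It shows the randomized pivot scheme commonly attributed to Ailon, Charikar, and Newman (pick a random unclustered vertex, cluster it with its unclustered positive neighbors, repeat) together with a function that counts disagreements. The function names, the edge-list input format, and the convention that every pair not listed as positive is negative are assumptions made for illustration.

```python
import random
from itertools import combinations


def pivot_clustering(vertices, positive_edges, rng=None):
    """Sequential randomized pivot clustering (illustrative sketch).

    Assumption: `positive_edges` lists the positive pairs; every other
    pair of vertices is treated as a negative edge.
    """
    rng = rng or random.Random()
    adj = {v: set() for v in vertices}
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)

    # Scanning a uniformly random permutation and skipping vertices that
    # are already clustered is equivalent to repeatedly picking a uniformly
    # random unclustered pivot.
    order = list(vertices)
    rng.shuffle(order)

    clustered = set()
    clusters = []
    for pivot in order:
        if pivot in clustered:
            continue
        # The pivot takes all of its still-unclustered positive neighbors.
        cluster = {pivot} | (adj[pivot] - clustered)
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def disagreements(vertices, positive_edges, clusters):
    """Count positive inter-cluster pairs plus negative intra-cluster pairs."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    positive = {frozenset(edge) for edge in positive_edges}
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = label[u] == label[v]
        is_positive = frozenset((u, v)) in positive
        if is_positive != same_cluster:
            cost += 1
    return cost


if __name__ == "__main__":
    # A path on three vertices: a forest, so the positive edges have λ = 1.
    vertices = [0, 1, 2]
    positive_edges = [(0, 1), (1, 2)]
    clusters = pivot_clustering(vertices, positive_edges, random.Random(0))
    print(clusters, disagreements(vertices, positive_edges, clusters))
```

On the three-vertex path used above, every pivot order produces exactly one disagreement: either the negative pair (0, 2) ends up inside a cluster, or one positive edge is cut. The quadratic loop in `disagreements` is only suitable for small inputs; it is written for clarity rather than efficiency.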


research
05/16/2022

A Parallel Algorithm for (3 + ε)-Approximate Correlation Clustering

Grouping together similar elements in datasets is a common task in data ...
research
06/28/2019

Min-Max Correlation Clustering via MultiCut

Correlation clustering is a fundamental combinatorial optimization probl...
research
08/14/2019

Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Several clustering frameworks with interactive (semi-supervised) queries...
research
11/10/2020

Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs

In order to study real-world systems, many applied works model them thro...
research
06/15/2021

Correlation Clustering in Constant Many Parallel Rounds

Correlation clustering is a central topic in unsupervised learning, with...
research
02/27/2019

Improved algorithms for Correlation Clustering with local objectives

Correlation Clustering is a powerful graph partitioning model that aims ...
research
11/03/2020

Regularized spectral methods for clustering signed networks

We study the problem of k-way clustering in signed graphs. Considerable ...
