Query-Efficient Correlation Clustering

02/26/2020
by David Garcia Soriano, et al.

Correlation clustering is arguably the most natural formulation of clustering. Given n objects and a pairwise similarity measure, the goal is to cluster the objects so that, to the best possible extent, similar objects are put in the same cluster and dissimilar objects are put in different clusters. A main drawback of correlation clustering is that it requires as input all Θ(n^2) pairwise similarities, which are often infeasible to compute or even to store. In this paper we study query-efficient algorithms for correlation clustering. Specifically, we devise a correlation clustering algorithm that, given a budget of Q queries, attains a solution whose expected number of disagreements is at most 3·OPT + O(n^3/Q), where OPT is the optimal cost for the instance. The algorithm runs in O(Q) time and can easily be made non-adaptive (meaning it can specify all its queries at the outset and make them in parallel) with the same guarantees. Up to constant factors, our algorithm yields a provably optimal trade-off between the number of queries Q and the worst-case error attained, even for adaptive algorithms. Finally, we perform an experimental study of our proposed method on both synthetic and real data, showing the scalability and the accuracy of our algorithm.
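
The abstract does not spell out the procedure, so the following is only a minimal sketch of how a query budget can be combined with pivot-style clustering. It assumes a hypothetical similarity oracle `similar(u, v)` that returns True for similar pairs; the names `budgeted_pivot_clustering`, `similar`, and `budget` are illustrative and do not come from the paper. The sketch repeatedly picks a random pivot, queries it against every object not yet clustered, and stops once the next round would exceed the budget, leaving the remaining objects as singletons. It is not claimed to be the authors' algorithm or to carry the 3·OPT + O(n^3/Q) guarantee.

```python
import random
from typing import Callable, Hashable, Iterable, List, Optional, Tuple


def budgeted_pivot_clustering(
    nodes: Iterable[Hashable],
    similar: Callable[[Hashable, Hashable], bool],  # hypothetical similarity oracle
    budget: int,                                    # maximum number of oracle queries
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[Hashable]], int]:
    """Pivot-style clustering that never issues more than `budget` queries."""
    rng = rng or random.Random()
    remaining = list(nodes)
    rng.shuffle(remaining)  # a random order gives a random pivot each round

    clusters: List[List[Hashable]] = []
    queries = 0
    while remaining:
        # A round compares the pivot with every other unclustered object.
        # Stop if that would exceed the budget.
        if queries + len(remaining) - 1 > budget:
            break
        pivot, rest = remaining[0], remaining[1:]
        cluster, leftover = [pivot], []
        for v in rest:
            queries += 1
            (cluster if similar(pivot, v) else leftover).append(v)
        clusters.append(cluster)
        remaining = leftover

    # Objects never reached within the budget become singleton clusters.
    clusters.extend([v] for v in remaining)
    return clusters, queries


if __name__ == "__main__":
    # Toy instance (an assumption for illustration): two groups, {0,1,2} and {3,4,5}.
    same_group = lambda u, v: u // 3 == v // 3
    found, used = budgeted_pivot_clustering(range(6), same_group, budget=100)
    print(found, used)
```

With a generous budget the sketch behaves like an ordinary pivot algorithm; as the budget shrinks, more objects fall through to singletons, which is the kind of query/error trade-off the abstract describes.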

research
05/28/2019

Correlation Clustering with Adaptive Similarity Queries

We investigate learning algorithms that use similarity queries to approx...
research
08/14/2019

Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Several clustering frameworks with interactive (semi-supervised) queries...
research
02/20/2023

Active Learning with Positive and Negative Pairwise Feedback

In this paper, we propose a generic framework for active clustering with...
research
06/18/2021

Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle

Motivated by applications in crowdsourced entity resolution in database,...
research
05/07/2022

Almost 3-Approximate Correlation Clustering in Constant Rounds

We study parallel algorithms for correlation clustering. Each pair among...
research
10/08/2020

Near-Optimal Comparison Based Clustering

The goal of clustering is to group similar objects into meaningful parti...
research
03/02/2022

Near-Optimal Correlation Clustering with Privacy

Correlation clustering is a central problem in unsupervised learning, wi...
