Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

08/14/2019
by Barna Saha, et al.

Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, "Does the optimal clustering put vertices u and v in the same cluster?" Because of its simplicity, this querying model can easily be implemented on real crowd-sourcing platforms and has attracted a lot of recent work. In this paper, we study the popular correlation clustering problem (Bansal et al., 2002) under this framework. Given a complete graph G=(V,E) with positive and negative edge labels, the correlation clustering objective is to compute a clustering of the graph that minimizes the total number of disagreements, that is, the number of negative intra-cluster edges plus the number of positive inter-cluster edges. Let C_OPT be the number of disagreements made by the optimal clustering. We present algorithms for correlation clustering whose error and query bounds are parameterized by C_OPT rather than by the number of clusters. Indeed, a good clustering must have small C_OPT. Specifically, we present an efficient algorithm that recovers an exact optimal clustering using at most 2C_OPT queries and an efficient algorithm that outputs a 2-approximation using at most C_OPT queries. In addition, we show that, under a plausible complexity assumption, no polynomial-time algorithm achieves an approximation ratio better than 1+α, for an absolute constant α > 0, using o(C_OPT) queries. We extensively evaluate our methods on several synthetic and real-world datasets using real crowd-sourced oracles. Moreover, we compare our approach against several known correlation clustering algorithms.
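To make the objective and the query model concrete, the following minimal Python sketch counts the disagreements of a given clustering and simulates a same-cluster oracle that tracks how many queries it has answered. It is an illustration only, not the algorithms of the paper: the names `count_disagreements` and `SameClusterOracle`, the input encoding (a list of positive pairs, with every other pair treated as negative), and the toy graph are all assumptions made for this example.

```python
from itertools import combinations


def count_disagreements(vertices, positive_pairs, cluster_of):
    """Count disagreements of a clustering on a complete signed graph.

    Hypothetical encoding: `positive_pairs` lists the '+' edges; every
    other pair of vertices is assumed to be a '-' edge.
    `cluster_of` maps each vertex to a cluster id.
    """
    positive = {frozenset(pair) for pair in positive_pairs}
    disagreements = 0
    for u, v in combinations(vertices, 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # A disagreement is a '-' edge inside a cluster
        # or a '+' edge between two clusters.
        if same_cluster != is_positive:
            disagreements += 1
    return disagreements


class SameClusterOracle:
    """Simulated oracle that answers from a fixed reference clustering.

    In the setting described above the reference would be an optimal
    clustering; here it is whatever clustering the caller supplies.
    """

    def __init__(self, reference_clustering):
        self._reference = reference_clustering
        self.queries = 0  # number of queries answered so far

    def same_cluster(self, u, v):
        self.queries += 1
        return self._reference[u] == self._reference[v]


# Toy instance (made up for illustration): '+' edges form the path a-b-c-d.
vertices = ["a", "b", "c", "d"]
positive_pairs = [("a", "b"), ("b", "c"), ("c", "d")]

two_pairs = {"a": 0, "b": 0, "c": 1, "d": 1}   # clusters {a,b}, {c,d}
triple = {"a": 0, "b": 0, "c": 0, "d": 1}      # clusters {a,b,c}, {d}

cost_two_pairs = count_disagreements(vertices, positive_pairs, two_pairs)
cost_triple = count_disagreements(vertices, positive_pairs, triple)

oracle = SameClusterOracle(two_pairs)
answer_ab = oracle.same_cluster("a", "b")
answer_bc = oracle.same_cluster("b", "c")
```

On this toy graph, the clustering {a,b}, {c,d} disagrees only with the positive edge b-c, while {a,b,c}, {d} disagrees with the negative edge a-c and the positive edge c-d. The `queries` counter is the quantity that the bounds in the abstract compare against C_OPT.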


