Approximate Correlation Clustering Using Same-Cluster Queries

by Nir Ailon, et al.

Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for clustering (SSAC) in which a learner is allowed to make same-cluster queries. More specifically, in their model there is a query oracle that answers queries of the form "given any two vertices, do they belong to the same optimal cluster?". Ashtiani et al. showed the usefulness of such a query framework by giving a polynomial-time algorithm for the k-means clustering problem where the input dataset satisfies some separation condition. Ailon et al. extended this work to the approximation setting by giving an efficient (1+ε)-approximation algorithm for k-means for any small ε > 0 and any dataset within the SSAC framework. In this work, we extend this line of study to the correlation clustering problem. Correlation clustering is a graph clustering problem where pairwise similarity (or dissimilarity) information is given for every pair of vertices, and the objective is to partition the vertices into clusters that minimize disagreement (or maximize agreement) with the pairwise information given as input. These problems are popularly known as MinDisAgree and MaxAgree, and MinDisAgree[k] and MaxAgree[k] are the versions of these problems where the number of optimal clusters is at most k. There exist Polynomial Time Approximation Schemes (PTAS) for MinDisAgree[k] and MaxAgree[k] where the approximation guarantee is (1+ε) for any small ε > 0 and the running time is polynomial in the input parameters but exponential in k and 1/ε. We obtain a (1+ε)-approximation algorithm for any small ε > 0 with running time that is polynomial in the input parameters and also in k and 1/ε. We also give non-trivial upper and lower bounds on the number of same-cluster queries, the lower bound being based on the Exponential Time Hypothesis (ETH).
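To make the objectives concrete, here is a minimal sketch, assuming the input is a complete graph on vertices 0..n-1 in which every pair is labelled either "+" (similar) or "-" (dissimilar). A pair counts as a disagreement if it is "+" but split across clusters, or "-" but placed in the same cluster; MaxAgree simply counts the remaining pairs. The sketch also simulates a same-cluster query oracle from a hidden clustering and solves MinDisAgree[k] by exhaustive search over all k^n labelings, which illustrates why naive approaches are exponential. All function names (`disagreements`, `agreements`, `same_cluster_oracle`, `brute_force_min_disagree_k`) are hypothetical, and none of this is the paper's query-based (1+ε)-approximation algorithm.

```python
from itertools import combinations, product


def disagreements(n, positive, labels):
    """MinDisAgree cost of a clustering.

    n: number of vertices, labelled 0..n-1
    positive: set of frozensets {u, v} marked '+'; every other pair is '-'
    labels: labels[v] is the cluster id assigned to vertex v
    """
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        is_positive = frozenset((u, v)) in positive
        # '+' pair split apart, or '-' pair kept together, is a disagreement
        if is_positive != same_cluster:
            cost += 1
    return cost


def agreements(n, positive, labels):
    """MaxAgree value: every pair either agrees or disagrees."""
    return n * (n - 1) // 2 - disagreements(n, positive, labels)


def same_cluster_oracle(hidden_labels):
    """Simulated oracle answering 'are u and v in the same (hidden) optimal cluster?'."""
    def query(u, v):
        return hidden_labels[u] == hidden_labels[v]
    return query


def brute_force_min_disagree_k(n, positive, k):
    """Exact MinDisAgree[k] for tiny n by trying all k**n labelings.

    Returns (cost, labels) for the first optimal labeling found.
    """
    best = None
    for labels in product(range(k), repeat=n):
        cost = disagreements(n, positive, labels)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best


# Example: 4 vertices, '+' pairs {0,1}, {2,3}, {1,2}; all other pairs are '-'.
example_positive = {frozenset((0, 1)), frozenset((2, 3)), frozenset((1, 2))}
print(disagreements(4, example_positive, [0, 0, 1, 1]))      # 1 (the '+' pair {1,2} is split)
print(agreements(4, example_positive, [0, 0, 1, 1]))         # 5
print(brute_force_min_disagree_k(4, example_positive, 2))    # (1, (0, 0, 1, 1))

oracle = same_cluster_oracle([0, 0, 1, 1])
print(oracle(0, 1), oracle(1, 2))                            # True False
```

In this small instance no clustering satisfies all pairs (the "+" chain 0-1-2-3 conflicts with the "-" pairs), so the optimum cost is 1. The exhaustive search is only feasible for very small n and k; the running times discussed above are what make larger instances tractable.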




Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Several clustering frameworks with interactive (semi-supervised) queries...

Clustering with Same-Cluster Queries

We propose a framework for Semi-Supervised Active Clustering framework (...

Fuzzy Clustering with Similarity Queries

The fuzzy or soft k-means objective is a popular generalization of the w...

Query K-means Clustering and the Double Dixie Cup Problem

We consider the problem of approximate K-means clustering with outliers ...

Analysis of Ward's Method

We study Ward's method for the hierarchical k-means problem. This popula...

Semi-supervised clustering for de-duplication

Data de-duplication is the task of detecting multiple records that corre...

Optimal Clustering with Noisy Queries via Multi-Armed Bandit

Motivated by many applications, we study clustering with a faulty oracle...
