Exact Recovery of Mangled Clusters with Same-Cluster Queries

06/08/2020
by Marco Bressan, et al.

We study the problem of recovering distorted clusters in the semi-supervised active clustering framework. Given an oracle revealing whether any two points lie in the same cluster, we are interested in designing algorithms that recover all clusters exactly, in polynomial time, and using as few queries as possible. Towards this end, we extend the notion of center-based clustering with margin introduced by Ashtiani et al. to clusters with arbitrary linear distortions and arbitrary centers. This includes all cases where the original dataset is transformed by any combination of rotations, axis scalings, and point deletions. We show that, even in this significantly more challenging setting, the underlying clustering can be recovered exactly using only a small number of oracle queries. Specifically, we design an algorithm that, given n points to be partitioned into k clusters, uses O(k^3 ln k ln n) oracle queries and Õ(kn + k^3) time to recover the exact clustering structure of the underlying instance, even when the instance is NP-hard to solve without oracle access. The O(·) notation hides an exponential dependence on the dimensionality of the clusters, which we show to be necessary. Our algorithm is simple, easy to implement, and can also learn the clusters using low-stretch separators, a class of ellipsoids with additional theoretical guarantees. Experiments on large synthetic datasets confirm that we can reconstruct the latent clustering exactly and efficiently.
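To make the same-cluster query model concrete, the sketch below simulates an oracle from ground-truth labels and runs a naive baseline that compares each point against one representative of every cluster found so far. This is a minimal illustration under stated assumptions: the names `make_oracle` and `naive_recover` are hypothetical, and the baseline is not the authors' algorithm. It ignores geometry entirely and can use up to n·k queries, i.e. linear in n, which is the cost the abstract's O(k^3 ln k ln n) bound (logarithmic in n) improves on.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Oracle = Callable[[int, int], bool]


def make_oracle(labels: Sequence[Hashable]) -> Oracle:
    """Hypothetical same-cluster oracle simulated from ground-truth labels.

    In the active clustering model the labels are hidden; the algorithm
    may only ask whether two points i and j lie in the same cluster.
    """
    def same_cluster(i: int, j: int) -> bool:
        return labels[i] == labels[j]
    return same_cluster


def naive_recover(n: int, same_cluster: Oracle) -> Tuple[List[List[int]], int]:
    """Naive baseline (not the paper's method): exact recovery with <= n*k queries.

    Each point is compared with the first member (representative) of every
    cluster discovered so far; if none matches, it starts a new cluster.
    Returns the recovered clusters and the number of oracle queries used.
    """
    clusters: List[List[int]] = []
    queries = 0
    for i in range(n):
        for cluster in clusters:
            queries += 1
            if same_cluster(i, cluster[0]):
                cluster.append(i)
                break
        else:
            # No existing representative matched: i opens a new cluster.
            clusters.append([i])
    return clusters, queries


if __name__ == "__main__":
    labels = [0, 1, 0, 2, 1, 2, 0]
    clusters, queries = naive_recover(len(labels), make_oracle(labels))
    print(clusters, queries)
```

On the toy labels above, the baseline recovers the three clusters exactly; the point of the paper's result is that, under its margin assumptions on linearly distorted clusters, the number of queries need not grow linearly with n as it does here.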
