On Margin-Based Cluster Recovery with Oracle Queries

06/09/2021
by   Marco Bressan, et al.
We study an active cluster recovery problem where, given a set of n points and an oracle answering queries like "are these two points in the same cluster?", the task is to recover exactly all clusters using as few queries as possible. We begin by introducing a simple but general notion of margin between clusters that captures, as special cases, the margins used in previous work, the classic SVM margin, and standard notions of stability for center-based clusterings. Then, under our margin assumptions we design algorithms that, in a variety of settings, recover all clusters exactly using only O(log n) queries. For the Euclidean case, ℝ^m, we give an algorithm that recovers arbitrary convex clusters, in polynomial time, and with a number of queries that is lower than the best existing algorithm by Θ(m^m) factors. For general pseudometric spaces, where clusters might not be convex or might not have any notion of shape, we give an algorithm that achieves the O(log n) query bound, and is provably near-optimal as a function of the packing number of the space. Finally, for clusterings realized by binary concept classes, we give a combinatorial characterization of recoverability with O(log n) queries, and we show that, for many concept classes in Euclidean spaces, this characterization is equivalent to our margin condition. Our results show a deep connection between cluster margins and active cluster recoverability.
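To make the query model concrete, here is a minimal sketch of a naive baseline. It is not the algorithm from the paper and does not achieve the O(log n) bound. It compares each point against one representative of every cluster found so far, which costs up to n·k same-cluster queries for k clusters. The function name `recover_clusters_naive`, the `same_cluster` callable, and the toy `labels` dictionary are all assumptions made for illustration.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def recover_clusters_naive(
    points: Sequence[Hashable],
    same_cluster: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[List[Hashable]], int]:
    """Recover all clusters exactly using a same-cluster oracle.

    Naive baseline (hypothetical helper, not the paper's method):
    each point is compared with one representative per known cluster.
    Returns the recovered clusters and the number of oracle queries used.
    """
    clusters: List[List[Hashable]] = []
    queries = 0
    for p in points:
        for cluster in clusters:
            queries += 1
            # cluster[0] acts as the representative of this cluster.
            if same_cluster(cluster[0], p):
                cluster.append(p)
                break
        else:
            # No representative matched, so p starts a new cluster.
            clusters.append([p])
    return clusters, queries


# Toy ground truth standing in for the oracle (an assumption for this demo).
labels = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 1}
clusters, queries = recover_clusters_naive(
    list(labels), lambda x, y: labels[x] == labels[y]
)
print(clusters, queries)
```

The query count of this baseline grows linearly in n, which is the cost that the margin assumptions in the abstract are meant to bring down to O(log n).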


research
01/31/2021

Exact Recovery of Clusters in Finite Metric Spaces Using Oracle Queries

We investigate the problem of exact cluster recovery using oracle querie...
research
06/08/2020

Exact Recovery of Mangled Clusters with Same-Cluster Queries

We study the problem of recovering distorted clusters in the semi-superv...
research
06/18/2021

Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle

Motivated by applications in crowdsourced entity resolution in database,...
research
03/02/2018

Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering

We study k-means clustering in a semi-supervised setting. Given an oracl...
research
06/09/2022

Clustering with Queries under Semi-Random Noise

The seminal paper by Mazumdar and Saha <cit.> introduced an extensive li...
research
09/11/2017

Semi-Supervised Active Clustering with Weak Oracles

Semi-supervised active clustering (SSAC) utilizes the knowledge of a dom...
research
02/19/2013

Breaking the Small Cluster Barrier of Graph Clustering

This paper investigates graph clustering in the planted cluster model in...
