Clustering with Queries under Semi-Random Noise

06/09/2022
by Alberto Del Pia, et al.

The seminal paper by Mazumdar and Saha <cit.> initiated an extensive line of work on clustering with noisy queries. Yet, despite significant progress on the problem, the proposed methods depend crucially on knowing the exact error probabilities of the underlying fully-random oracle. In this work, we develop robust learning methods that tolerate general semi-random noise while obtaining qualitatively the same guarantees as the best possible methods in the fully-random model. More specifically, given a set of n points with an unknown underlying partition, we are allowed to query pairs of points u, v to check whether they are in the same cluster, but with probability p the answer may be adversarially chosen. We show that, information-theoretically, O(nk log n/(1-2p)^2) queries suffice to learn any cluster of sufficiently large size. Our main result is a computationally efficient algorithm that can identify large clusters with O(nk log n/(1-2p)^2) + poly(log n, k, 1/(1-2p)) queries, matching the guarantees of the best known algorithms in the fully-random model. As a corollary of our approach, we develop the first parameter-free algorithm for the fully-random model, answering an open question by <cit.>.
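
To make the query model concrete, the following is a minimal Python sketch of a semi-random oracle as described above: each pairwise query is answered truthfully, except that with probability p an adversary may choose the answer. All names here (`SemiRandomOracle`, `always_lie`, `majority_same`) are hypothetical and do not come from the paper. The sketch also assumes that answers are persistent (repeating a query returns the same answer) and that p < 1/2. The majority vote at the end is only a naive illustration of how noisy answers can be aggregated; it is not the paper's algorithm.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical adversary type: it sees the pair and the true answer and may
# return any answer it likes.
Adversary = Callable[[int, int, bool], bool]


def always_lie(u: int, v: int, truth: bool) -> bool:
    """One possible adversary: always flips the true answer."""
    return not truth


class SemiRandomOracle:
    """Same-cluster oracle whose answer is adversarial with probability p."""

    def __init__(self, labels: Sequence[int], p: float,
                 adversary: Adversary = always_lie,
                 seed: Optional[int] = None) -> None:
        # Assumption: p < 1/2, so that the factor (1 - 2p) is positive.
        if not 0.0 <= p < 0.5:
            raise ValueError("p must satisfy 0 <= p < 0.5")
        self.labels = list(labels)
        self.p = p
        self.adversary = adversary
        self.rng = random.Random(seed)
        # Assumption: answers are persistent, so each pair is decided once.
        self.cache: Dict[Tuple[int, int], bool] = {}
        self.num_queries = 0  # number of distinct pairs queried

    def query(self, u: int, v: int) -> bool:
        key = (min(u, v), max(u, v))
        if key not in self.cache:
            self.num_queries += 1
            truth = self.labels[key[0]] == self.labels[key[1]]
            corrupted = self.rng.random() < self.p
            if corrupted:
                # With probability p the adversary picks the answer.
                self.cache[key] = self.adversary(key[0], key[1], truth)
            else:
                self.cache[key] = truth
        return self.cache[key]


def majority_same(oracle: SemiRandomOracle, u: int,
                  references: Sequence[int]) -> bool:
    """Naive aggregation: does a strict majority of reference points
    (assumed to lie in one cluster) report u as a same-cluster point?"""
    votes = sum(oracle.query(u, r) for r in references if r != u)
    total = sum(1 for r in references if r != u)
    return 2 * votes > total
```

For example, with `labels = [0, 0, 0, 1, 1]` and `p = 0.0` the oracle always answers truthfully, so `majority_same(oracle, 2, [0, 1])` holds and `majority_same(oracle, 3, [0, 1])` does not. For larger p, more reference points are needed before the majority becomes reliable.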


