How to Design Robust Algorithms using Noisy Comparison Oracle

05/12/2021
by Raghavendra Addanki, et al.

Metric-based comparison operations, such as finding the maximum and the nearest and farthest neighbors, are fundamental to clustering techniques such as k-center clustering and agglomerative hierarchical clustering. These techniques rely crucially on accurate estimates of the pairwise distances between records. However, computing exact features of the records and their pairwise distances is often challenging and sometimes impossible. We circumvent this challenge by leveraging weak supervision in the form of a comparison oracle that compares the relative distances between the queried points, answering questions such as "Is point u closer to v, or is w closer to x?". Some queries are easier for a comparison oracle to answer than others, and we capture this with two noise models, adversarial and probabilistic. In this paper, we study problems that include finding the maximum and nearest/farthest neighbor search under these noise models. Building on the techniques we develop for these comparison operations, we give robust algorithms for k-center clustering and agglomerative hierarchical clustering. We prove that our algorithms achieve good approximation guarantees with high probability, and we analyze their query complexity. We evaluate the effectiveness and efficiency of our techniques empirically on various real-world datasets.
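To make the query model concrete, here is a minimal Python sketch of a quadruplet comparison oracle and a farthest-neighbor search built on it. It is not the authors' algorithm: the abstract does not define the adversarial and probabilistic noise models, so the sketch assumes a simple model in which each answer is flipped independently with a fixed probability. The names `make_noisy_oracle` and `farthest_from`, the Euclidean distance, and the majority vote over repeated queries are all illustrative assumptions.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Oracle = Callable[[Point, Point, Point, Point], bool]


def euclidean(a: Point, b: Point) -> float:
    """Ground-truth distance, hidden from the algorithm that queries the oracle."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def make_noisy_oracle(dist: Callable[[Point, Point], float],
                      flip_prob: float,
                      rng: random.Random) -> Oracle:
    """Build oracle(u, v, w, x) answering "is d(u, v) <= d(w, x)?".

    ASSUMPTION: every answer is flipped independently with probability
    flip_prob. This is a stand-in, not the paper's noise model.
    """
    def oracle(u: Point, v: Point, w: Point, x: Point) -> bool:
        truth = dist(u, v) <= dist(w, x)
        return (not truth) if rng.random() < flip_prob else truth
    return oracle


def farthest_from(q: Point, candidates: Sequence[Point],
                  oracle: Oracle, repeats: int = 15) -> Point:
    """Linear scan for the candidate farthest from q, using only oracle answers.

    Each comparison is repeated `repeats` times and decided by majority vote,
    which only helps when repeated queries are independently noisy.
    """
    best = candidates[0]
    for c in candidates[1:]:
        # True means "d(q, best) <= d(q, c)", i.e. c is at least as far as best.
        votes = sum(oracle(q, best, q, c) for _ in range(repeats))
        if 2 * votes > repeats:
            best = c
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    oracle = make_noisy_oracle(euclidean, flip_prob=0.2, rng=rng)
    points = [(1.0, 0.0), (5.0, 0.0), (3.0, 0.0), (2.0, 2.0)]
    print(farthest_from((0.0, 0.0), points, oracle))
```

The scan issues `repeats` queries per candidate, so its query cost grows linearly with the number of candidates. If repeated queries always returned the same wrong answer, majority voting would not help and a different strategy would be required.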

