Semi-Supervised Active Clustering with Weak Oracles

09/11/2017
by   Taewan Kim, et al.

Semi-supervised active clustering (SSAC) uses the knowledge of a domain expert to cluster data points by interactively making pairwise "same-cluster" queries. However, it is impractical to expect human oracles to answer every pairwise query. In this paper, we study the effect of allowing "not-sure" answers from a weak oracle and propose algorithms that handle the resulting uncertainty efficiently. We analyze several model assumptions to cover realistic scenarios of oracle abstention. In the first model, the random-weak oracle, the oracle abstains at random with a certain probability. We also propose two distance-weak oracle models, which simulate an oracle that becomes confused depending on the distance between the two points in a pairwise query. For each weak oracle model, we show that a small query complexity suffices for effective k-means clustering with high probability. Sufficient conditions for the guarantee include a γ-margin property of the data and the existence of a point close to each cluster center. Furthermore, we provide a sample complexity with a reduced effect of the cluster's margin and only a logarithmic dependency on the data dimension. Our results allow significantly fewer same-cluster queries when the margin of the clusters is tight, i.e., γ ≈ 1. Experimental results on synthetic data show that our approach is effective in overcoming uncertainty.
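To make the random-weak oracle concrete, here is a minimal Python sketch of a same-cluster query that may return "not-sure". It is an illustration under stated assumptions, not the authors' implementation: the function name `random_weak_oracle`, the parameter `p_abstain`, and the use of `None` to encode a "not-sure" answer are all hypothetical choices, and the ground-truth labels stand in for the domain expert's knowledge.

```python
import random
from typing import Optional, Sequence


def random_weak_oracle(
    labels: Sequence[int],
    i: int,
    j: int,
    p_abstain: float,
    rng: random.Random,
) -> Optional[bool]:
    """Answer the query "are points i and j in the same cluster?".

    Assumption: the oracle abstains independently on each query with
    probability p_abstain; otherwise it answers truthfully.
    Returns None for "not-sure", else True or False.
    """
    if rng.random() < p_abstain:
        return None  # "not-sure": the oracle abstains on this query
    return labels[i] == labels[j]


if __name__ == "__main__":
    true_labels = [0, 0, 1, 1]      # hypothetical ground-truth clustering
    rng = random.Random(0)          # seeded for reproducibility
    answers = [random_weak_oracle(true_labels, 0, j, 0.3, rng) for j in range(1, 4)]
    print(answers)                  # a mix of True, False, and None
```

With `p_abstain = 0` this reduces to an oracle that answers every same-cluster query truthfully, and with `p_abstain = 1` every query returns "not-sure". The distance-weak models are not sketched here because the abstract does not specify their abstention rule.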


research
11/20/2017

Relaxed Oracles for Semi-Supervised Clustering

Pairwise "same-cluster" queries are one of the most widely used forms of...
research
03/02/2018

Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering

We study k-means clustering in a semi-supervised setting. Given an oracl...
research
05/12/2021

How to Design Robust Algorithms using Noisy Comparison Oracle

Metric based comparison operations such as finding maximum, nearest and ...
research
06/08/2016

Clustering with Same-Cluster Queries

We propose a framework for Semi-Supervised Active Clustering framework (...
research
06/15/2018

Query K-means Clustering and the Double Dixie Cup Problem

We consider the problem of approximate K-means clustering with outliers ...
research
03/29/2018

COBRAS: Fast, Iterative, Active Clustering with Pairwise Constraints

Constraint-based clustering algorithms exploit background knowledge to c...
research
06/09/2021

On Margin-Based Cluster Recovery with Oracle Queries

We study an active cluster recovery problem where, given a set of n poin...
