Clustering with Same-Cluster Queries

06/08/2016
by Hassan Ashtiani, et al.

We propose a framework for Semi-Supervised Active Clustering (SSAC), in which the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade-off between computational complexity and query complexity: we prove that for the case of k-means clustering (i.e., when the expert conforms to a solution of k-means), having access to relatively few such queries allows efficient solutions to otherwise NP-hard problems. In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks O(k^2 log k + k log n) same-cluster queries and runs with time complexity O(kn log n), where k is the number of clusters and n is the number of instances. The algorithm succeeds with high probability for data satisfying margin conditions under which, without queries, we show that the problem is NP-hard. We also prove a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting.
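
To make the query model concrete, the following is a minimal Python sketch of clustering with a same-cluster oracle. It is an illustration written for this summary, not the authors' algorithm as published: the names `SameClusterOracle` and `ssac_sketch`, the fixed `sample_size` parameter, and the toy data are all assumptions. The sketch also assumes a margin strong enough that, once points are sorted by distance to an estimated center, the members of that cluster form a prefix of the sorted list, so a binary search with O(log n) queries can find the cluster boundary.

```python
import math
import random


class SameClusterOracle:
    """Hypothetical domain expert: answers whether two instances share a cluster."""

    def __init__(self, labels):
        self.labels = labels
        self.queries = 0  # number of same-cluster queries asked so far

    def same(self, i, j):
        self.queries += 1
        return self.labels[i] == self.labels[j]


def ssac_sketch(points, k, oracle, sample_size, rng):
    """Recover up to k clusters using same-cluster queries (illustrative only)."""
    assignment = [None] * len(points)
    remaining = list(range(len(points)))
    dim = len(points[0])
    for cluster_id in range(k):
        if not remaining:
            break
        # Phase 1: sample points and group them by querying one
        # representative (the first element) of each group found so far.
        groups = []
        for i in rng.sample(remaining, min(sample_size, len(remaining))):
            for group in groups:
                if oracle.same(i, group[0]):
                    group.append(i)
                    break
            else:
                groups.append([i])
        biggest = max(groups, key=len)
        # Estimate the center of the largest sampled group.
        center = tuple(
            sum(points[i][d] for i in biggest) / len(biggest) for d in range(dim)
        )
        # Phase 2: sort by distance to the estimated center, then binary
        # search for the first point outside the cluster (margin assumption).
        remaining.sort(key=lambda i: math.dist(points[i], center))
        lo, hi = 0, len(remaining)
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle.same(remaining[mid], biggest[0]):
                lo = mid + 1
            else:
                hi = mid
        for i in remaining[:lo]:
            assignment[i] = cluster_id
        remaining = remaining[lo:]
    return assignment


# Toy data (an assumption): three well-separated blobs of 30 points each.
rng = random.Random(0)
points, labels = [], []
for label, (cx, cy) in enumerate([(0, 0), (100, 0), (0, 100)]):
    for _ in range(30):
        points.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
        labels.append(label)

oracle = SameClusterOracle(labels)
assignment = ssac_sketch(points, k=3, oracle=oracle, sample_size=15, rng=rng)
```

On this toy input the sketch asks far fewer queries than the 4005 pairs that exist among 90 points, because each round spends queries only on a small sample plus a logarithmic binary search. The cluster identifiers it returns reflect discovery order, so they match the expert's labels only up to renaming.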


