Same-Cluster Querying for Overlapping Clusters

10/28/2019
by Wasim Huleihel, et al.

Overlapping clusters are common in models of many practical data-segmentation applications. Suppose we are given n elements to be clustered into k possibly overlapping clusters, and an oracle that can interactively answer queries of the form "do elements u and v belong to the same cluster?" The goal is to recover the clusters with the minimum number of such queries. This problem has recently attracted interest in the case of disjoint clusters. In this paper, we look at the more practical scenario of overlapping clusters and provide upper bounds, with accompanying algorithms, on the number of queries that suffice. We provide algorithmic results under both arbitrary (worst-case) and statistical modeling assumptions. Our algorithms are parameter-free, efficient, and work in the presence of random noise. We also derive information-theoretic lower bounds on the number of queries needed, proving that our algorithms are order optimal. Finally, we test our algorithms on both synthetic and real-world data, showing their practicality and effectiveness.
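To make the query model concrete, here is a minimal Python sketch under stated assumptions. The oracle is hypothetical: it assumes each element carries a set of cluster labels and answers "yes" exactly when two elements share at least one label, with no noise. The function `cluster_disjoint` is not the authors' algorithm; it is the standard greedy baseline for the disjoint case, which compares each new element against one representative per cluster discovered so far and therefore needs at most nk queries without knowing k in advance. The final lines show why that shortcut fails once clusters overlap: "same cluster" is no longer a transitive relation.

```python
from typing import Callable, Dict, List, Set, Tuple

Oracle = Callable[[int, int], bool]


def make_oracle(memberships: Dict[int, Set[str]]) -> Oracle:
    """Hypothetical noiseless oracle: True iff u and v share at least one cluster label."""
    def oracle(u: int, v: int) -> bool:
        return bool(memberships[u] & memberships[v])
    return oracle


def cluster_disjoint(n: int, oracle: Oracle) -> Tuple[List[List[int]], int]:
    """Greedy baseline for DISJOINT clusters (an illustration, not the paper's method).

    Each element is compared with the first member of every cluster found so far,
    so with k clusters it uses at most n*k queries. Correctness relies on the
    oracle being transitive, which holds only when clusters do not overlap.
    """
    clusters: List[List[int]] = []
    queries = 0
    for u in range(n):
        for cluster in clusters:
            queries += 1
            if oracle(u, cluster[0]):
                cluster.append(u)
                break
        else:
            # No representative matched, so u starts a new cluster.
            clusters.append([u])
    return clusters, queries


# Disjoint example: 5 elements in 3 clusters labelled "A", "B", "C".
disjoint = make_oracle({0: {"A"}, 1: {"B"}, 2: {"A"}, 3: {"C"}, 4: {"B"}})
clusters, used = cluster_disjoint(5, disjoint)
print(clusters, used)  # [[0, 2], [1, 4], [3]] 6

# Overlapping example: element 1 belongs to both "A" and "B".
overlap = make_oracle({0: {"A"}, 1: {"A", "B"}, 2: {"B"}})
print(overlap(0, 1), overlap(1, 2), overlap(0, 2))  # True True False
```

In the overlapping example, 0 and 1 answer "yes" and 1 and 2 answer "yes", yet 0 and 2 answer "no", so comparing only against one representative per cluster can no longer recover the memberships; this is the gap the overlapping-cluster setting has to address.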

research
06/23/2017

Query Complexity of Clustering with Side Information

Suppose, we are given a set of n elements to be clustered into k (unknow...
research
08/17/2021

Learning to Cluster via Same-Cluster Queries

We study the problem of learning to cluster data points using an oracle ...
research
05/28/2019

Correlation Clustering with Adaptive Similarity Queries

We investigate learning algorithms that use similarity queries to approx...
research
07/14/2023

Visualizing Overlapping Biclusterings and Boolean Matrix Factorizations

Finding (bi-)clusters in bipartite graphs is a popular data analysis app...
research
06/04/2019

A numerical measure of the instability of Mapper-type algorithms

Mapper is an unsupervised machine learning algorithm generalising the no...
research
11/15/2019

Penalized k-means algorithms for finding the correct number of clusters in a dataset

In many applications we want to find the number of clusters in a dataset...
research
03/31/2019

Semisupervised Clustering by Queries and Locally Encodable Source Coding

Source coding is the canonical problem of data compression in informatio...