Efficient Clustering with Limited Distance Information

08/09/2014
by Konstantin Voevodski, et al.

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume access to one-versus-all queries that, given a point s ∈ S, return the distances between s and all other points. We show that, under a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. We use our algorithm to cluster proteins by sequence similarity. This setting fits our model well because a fast sequence database search program can query one sequence against an entire dataset. We conduct an empirical study showing that, even though we query only a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.
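
The one-versus-all query model described above can be illustrated with a minimal sketch. This is not the algorithm from the paper: it swaps in a simple farthest-first landmark heuristic, chosen only because it also uses exactly k one-versus-all queries. The class OneVersusAllOracle, the function cluster_with_k_queries, and the Euclidean toy data are all hypothetical names and assumptions introduced for illustration.

```python
import math


class OneVersusAllOracle:
    """Hypothetical oracle: one query returns the distances from a point to all points.

    Euclidean distance stands in for the unknown metric d (an assumption).
    """

    def __init__(self, points):
        self.points = points
        self.queries = 0  # number of one-versus-all queries issued so far

    def query(self, s):
        self.queries += 1
        return [math.dist(self.points[s], p) for p in self.points]


def cluster_with_k_queries(oracle, n, k, first=0):
    """Farthest-first landmark heuristic (NOT the paper's algorithm).

    Issues exactly k one-versus-all queries and assigns every point
    to its nearest queried landmark.
    """
    min_dist = [math.inf] * n  # distance from each point to its nearest landmark
    label = [-1] * n           # index of the nearest landmark for each point
    landmarks = []
    s = first
    for c in range(k):
        landmarks.append(s)
        d = oracle.query(s)
        for i in range(n):
            if d[i] < min_dist[i]:
                min_dist[i] = d[i]
                label[i] = c
        # Next landmark: the point farthest from all landmarks chosen so far.
        s = max(range(n), key=lambda i: min_dist[i])
    return landmarks, label


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
oracle = OneVersusAllOracle(points)
landmarks, label = cluster_with_k_queries(oracle, len(points), k=3)
print(oracle.queries, label)
```

On well-separated data like this, each group ends up with a single label, and the oracle counter confirms that only k of the n possible one-versus-all queries were used.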


research
08/11/2019

Fully Dynamic k-Center Clustering in Doubling Metrics

In the k-center clustering problem, we are given a set of n points in a ...
research
06/08/2020

Exact Recovery of Mangled Clusters with Same-Cluster Queries

We study the problem of recovering distorted clusters in the semi-superv...
research
11/07/2022

Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds

Given a set of points in the Euclidean space ℝ^ℓ with ℓ>1, the pairwise ...
research
10/26/2020

Query Complexity of k-NN based Mode Estimation

Motivated by the mode estimation problem of an unknown multivariate prob...
research
03/16/2021

On Undecided LP, Clustering and Active Learning

We study colored coverage and clustering problems. Here, we are given a ...
research
07/31/2017

Temporal Hierarchical Clustering

We study hierarchical clusterings of metric spaces that change over time...
research
10/29/2017

If it ain't broke, don't fix it: Sparse metric repair

Many modern data-intensive computational problems either require, or ben...
