Efficient Active Algorithms for Hierarchical Clustering

06/18/2012
by Akshay Krishnamurthy, et al.

Advances in sensing technologies and the growth of the internet have resulted in an explosion in the size of modern datasets, while storage and processing power continue to lag behind. This motivates the need for algorithms that are efficient, both in terms of the number of measurements needed and running time. To combat the challenges associated with large datasets, we propose a general framework for active hierarchical clustering that repeatedly runs an off-the-shelf clustering algorithm on small subsets of the data and comes with guarantees on performance, measurement complexity and runtime complexity. We instantiate this framework with a simple spectral clustering algorithm and provide concrete results on its performance, showing that, under some assumptions, this algorithm recovers all clusters of size Ω(log n) using O(n log^2 n) similarities and runs in O(n log^3 n) time for a dataset of n objects. Through extensive experimentation we also demonstrate that this framework is practically alluring.
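The general idea described in the abstract (cluster a small subsample, place the remaining objects using only similarities to that subsample, then recurse) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the names `two_way_split`, `active_hierarchy`, `CountingSim`, `subset_size` and `min_size` are hypothetical, and the flat clusterer is a simple farthest-pair stand-in rather than the spectral method the paper analyzes.

```python
import random


def two_way_split(sample, sim):
    # Stand-in flat clusterer (NOT the paper's spectral method):
    # seed with the least similar pair, then attach every other
    # sampled item to the seed it is more similar to.
    pairs = [(x, y) for i, x in enumerate(sample) for y in sample[i + 1:]]
    a, b = min(pairs, key=lambda p: sim(p[0], p[1]))
    left, right = [a], [b]
    for x in sample:
        if x in (a, b):
            continue
        (left if sim(x, a) >= sim(x, b) else right).append(x)
    return left, right


def mean_sim(x, group, sim):
    # Average similarity between x and the members of a group.
    return sum(sim(x, g) for g in group) / len(group)


def active_hierarchy(items, sim, subset_size, min_size, rng, split=two_way_split):
    # Leaves are sorted lists of items; internal nodes are (left, right) tuples.
    if len(items) <= min_size or len(items) < 2:
        return sorted(items)
    k = min(max(subset_size, 2), len(items))
    sample = rng.sample(items, k)
    sampled = set(sample)
    seed_left, seed_right = split(sample, sim)
    if not seed_left or not seed_right:
        return sorted(items)  # the flat clusterer found no split
    left, right = list(seed_left), list(seed_right)
    # Unsampled items are compared only against the clustered subsample.
    for x in items:
        if x in sampled:
            continue
        if mean_sim(x, seed_left, sim) >= mean_sim(x, seed_right, sim):
            left.append(x)
        else:
            right.append(x)
    return (active_hierarchy(left, sim, subset_size, min_size, rng, split),
            active_hierarchy(right, sim, subset_size, min_size, rng, split))


def leaves(tree):
    # Yield the leaf clusters of a hierarchy built by active_hierarchy.
    if isinstance(tree, list):
        yield tree
    else:
        for sub in tree:
            yield from leaves(sub)


class CountingSim:
    # Similarity oracle on 1-D positions that counts how often it is queried.
    def __init__(self, positions):
        self.positions = positions
        self.calls = 0

    def __call__(self, i, j):
        self.calls += 1
        return -abs(self.positions[i] - self.positions[j])


# Toy data: two well-separated groups of ten points on a line.
positions = list(range(10)) + list(range(100, 110))
sim = CountingSim(positions)
tree = active_hierarchy(list(range(20)), sim, subset_size=12, min_size=10,
                        rng=random.Random(0))
print(sorted(leaves(tree)), sim.calls)
```

Passing the similarity function as a counting oracle makes the number of similarity measurements visible, which is the quantity the abstract bounds. The toy run carries no evidence about the stated guarantees; those depend on the paper's assumptions and on its spectral instantiation.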

research
06/01/2016

Short Communication on QUIST: A Quick Clustering Algorithm

In this short communication we introduce the quick clustering algorithm ...
research
08/18/2023

Do you know what q-means?

Clustering is one of the most important tools for analysis of large data...
research
06/10/2021

Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time

We study the widely used hierarchical agglomerative clustering (HAC) alg...
research
10/30/2018

Cluster Size Management in Multi-Stage Agglomerative Hierarchical Clustering of Acoustic Speech Segments

Agglomerative hierarchical clustering (AHC) requires only the similarity...
research
06/11/2020

Faster DBSCAN via subsampled similarity queries

DBSCAN is a popular density-based clustering algorithm. It computes the ...
research
04/13/2018

Adversarial Clustering: A Grid Based Clustering Algorithm Against Active Adversaries

Nowadays more and more data are gathered for detecting and preventing cy...
research
10/28/2015

Fast Landmark Subspace Clustering

Kernel methods obtain superb performance in terms of accuracy for variou...
