BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits

06/11/2020
by Mo Tiwari, et al.

Clustering is a ubiquitous task in data science. Compared to the commonly used k-means clustering algorithm, k-medoids clustering algorithms require the cluster centers to be actual data points and support arbitrary distance metrics, allowing for greater interpretability and the clustering of structured objects. Current state-of-the-art k-medoids clustering algorithms, such as Partitioning Around Medoids (PAM), are iterative and require computation quadratic in the dataset size n for each iteration, making them prohibitively expensive for large datasets. We propose BanditPAM, a randomized algorithm inspired by techniques from multi-armed bandits, that significantly improves the computational efficiency of PAM. We theoretically prove that BanditPAM reduces the complexity of each PAM iteration from O(n^2) to O(n log n) and returns the same results with high probability, under assumptions on the data that often hold in practice. We empirically validate our results on several large-scale real-world datasets, including a coding exercise submissions dataset from Code.org, the 10x Genomics 68k PBMC single-cell RNA sequencing dataset, and the MNIST handwritten digits dataset. We observe that BanditPAM returns the same results as PAM while performing up to 200x fewer distance computations. The improvements demonstrated by BanditPAM enable k-medoids clustering on a wide range of applications, including identifying cell types in large-scale single-cell data and providing scalable feedback for students learning computer science online. We also release Python and C++ implementations of our algorithm.
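The bandit idea behind these savings can be illustrated on the simplest subproblem, finding the single medoid of a dataset: rather than computing all n^2 pairwise distances, each candidate's average distance to the rest is estimated from a random sample of reference points, and candidates whose confidence intervals rule them out are eliminated. The sketch below is illustrative only, not the released implementation; the function name bandit_medoid and the parameters delta, batch_size, and sigma (an assumed sub-Gaussian scale for the Hoeffding-style confidence radius) are our own choices for the example.

```python
import numpy as np

def bandit_medoid(X, delta=1e-3, batch_size=100, sigma=1.0, rng=None):
    """Illustrative bandit-style medoid search (a sketch, not BanditPAM itself).

    Each candidate's mean distance to the dataset is estimated from sampled
    reference points; candidates whose lower confidence bound exceeds the best
    upper confidence bound are successively eliminated.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    candidates = np.arange(n)
    estimates = np.zeros(n)   # running mean distance per candidate
    n_samples = np.zeros(n)   # reference points sampled per candidate

    while len(candidates) > 1 and n_samples[candidates[0]] < n:
        refs = rng.integers(0, n, size=batch_size)  # sampled reference points
        # Pairwise L2 distances: (candidates, batch_size)
        dists = np.linalg.norm(
            X[candidates][:, None, :] - X[refs][None, :, :], axis=2
        )
        batch_means = dists.mean(axis=1)
        m = n_samples[candidates]
        # Update running means with the new batch
        estimates[candidates] = (
            estimates[candidates] * m + batch_means * batch_size
        ) / (m + batch_size)
        n_samples[candidates] += batch_size
        # Hoeffding-style confidence radius (sigma is an assumed scale)
        ci = sigma * np.sqrt(2 * np.log(1 / delta) / n_samples[candidates])
        best_ucb = (estimates[candidates] + ci).min()
        # Keep only candidates that could still be the medoid
        candidates = candidates[estimates[candidates] - ci <= best_ucb]

    # Evaluate the few survivors exactly
    exact = [np.linalg.norm(X - X[c], axis=1).mean() for c in candidates]
    return candidates[int(np.argmin(exact))]
```

BanditPAM applies this kind of adaptive sampling to the candidate assignments and swaps inside PAM's iterations, which, under the paper's distributional assumptions, is what reduces the per-iteration cost from O(n^2) to O(n log n).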


Related research

11/02/2017
Medoids in almost linear time via multi-armed bandits
Computing the medoid of a large number of points in high-dimensional spa...

05/02/2016
Graph Clustering Bandits for Recommendation
We investigate an efficient context-dependent clustering technique for r...

09/16/2020
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed ...

10/04/2022
ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits
In this work, we propose a multi-armed bandit based framework for identi...

10/12/2015
Context-Aware Bandits
We propose an efficient Context-Aware clustering of Bandits (CAB) algori...

10/11/2019
Nonparametric Bayesian multi-armed bandits for single cell experiment design
The problem of maximizing cell type discovery under budget constraints i...

08/06/2016
On Context-Dependent Clustering of Bandits
We investigate a novel cluster-of-bandit algorithm CAB for collaborative...
