Coresets for Kernel Clustering

10/06/2021
by Shaofeng H.-C. Jiang, et al.

We devise the first coreset for kernel k-Means, and use it to obtain new, more efficient, algorithms. Kernel k-Means has superior clustering capability compared to classical k-Means particularly when clusters are separable non-linearly, but it also introduces significant computational challenges. We address this computational issue by constructing a coreset, which is a reduced dataset that accurately preserves the clustering costs. Our main result is the first coreset for kernel k-Means, whose size is independent of the number of input points n, and moreover is constructed in time near-linear in n. This result immediately implies new algorithms for kernel k-Means, such as a (1+ϵ)-approximation in time near-linear in n, and a streaming algorithm using space and update time poly(k ϵ^{-1} log n). We validate our coreset on various datasets with different kernels. Our coreset performs consistently well, achieving small errors while using very few points. We show that our coresets can speed up kernel k-Means++ (the kernelized version of the widely used k-Means++ algorithm), and we further use this faster kernel k-Means++ for spectral clustering. In both applications, we achieve up to 1000x speedup while the error is comparable to baselines that do not use coresets.
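To make "a reduced dataset that accurately preserves the clustering costs" concrete, here is a minimal sketch in Python. It evaluates the weighted kernel k-Means cost through the kernel trick and compares the full dataset against a small weighted sample. Everything in it is an illustrative assumption rather than the paper's method: the names `rbf_kernel`, `kernel_cost`, and `uniform_coreset` are hypothetical, the Gaussian (RBF) kernel is just one possible choice, each center is assumed to be the feature-space mean of a small group of points, and plain uniform sampling stands in for the paper's coreset construction, which is not described in this abstract.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_cost(X, weights, center_groups, gamma=1.0):
    """Weighted kernel k-Means cost of (X, weights) for a fixed set of centers.

    Assumption: each center is the feature-space mean of the points in one
    entry of `center_groups`, so squared distances follow from the kernel trick:
    ||phi(x) - c||^2 = K(x, x) - 2 * mean_a K(x, a) + mean_{a, b} K(a, b).
    """
    self_sim = np.ones(len(X))  # K(x, x) = 1 for the RBF kernel
    dists = []
    for G in center_groups:
        cross = rbf_kernel(X, G, gamma).mean(axis=1)
        center_norm = rbf_kernel(G, G, gamma).mean()
        dists.append(self_sim - 2.0 * cross + center_norm)
    nearest = np.min(np.stack(dists, axis=1), axis=1)
    return float(weights @ np.maximum(nearest, 0.0))

def uniform_coreset(X, m, rng):
    """Stand-in reduction (NOT the paper's construction): sample m points
    uniformly without replacement and give each the weight n / m."""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx], np.full(m, len(X) / m)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 2))
    groups = [X[:10], X[10:20], X[20:30]]  # three hypothetical centers
    full = kernel_cost(X, np.ones(len(X)), groups)
    S, w = uniform_coreset(X, 200, rng)
    approx = kernel_cost(S, w, groups)
    print("relative error:", abs(approx - full) / full)
```

A uniform sample carries no worst-case guarantee over all possible center sets; the abstract's claim is that its coreset does, with a size independent of n. The sketch only shows what "preserving the cost" means operationally: the weighted cost on the small set should be close to the cost on the full data.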

research
06/09/2017

Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

Kernel k-means clustering can correctly identify and extract a far more ...
research
10/19/2022

Near-optimal Coresets for Robust Clustering

We consider robust clustering problems in ℝ^d, specifically k-clustering...
research
12/23/2020

K-Means Kernel Classifier

We combine K-means clustering with the least-squares kernel classificati...
research
10/09/2017

Distributed Kernel K-Means for Large Scale Clustering

Clustering samples according to an effective metric and/or vector space ...
research
03/27/2018

Distributed Adaptive Sampling for Kernel Matrix Approximation

Most kernel-based methods, such as kernel or Gaussian process regression...
research
03/09/2020

Nearly Optimal Risk Bounds for Kernel K-Means

In this paper, we study the statistical properties of the kernel k-means...
research
10/26/2017

Energy Clustering

Energy statistics was proposed by Székely in the 80's inspired by the Ne...
