QuicK-means: Acceleration of K-means by learning a fast transform

08/23/2019
by Luc Giffon, et al.

K-means, together with the celebrated Lloyd algorithm, is more than the clustering method it was originally designed to be. It has proven pivotal in speeding up many machine learning and data analysis techniques such as indexing, nearest-neighbor search and prediction, and data compression; its benefits have also been shown to carry over to the acceleration of kernel machines (when using the Nyström method). Here, we propose a fast extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the K centroids as a product of sparse matrices, a feat made possible by recent results devoted to finding approximations of matrices as products of sparse factors. Using such a decomposition squashes the complexity of the matrix-vector product between the factorized K × D centroid matrix U and any vector from O(KD) to O(A log A + B), with A = min(K, D) and B = max(K, D), where D is the dimension of the training data. This drastic computational saving has a direct impact on the assignment of a point to a cluster, meaning that it is tangible not only at prediction time but also at training time, provided the factorization procedure is performed during Lloyd's algorithm. We show precisely that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme and that, depending on the context, it may entail a reduction of the training time. Finally, we provide discussions and numerical simulations that show the versatility of our computationally efficient QuicK-means algorithm.
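To illustrate the source of the speed-up, here is a minimal sketch (not the authors' implementation) of the cluster-assignment step when the K × D centroid matrix U is replaced by a product of sparse factors: U @ x is evaluated factor by factor, so the cost is the total number of nonzeros in the factors rather than K * D. The factor shapes, densities, and sizes below are hypothetical placeholders standing in for a learned factorization.

```python
# Minimal sketch, assuming a learned factorization U = S_1 @ S_2 @ ... @ S_Q
# with sparse factors S_i; here the factors are random placeholders.
import numpy as np
import scipy.sparse as sp

K, D = 64, 64                      # hypothetical sizes; A = min(K, D), B = max(K, D)
rng = np.random.default_rng(0)

# Hypothetical sparse factors: three K x K factors followed by one K x D factor.
factors = [sp.random(K, K, density=0.05, random_state=rng, format="csr")
           for _ in range(3)]
factors.append(sp.random(K, D, density=0.05, random_state=rng, format="csr"))

# Dense centroid matrix implied by the factors, used only to precompute norms.
U = factors[0]
for S in factors[1:]:
    U = U @ S
U = np.asarray(U.todense())
centroid_sq_norms = (U ** 2).sum(axis=1)   # ||u_k||^2, precomputed once

def assign(x):
    """Return the index of the centroid closest to x.

    U @ x is evaluated through the sparse factors (right to left), so the
    cost scales with the number of nonzeros in the factors, not with K * D.
    """
    v = x
    for S in reversed(factors):
        v = S @ v
    # argmin_k ||u_k - x||^2 = argmin_k ||u_k||^2 - 2 <u_k, x>
    return int(np.argmin(centroid_sq_norms - 2.0 * v))

x = rng.standard_normal(D)
print(assign(x))
```

The same trick applies inside Lloyd's algorithm: as long as the centroid matrix is re-factorized at each iteration, every assignment pass benefits from the cheaper matrix-vector products.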


