PQk-means: Billion-scale Clustering for Product-quantized Codes

09/12/2017
by   Yusuke Matsui, et al.

Data clustering is a fundamental operation in data analysis. For large-scale data, the standard k-means clustering method is both slow and memory-inefficient. We propose PQk-means, an efficient clustering method for billion-scale feature vectors. By first compressing input vectors into short product-quantized (PQ) codes, PQk-means achieves fast and memory-efficient clustering, even for high-dimensional vectors. Like k-means, PQk-means alternates between an assignment step and an update step, both of which can be performed in the PQ-code domain. Experimental results show that even short (32-bit) PQ codes produce results competitive with k-means. This result is of practical importance for clustering in memory-restricted environments. Using the proposed PQk-means scheme, clustering one billion 128D SIFT features with K = 10^5 takes less than 14 hours and uses just 32 GB of memory on a single computer.
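The idea of clustering directly on PQ codes can be illustrated with a minimal pure-Python sketch. It is a toy under stated assumptions, not the authors' implementation: the function names (`pq_encode`, `build_tables`, `symmetric_dist`, `assign`, `update_center`), the hand-written codebooks, and the brute-force update rule are all hypothetical choices made for illustration. Each vector is split into M sub-vectors, each sub-vector is replaced by the index of its nearest codeword, and distances between codes are then read from precomputed lookup tables.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vec, codebooks):
    """Encode vec as one codeword index per subspace (assumes len(vec) % M == 0)."""
    m = len(codebooks)
    d_sub = len(vec) // m
    code = []
    for i, book in enumerate(codebooks):
        sub = vec[i * d_sub:(i + 1) * d_sub]
        code.append(min(range(len(book)), key=lambda j: sq_dist(sub, book[j])))
    return tuple(code)

def build_tables(codebooks):
    """tables[i][a][b] = squared distance between codewords a and b of subspace i."""
    return [[[sq_dist(a, b) for b in book] for a in book] for book in codebooks]

def symmetric_dist(c1, c2, tables):
    """Approximate squared distance between two PQ codes via table lookups only."""
    return sum(tables[i][a][b] for i, (a, b) in enumerate(zip(c1, c2)))

def assign(codes, centers, tables):
    """Assignment step: index of the nearest center code for every input code."""
    return [min(range(len(centers)),
                key=lambda k: symmetric_dist(c, centers[k], tables))
            for c in codes]

def update_center(member_codes, tables):
    """Update step (brute-force assumption): per subspace, pick the codeword
    that minimizes the summed table distance to the members' sub-codes."""
    center = []
    for i in range(len(tables)):
        center.append(min(range(len(tables[i])),
                          key=lambda j: sum(tables[i][c[i]][j] for c in member_codes)))
    return tuple(center)

# Toy setup: 4D vectors, M = 2 subspaces, 2 codewords per subspace (hand-picked).
codebooks = [[[0.0, 0.0], [1.0, 1.0]],
             [[0.0, 0.0], [1.0, 1.0]]]
tables = build_tables(codebooks)

codes = [pq_encode(v, codebooks) for v in ([0.1, 0.0, 0.9, 1.1],
                                           [1.0, 0.9, 0.0, 0.2])]
centers = [(0, 1), (1, 0)]
labels = assign(codes, centers, tables)
new_center = update_center([(0, 1), (0, 1), (1, 1)], tables)
```

Neither step touches the original vectors once they are encoded, which is where the memory saving comes from. As a rough back-of-the-envelope check, one billion 32-bit codes occupy 10^9 × 4 bytes = 4 GB, whereas the same number of 128D vectors stored as 4-byte floats (an assumption about the raw format) would need 10^9 × 128 × 4 bytes = 512 GB.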

research
05/30/2016

k2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes wi...
research
09/16/2020

Too Much Information Kills Information: A Clustering Perspective

Clustering is one of the most fundamental tools in the artificial intell...
research
02/21/2020

Inverted-File k-Means Clustering: Performance Analysis

This paper presents an inverted-file k-means clustering algorithm (IVF) ...
research
08/17/2023

Approximating Clustering for Memory Management and request processing

Clustering is a crucial tool for analyzing data in virtually every scien...
research
08/26/2016

A Randomized Approach to Efficient Kernel Clustering

Kernel-based K-means clustering has gained popularity due to its simplic...
research
03/30/2021

Structured Inverted-File k-Means Clustering for High-Dimensional Sparse Data

This paper presents an architecture-friendly k-means clustering algorith...
research
08/23/2019

QuicK-means: Acceleration of K-means by learning a fast transform

K-means -- and the celebrated Lloyd algorithm -- is more than the cluste...
