Balancing clusters to reduce response time variability in large scale image search

09/21/2010
by Romain Tavenard, et al.

Many algorithms for approximate nearest neighbor search in high-dimensional spaces partition the data into clusters. At query time, to avoid an exhaustive search, the index selects the one or few clusters nearest to the query point. Clusters are often produced by the well-known k-means algorithm because it has several desirable properties. On the downside, it tends to produce clusters of very different cardinalities. Imbalanced clusters increase both the expectation and the variance of query response times. This paper proposes modifying the k-means centroids so that the clusters have more comparable sizes, without sacrificing those desirable properties. Experiments on a large-scale collection of image descriptors show that our algorithm significantly reduces the variance of response times without seriously affecting search quality.
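
To make the cost argument concrete, here is a minimal Python sketch built on simplifying assumptions of our own, not taken from the paper: each query probes exactly one cluster, its response time is proportional to the number of vectors in that cluster, and queries follow the data distribution, so a query lands in cluster i with probability n_i / N. Under this model the expected cost is the sum of n_i^2 / N, which for a fixed N and k clusters is smallest when every n_i = N/k. The function name `response_time_stats` is hypothetical, and the sketch does not implement the paper's balancing algorithm; it only measures the effect of imbalance.

```python
def response_time_stats(cluster_sizes):
    """Mean and variance of the number of vectors scanned per query.

    Assumed model (not from the paper): every query probes exactly one
    cluster and falls into cluster i with probability n_i / N.
    """
    total = sum(cluster_sizes)
    if total == 0:
        raise ValueError("clusters must not all be empty")
    # Probability that a query is routed to each cluster.
    probs = [n / total for n in cluster_sizes]
    # E[C] = sum_i p_i * n_i, i.e. sum_i n_i^2 / N
    mean = sum(p * n for p, n in zip(probs, cluster_sizes))
    # E[C^2] = sum_i p_i * n_i^2
    second_moment = sum(p * n * n for p, n in zip(probs, cluster_sizes))
    variance = second_moment - mean * mean
    return mean, variance


print(response_time_stats([10, 10, 10, 10]))  # (10.0, 0.0)
print(response_time_stats([25, 5, 5, 5]))     # (17.5, 93.75)
```

With the same 40 vectors split into four clusters, the balanced partition scans 10 vectors per query with zero variance, while the imbalanced split [25, 5, 5, 5] raises the mean to 17.5 and the variance to 93.75, because most queries fall into the one large cluster.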

07/09/2018

Learning to Index for Nearest Neighbor Search

In this study, we present a novel ranking model based on learning the ne...

02/10/2021

Leveraging Reinforcement Learning for evaluating Robustness of KNN Search Algorithms

The problem of finding K-nearest neighbors in the given dataset for a gi...

12/08/2017

Exploiting Modern Hardware for High-Dimensional Nearest Neighbor Search

Many multimedia information retrieval or machine learning problems requi...

05/04/2017

Fast k-means based on KNN Graph

In the era of big data, k-means clustering has been widely adopted as a ...

11/05/2021

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

The in-memory algorithms for approximate nearest neighbor search (ANNS) ...

02/02/2010

Feature Level Clustering of Large Biometric Database

This paper proposes an efficient technique for partitioning large biomet...

08/05/2023

DeDrift: Robust Similarity Search under Content Drift

The statistical distribution of content uploaded and searched on media s...
