Memory vectors for similarity search in high-dimensional spaces

12/10/2014
by Ahmet Iscen, et al.

We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory. This architecture is composed of several memory units, each of which summarizes a fraction of the database by a single representative vector. The potential similarity of the query to one of the vectors stored in a memory unit is gauged by a simple correlation with that unit's representative vector. This representative optimizes the test between two hypotheses: the query is independent of every vector in the memory unit, versus the query is a simple perturbation of one of the stored vectors. Compared to exhaustive search, our approach finds the most similar database vectors significantly faster without a noticeable reduction in search quality. Interestingly, the reduction of complexity is provably better in high-dimensional spaces. We empirically demonstrate its practical interest in a large-scale image search scenario with off-the-shelf state-of-the-art descriptors.
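The two-stage search described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: vectors are assigned to memory units in consecutive blocks, each representative is the plain sum of its unit's vectors (a stand-in for the optimized representative), and the data are random unit vectors. The names `build_memory_units`, `search`, `unit_size`, and `n_probe` are hypothetical and chosen only for this illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product, used here as the correlation score."""
    return sum(x * y for x, y in zip(a, b))


def build_memory_units(database, unit_size):
    """Group consecutive vectors into units and summarize each by one vector."""
    units = []
    for start in range(0, len(database), unit_size):
        ids = list(range(start, min(start + unit_size, len(database))))
        # ASSUMPTION: the representative is the plain sum of the unit's vectors.
        representative = [sum(col) for col in zip(*(database[i] for i in ids))]
        units.append((representative, ids))
    return units


def search(query, database, units, n_probe):
    """Score every unit by one correlation, then scan only the best n_probe units."""
    ranked = sorted(units, key=lambda unit: dot(query, unit[0]), reverse=True)
    candidates = [i for _, ids in ranked[:n_probe] for i in ids]
    return max(candidates, key=lambda i: dot(query, database[i]))


# Synthetic data (assumption): random unit vectors in a high-dimensional space.
rng = random.Random(0)
dim, n_vectors, unit_size, n_probe = 256, 500, 5, 5
database = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
            for _ in range(n_vectors)]
units = build_memory_units(database, unit_size)

# The query is a small perturbation of one stored vector.
target = 123
query = normalize([x + 0.02 * rng.gauss(0.0, 1.0) for x in database[target]])
found = search(query, database, units, n_probe)
print(found == target)
```

With these settings the search computes one inner product per memory unit (100) plus one per candidate in the probed units (25), compared with 500 for an exhaustive scan. Setting `n_probe` to `len(units)` reduces the procedure to exhaustive search.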


research
07/26/2019

qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces

Similarity search queries in high-dimensional spaces are an important ty...
research
09/04/2017

Neural Distributed Autoassociative Memories: A Survey

Introduction. Neural network models of autoassociative, distributed memo...
research
02/23/2018

High-Dimensional Vector Semantics

In this paper we explore the "vector semantics" problem from the perspec...
research
02/28/2017

Billion-scale similarity search with GPUs

Similarity search finds application in specialized database systems hand...
research
08/30/2020

SOLAR: Sparse Orthogonal Learned and Random Embeddings

Dense embedding models are commonly deployed in commercial search engine...
research
04/19/2016

Using Apache Lucene to Search Vector of Locally Aggregated Descriptors

Surrogate Text Representation (STR) is a profitable solution to efficien...
research
04/07/2023

Similarity search in the blink of an eye with compressed indices

Nowadays, data is represented by vectors. Retrieving those vectors, amon...
