Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors

10/22/2019
by   Tommaso Teofili, et al.
0

We demonstrate three approaches for adapting the open-source Lucene search library to perform approximate nearest-neighbor search on arbitrary dense vectors, using similarity search on word embeddings as a case study. At its core, Lucene is built around inverted indexes of a document collection's (sparse) term-document matrix, which is incompatible with the lower-dimensional dense vectors that are common in deep learning applications. We evaluate three techniques to overcome these challenges that can all be natively integrated into Lucene: the creation of documents populated with fake words, LSH applied to lexical realizations of dense vectors, and k-d trees coupled with dimensionality reduction. Experiments show that the "fake words" approach represents the best balance between effectiveness and efficiency. These techniques are integrated into the Anserini open-source toolkit and made available to the community.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/31/2023

Lexically-Accelerated Dense Retrieval

Retrieval approaches that score documents based on learned dense vectors...
research
04/24/2023

Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes

Anserini is a Lucene-based toolkit for reproducible information retrieva...
research
06/27/2012

On the Difficulty of Nearest Neighbor Search

Fast approximate nearest neighbor (NN) search in large databases is beco...
research
09/16/2023

Bridging Dense and Sparse Maximum Inner Product Search

Maximum inner product search (MIPS) over dense and sparse vectors have p...
research
12/02/2019

scikit-hubness: Hubness Reduction and Approximate Neighbor Search

This paper introduces scikit-hubness, a Python package for efficient nea...
research
07/14/2021

A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU

The Word Movers Distance (WMD) measures the semantic dissimilarity betwe...
research
09/04/2017

Neural Distributed Autoassociative Memories: A Survey

Introduction. Neural network models of autoassociative, distributed memo...

Please sign up or login with your details

Forgot password? Click here to reset