Lexically-Accelerated Dense Retrieval

07/31/2023
by Hrishikesh Kulkarni et al.

Retrieval approaches that score documents based on learned dense vectors (i.e., dense retrieval) rather than lexical signals (i.e., conventional retrieval) are increasingly popular. One of their key advantages is the ability to identify related documents that do not necessarily contain the same terms as the user's query, thereby improving recall. However, to actually achieve these gains, dense retrieval approaches typically require an exhaustive search over the document collection, making them considerably more expensive at query time than conventional lexical approaches. Several techniques aim to reduce this computational overhead by approximating the results of a full dense retriever. Although these approaches reasonably approximate the top results, they suffer in terms of recall, one of the key advantages of dense retrieval. We introduce 'LADR' (Lexically-Accelerated Dense Retrieval), a simple yet effective approach that improves the efficiency of existing dense retrieval models without compromising on retrieval effectiveness. LADR uses lexical retrieval techniques to seed a dense retrieval exploration that uses a document proximity graph. We explore two variants of LADR: a proactive approach that expands the search space to the neighbors of all seed documents, and an adaptive approach that selectively searches the documents with the highest estimated relevance in an iterative fashion. Through extensive experiments across a variety of dense retrieval models, we find that LADR establishes a new dense retrieval effectiveness-efficiency Pareto frontier among approximate k-nearest-neighbor techniques. Further, we find that when tuned to take around 8ms per query in retrieval latency on our hardware, LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
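
The two variants described above can be illustrated with a minimal sketch. It is one reading of the abstract, not the authors' implementation: the function names, the `depth` parameter, the stopping rule for the adaptive variant, the dot-product scoring, and the toy vectors, graph, and seed list are all assumptions made for illustration. In practice the seeds would come from a lexical retriever such as BM25, and the neighbor lists from a precomputed document proximity graph.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dense relevance score (assumed here to be a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def top_k(scores: Dict[int, float], k: int) -> List[Tuple[int, float]]:
    # Highest score first; ties broken by document id so results are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def ladr_proactive(query: Vector, seeds: List[int],
                   neighbors: Dict[int, List[int]],
                   doc_vecs: Dict[int, Vector], k: int) -> List[Tuple[int, float]]:
    """Score the seeds plus the graph neighbors of every seed, once."""
    candidates = set(seeds)
    for doc_id in seeds:
        candidates.update(neighbors.get(doc_id, []))
    scores = {d: dot(query, doc_vecs[d]) for d in candidates}
    return top_k(scores, k)


def ladr_adaptive(query: Vector, seeds: List[int],
                  neighbors: Dict[int, List[int]],
                  doc_vecs: Dict[int, Vector], k: int,
                  depth: int) -> List[Tuple[int, float]]:
    """Repeatedly expand only the current top-`depth` documents.

    Stops when every document in the top `depth` has already been expanded
    (an assumed convergence criterion).
    """
    scores = {d: dot(query, doc_vecs[d]) for d in set(seeds)}
    expanded = set()
    while True:
        frontier = [d for d, _ in top_k(scores, depth) if d not in expanded]
        if not frontier:
            break
        for d in frontier:
            expanded.add(d)
            for n in neighbors.get(d, []):
                if n not in scores:  # each document is scored at most once
                    scores[n] = dot(query, doc_vecs[n])
    return top_k(scores, k)


# Toy data (hypothetical): six documents with 2-d vectors and a small graph.
doc_vecs = {0: [4.0, 0.0], 1: [3.0, 1.0], 2: [0.0, 4.0],
            3: [3.0, 3.0], 4: [1.0, 3.0], 5: [4.0, 2.0]}
neighbors = {0: [1, 5], 1: [0, 3], 2: [4], 3: [5, 1], 4: [2, 3], 5: [3, 0]}
query = [2.0, 1.0]
seeds = [1, 2]  # stand-in for the top results of a lexical first stage

exhaustive = top_k({d: dot(query, v) for d, v in doc_vecs.items()}, 3)
proactive = ladr_proactive(query, seeds, neighbors, doc_vecs, k=3)
adaptive = ladr_adaptive(query, seeds, neighbors, doc_vecs, k=3, depth=2)
print("exhaustive:", exhaustive)
print("proactive: ", proactive)
print("adaptive:  ", adaptive)
```

On this toy input the proactive pass scores only the seeds and their direct neighbors, so it never sees document 5, whereas the adaptive loop follows the graph from its current best results and ends with the same top 3 as the exhaustive scan. This is an artifact of the hand-built example and says nothing about the paper's measured results.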


