The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes

12/28/2020
by Nils Reimers, et al.

Information retrieval using dense low-dimensional representations has recently become popular and has been shown to outperform traditional sparse representations such as BM25. However, no previous work has investigated how dense representations perform with large index sizes. We show theoretically and empirically that the performance of dense representations decreases faster than that of sparse representations as the index size increases. In extreme cases, this can even lead to a tipping point: beyond a certain index size, sparse representations outperform dense representations. We show that this behavior is tightly connected to the number of dimensions of the representations: the lower the dimension, the higher the chance of false positives, i.e., returning irrelevant documents.
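
The link between dimensionality, index size, and false positives can be illustrated with a small toy simulation. This is only a sketch under strong assumptions and is not the paper's analysis or experimental setup: irrelevant documents are modeled as uniformly random unit vectors, the relevant document is assumed to have a fixed cosine similarity to the query (the value 0.6 is arbitrary), and the function names `random_cosine` and `false_positive_rate` are hypothetical.

```python
import math
import random


def random_cosine(dim, rng):
    """Cosine similarity between a fixed query and a uniformly random unit vector.

    By rotational symmetry this equals the first coordinate of a random
    unit vector, obtained here by normalizing a standard Gaussian sample.
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return g[0] / norm


def false_positive_rate(dim, index_size, relevant_sim=0.6, trials=100, seed=0):
    """Fraction of trials in which some irrelevant document outranks the relevant one.

    relevant_sim is an assumed, fixed similarity of the relevant document.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # any() stops at the first irrelevant document that beats the relevant one.
        if any(random_cosine(dim, rng) > relevant_sim for _ in range(index_size)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for dim in (4, 16):
        for index_size in (10, 1000):
            rate = false_positive_rate(dim, index_size)
            print(f"dim={dim:3d} index_size={index_size:5d} false-positive rate={rate:.2f}")
```

Under these assumptions, the estimated rate should rise as the index grows and fall as the dimension grows, which mirrors the qualitative claim in the abstract. Real document embeddings are not uniformly distributed, so the absolute numbers carry no meaning.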

research
12/09/2021

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

Learned sparse and dense representations capture different successful ap...
research
05/01/2020

Sparse, Dense, and Attentional Representations for Text Retrieval

Dual encoder architectures perform retrieval by encoding documents and q...
research
04/15/2021

UHD-BERT: Bucketed Ultra-High Dimensional Sparse Representations for Full Ranking

Neural information retrieval (IR) models are promising mainly because th...
research
04/12/2020

Minimizing FLOPs to Learn Efficient Sparse Representations

Deep representation learning has become one of the most widely adopted a...
research
09/22/2021

Predicting Efficiency/Effectiveness Trade-offs for Dense vs. Sparse Retrieval Strategy Selection

Over the last few years, contextualized pre-trained transformer models s...
research
06/17/2017

Accelerating Innovation Through Analogy Mining

The availability of large idea repositories (e.g., the U.S. patent datab...
research
08/09/2022

Early Stage Sparse Retrieval with Entity Linking

Despite the advantages of their low-resource settings, traditional spars...
