Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder

05/06/2022
by   Zhenghao Liu, et al.

Dense retrievers use pre-trained language models to encode texts and map them into an embedding space. Keeping these embeddings high-dimensional is critical for effectively training dense retrievers, but it leads to high index storage and retrieval costs. To reduce the embedding dimensionality of dense retrieval, this paper proposes a Conditional Autoencoder (ConAE) that compresses the high-dimensional embeddings while maintaining the same embedding distribution and better recovering the ranking features. Our experiments show the effectiveness of ConAE: the compressed embeddings achieve ranking performance comparable to the raw ones while making the retrieval system more efficient. Our further analyses show that ConAE can mitigate the redundancy of dense retrieval embeddings with only one linear layer. All code for this work is available at https://github.com/NEUIR/ConAE.
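The abstract's core idea, a single linear layer that compresses embeddings while preserving the query-passage score distribution, can be sketched as follows. This is not the authors' implementation: the dimensions, the random projection standing in for a trained layer, and the KL-divergence objective over softmax score distributions are illustrative assumptions based on the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_high, dim_low = 768, 128  # e.g. BERT-sized embeddings -> compressed size (assumed)

# A single linear layer; in practice its weights would be trained, here it is
# a random projection just to make the sketch runnable.
W = rng.normal(scale=dim_high ** -0.5, size=(dim_high, dim_low))

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def score_distribution(queries, passages):
    """Turn dot-product relevance scores into a per-query distribution."""
    return softmax(queries @ passages.T)

queries = rng.normal(size=(4, dim_high))     # a batch of query embeddings
passages = rng.normal(size=(16, dim_high))   # candidate passage embeddings

# Teacher: score distribution from the raw high-dimensional embeddings.
teacher = score_distribution(queries, passages)
# Student: score distribution after compressing both sides with the linear layer.
student = score_distribution(queries @ W, passages @ W)

# Training would minimize this KL divergence so the compressed embeddings
# reproduce the original ranking behavior.
kl = np.sum(teacher * (np.log(teacher) - np.log(student)), axis=-1).mean()
```

A trained version would backpropagate `kl` into `W`; the point of the sketch is only that the entire compressor is one linear map, so inference cost after indexing is a single matrix multiply per embedding.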


Related research

02/13/2023 · SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
This paper introduces a method called Sparsified Late Interaction for Mu...

07/16/2021 · More Robust Dense Retrieval with Contrastive Dual Learning
Dense retrieval conducts text retrieval in the embedding space and has s...

08/23/2021 · Query Embedding Pruning for Dense Retrieval
Recent advances in dense retrieval techniques have offered the promise o...

07/31/2022 · Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval
Pre-trained transformers have achieved success in many NLP tasks. One...

09/17/2020 · S2SD: Simultaneous Similarity-based Self-Distillation for Deep Metric Learning
Deep Metric Learning (DML) provides a crucial tool for visual similarity...

05/23/2022 · Domain Adaptation for Memory-Efficient Dense Retrieval
Dense retrievers encode documents into fixed dimensional embeddings. How...

04/15/2021 · UHD-BERT: Bucketed Ultra-High Dimensional Sparse Representations for Full Ranking
Neural information retrieval (IR) models are promising mainly because th...
