Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

10/12/2021
by   Jingtao Zhan, et al.
0

Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Representations via CONstrained Clustering. RepCONC jointly trains dual-encoders and the Product Quantization (PQ) method to learn discrete document representations and enables fast approximate NNS with compact indexes. It models quantization as a constrained clustering process, which requires the document embeddings to be uniformly clustered around the quantization centroids and supports end-to-end optimization of the quantization method and dual-encoders. We theoretically demonstrate the importance of the uniform clustering constraint in RepCONC and derive an efficient approximate solution for constrained clustering by reducing it to an instance of the optimal transport problem. Besides constrained clustering, RepCONC further adopts a vector-based inverted file system (IVF) to support highly efficient vector search on CPUs. Extensive experiments on two popular ad-hoc retrieval benchmarks show that RepCONC achieves better ranking effectiveness than competitive vector quantization baselines under different compression ratio settings. It also substantially outperforms a wide range of existing retrieval models in terms of retrieval effectiveness, memory efficiency, and time efficiency.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/02/2021

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance

Recently, Information Retrieval community has witnessed fast-paced advan...
research
04/24/2023

Constructing Tree-based Index for Efficient and Effective Dense Retrieval

Recent studies have shown that Dense Retrieval (DR) techniques can signi...
research
05/08/2021

Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval

Recently, the retrieval models based on dense representations have been ...
research
04/27/2023

Multivariate Representation Learning for Information Retrieval

Dense retrieval models use bi-encoder network architectures for learning...
research
12/14/2021

Boosted Dense Retriever

We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrB...
research
11/27/2021

Interpreting Dense Retrieval as Mixture of Topics

Dense Retrieval (DR) reaches state-of-the-art results in first-stage ret...
research
04/12/2020

Minimizing FLOPs to Learn Efficient Sparse Representations

Deep representation learning has become one of the most widely adopted a...

Please sign up or login with your details

Forgot password? Click here to reset