A Dense Representation Framework for Lexical and Semantic Matching

06/20/2022
by Sheng-Chieh Lin, et al.

Lexical and semantic matching are two successful but distinct approaches to text retrieval, and fusing their results has proven more effective and robust than using either alone. Prior work performs hybrid retrieval by running lexical and semantic matching in separate systems (e.g., Lucene and Faiss, respectively) and then fusing their outputs. In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs). Our experiments show that DLRs can effectively approximate the original lexical representations, preserving effectiveness while improving query latency. Furthermore, we can combine dense lexical and semantic representations to generate dense hybrid representations (DHRs) that are more flexible and yield faster retrieval than existing hybrid techniques. Finally, we explore jointly training lexical and semantic representations in a single model and empirically show that the resulting DHRs combine the advantages of each individual component. Our best DHR model is competitive with state-of-the-art single-vector and multi-vector dense retrievers in both in-domain and zero-shot evaluation settings. In addition, our model is faster and requires smaller indexes, making our dense representation framework an attractive approach to text retrieval. Our code is available at https://github.com/castorini/dhr.
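The abstract does not spell out how densification works, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes a vocabulary-sized vector of non-negative term weights is cut into contiguous slices, each slice is max-pooled into a (value, index) pair, and two vectors are compared only where their pooled indices agree. The function names `densify`, `gated_inner_product`, and `hybrid_score`, the slicing scheme, and the `weight` parameter are all hypothetical.

```python
import numpy as np


def densify(lexical_vec, num_slices):
    """Max-pool a vocabulary-sized vector into num_slices (value, index) pairs.

    Assumes non-negative term weights, so zero padding never wins the max
    over a slice that contains a positive weight.
    """
    v = np.asarray(lexical_vec, dtype=np.float32)
    slice_len = -(-v.size // num_slices)  # ceiling division
    padded = np.zeros(slice_len * num_slices, dtype=np.float32)
    padded[: v.size] = v
    slices = padded.reshape(num_slices, slice_len)
    # Keep the largest weight in each slice and its position inside the slice.
    return slices.max(axis=1), slices.argmax(axis=1)


def gated_inner_product(q, p):
    """Inner product over slices whose pooled positions match (same term)."""
    q_val, q_idx = q
    p_val, p_idx = p
    return float(np.sum(q_val * p_val * (q_idx == p_idx)))


def hybrid_score(q_dlr, p_dlr, q_sem, p_sem, weight=1.0):
    """Add the dense lexical score to a weighted dense semantic dot product."""
    return gated_inner_product(q_dlr, p_dlr) + weight * float(np.dot(q_sem, p_sem))


# Toy vocabulary of 8 terms compressed to 4 slices.
query = [0, 2, 0, 0, 1, 0, 0, 0]
passage = [0, 3, 0, 1, 0, 0, 0, 0]
lexical = gated_inner_product(densify(query, 4), densify(passage, 4))
exact = float(np.dot(query, passage))  # full high-dimensional score, for comparison
```

In this toy case the only shared term survives pooling, so `lexical` and `exact` agree. In general the pooled score is an approximation, because terms that are not the maximum of their slice are dropped. The appeal of the design is that both parts of the score work on short fixed-length arrays, so a single dense index can serve lexical and semantic matching together.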

research
12/09/2021

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

Learned sparse and dense representations capture different successful ap...
research
12/17/2021

Sparsifying Sparse Representations for Passage Retrieval by Top-k Masking

Sparse lexical representation learning has demonstrated much progress in...
research
10/21/2022

An Analysis of Fusion Functions for Hybrid Retrieval

We study hybrid search in text retrieval where lexical and semantic sear...
research
03/11/2022

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

In this paper, we propose LaPraDoR, a pretrained dual-tower dense retrie...
research
05/18/2018

Robust Handling of Polysemy via Sparse Representations

Words are polysemous and multi-faceted, with many shades of meanings. We...
research
12/14/2021

Boosted Dense Retriever

We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrB...
research
05/16/2023

Hybrid and Collaborative Passage Reranking

In passage retrieval system, the initial passage retrieval results may b...
