Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation

06/21/2022
by Shengyao Zhuang, et al.

The Differentiable Search Index (DSI) is an emerging paradigm for information retrieval. Unlike traditional retrieval architectures, where indexing and retrieval are two separate components, DSI uses a single transformer model to perform both. In this paper, we identify and tackle an important issue of current DSI models: the data distribution mismatch between the DSI indexing and retrieval processes. Specifically, we argue that, at indexing time, current DSI methods learn to build connections between long document texts and their identifiers, whereas at retrieval time, short query texts are provided to DSI models to retrieve the document identifiers. This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages. To address this fundamental problem of current DSI models, we propose a simple yet effective indexing framework for DSI called DSI-QG. In DSI-QG, documents are represented by a number of relevant queries generated by a query generation model at indexing time. This allows DSI models to connect a document identifier to a set of query texts when indexing, hence mitigating the data distribution mismatch between the indexing and retrieval phases. Empirical results on popular mono-lingual and cross-lingual passage retrieval benchmark datasets show that DSI-QG significantly outperforms the original DSI model.
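
As a rough illustration of the indexing change described above, the sketch below contrasts the two kinds of training pairs: document text mapped to an identifier (original DSI) versus generated queries mapped to an identifier (DSI-QG). All names here (`build_dsi_pairs`, `build_dsi_qg_pairs`, `toy_query_generator`) and the sample documents are hypothetical. The toy generator only emits short word windows and stands in for a trained query generation model; it is not the method used in the paper.

```python
from typing import Callable, Dict, List, Tuple

# A training pair is (input text, target document identifier).
Pair = Tuple[str, str]


def build_dsi_pairs(docs: Dict[str, str]) -> List[Pair]:
    """Original DSI-style indexing data: long document text -> docid."""
    return [(text, docid) for docid, text in docs.items()]


def build_dsi_qg_pairs(
    docs: Dict[str, str],
    generate_queries: Callable[[str, int], List[str]],
    num_queries: int,
) -> List[Pair]:
    """DSI-QG-style indexing data: each generated query -> docid."""
    pairs: List[Pair] = []
    for docid, text in docs.items():
        for query in generate_queries(text, num_queries):
            pairs.append((query, docid))
    return pairs


def toy_query_generator(text: str, num_queries: int) -> List[str]:
    """Hypothetical stand-in for a query generation model.

    Returns up to `num_queries` lowercase three-word windows of the text.
    """
    words = text.rstrip(".").lower().split()
    windows = [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]
    return windows[:num_queries]


# Assumed sample collection, for illustration only.
docs = {
    "d1": "The Eiffel Tower is located in Paris.",
    "d2": "Python is a popular programming language.",
}

dsi_pairs = build_dsi_pairs(docs)
qg_pairs = build_dsi_qg_pairs(docs, toy_query_generator, num_queries=2)
```

In this sketch, the inputs in `qg_pairs` are short and query-like, so they resemble what the model sees at retrieval time, whereas the inputs in `dsi_pairs` are full document texts. For the cross-lingual setting, the generator would be assumed to produce queries in the query language rather than the document language.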

research
05/06/2023

Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval

Effective cross-lingual dense retrieval methods that rely on multilingua...
research
05/03/2023

Understanding Differential Search Index for Text Retrieval

The Differentiable Search Index (DSI) is a novel information retrieval (...
research
04/20/2021

An Analysis of Indexing and Querying Strategies on a Technologically Assisted Review Task

This paper presents a preliminary experimentation study using the CLEF 2...
research
01/09/2023

Doc2Query--: When Less is More

Doc2Query – the process of expanding the content of a document before in...
research
06/20/2023

Generative Retrieval as Dense Retrieval

Generative retrieval is a promising new neural retrieval paradigm that a...
research
02/28/2021

An Efficient Indexing and Searching Technique for Information Retrieval for Urdu Language

Indexing techniques are used to improve retrieval of data in response to...
research
05/24/2023

Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies

Recently, a new paradigm called Differentiable Search Index (DSI) has be...
