Understanding Differential Search Index for Text Retrieval

05/03/2023
by   Xiaoyang Chen, et al.

The Differentiable Search Index (DSI) is a novel information retrieval (IR) framework that uses a differentiable function to generate a sorted list of document identifiers in response to a given query. However, because the end-to-end neural architecture is a black box, it remains unclear to what extent DSI possesses basic indexing and retrieval abilities. To bridge this gap, we define and examine three important abilities that a functioning IR framework should possess: exclusivity, completeness, and relevance ordering. Our analysis shows that while DSI is proficient at memorizing the unidirectional mapping from pseudo queries to document identifiers, it falls short in distinguishing relevant documents from random ones, which hurts its retrieval effectiveness. To address this issue, we propose a multi-task distillation approach that improves retrieval quality without altering the structure of the model and endows it with improved indexing abilities. Through experiments conducted on various datasets, we demonstrate that our proposed method outperforms previous DSI baselines.
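As a rough illustration of what "a sorted list of document identifiers" and "distinguishing relevant documents from random ones" could mean in practice, the following toy sketch ranks identifiers by score and checks whether relevant ones precede the rest. It is not the paper's method or evaluation protocol: the function names `rank_docids` and `relevant_above_random`, the identifiers `d1` to `d4`, and the scores are all hypothetical, and a plain dictionary of scores stands in for the neural model.

```python
from typing import Dict, List, Set


def rank_docids(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the top-k document identifiers, highest score first."""
    # Sort by descending score; break ties by docid so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [docid for docid, _ in ordered[:k]]


def relevant_above_random(ranking: List[str], relevant: Set[str]) -> bool:
    """True if every relevant docid in the ranking precedes every non-relevant one."""
    seen_non_relevant = False
    for docid in ranking:
        if docid in relevant:
            if seen_non_relevant:
                # A relevant document was ranked below a non-relevant one.
                return False
        else:
            seen_non_relevant = True
    return True


# Hypothetical scores that a model might assign to docids for one query.
toy_scores = {"d1": 0.10, "d2": 0.70, "d3": 0.15, "d4": 0.05}
ranking = rank_docids(toy_scores, k=3)
print(ranking)
print(relevant_above_random(ranking, relevant={"d2"}))
```

A check like this only looks at ordering within one ranked list; it says nothing about exclusivity or completeness, which would need access to the document-to-identifier mapping itself.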

research
06/21/2022

Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation

The Differentiable Search Index (DSI) is a new, emerging paradigm for in...
research
04/24/2020

Learning Term Discrimination

Document indexing is a key component for efficient information retrieval...
research
08/05/2016

Bridging the Gap: Incorporating a Semantic Similarity Measure for Effectively Mapping PubMed Queries to Documents

The main approach of traditional information retrieval (IR) is to examin...
research
10/05/2018

C-DLSI: An Extended LSI Tailored for Federated Text Retrieval

As the web expands in data volume and in geographical distribution, cent...
research
08/19/2022

Ultron: An Ultimate Retriever on Corpus with a Model-based Indexer

Document retrieval has been extensively studied within the index-retriev...
research
04/20/2021

An Analysis of Indexing and Querying Strategies on a Technologically Assisted Review Task

This paper presents a preliminary experimentation study using the CLEF 2...
research
01/09/2023

Doc2Query--: When Less is More

Doc2Query – the process of expanding the content of a document before in...
