Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies

05/24/2023
by Yubao Tang, et al.

Recently, a new paradigm called Differentiable Search Index (DSI) has been proposed for document retrieval, in which a sequence-to-sequence model learns to map queries directly to relevant document identifiers. The key idea behind DSI is to fully parameterize the traditional “index-retrieve” pipeline within a single neural model by encoding all documents in the corpus into the model parameters. In essence, DSI must resolve two major questions: (1) how to assign an identifier to each document, and (2) how to learn the association between a document and its identifier. In this work, we propose a Semantic-Enhanced DSI model (SE-DSI), motivated by Learning Strategies from the field of Cognitive Psychology. Our approach advances the original DSI in two ways. (1) For the document identifier, we take inspiration from Elaboration Strategies in human learning: we assign each document an Elaborative Description produced by a query generation technique, which is more meaningful than the string of integers used in the original DSI. (2) For the association between a document and its identifier, we take inspiration from Rehearsal Strategies in human learning: we select fine-grained semantic features from a document as Rehearsal Contents to improve document memorization. Both offline and online experiments show improved retrieval performance over prevailing baselines.
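To make the two ideas in the abstract concrete, the following is a minimal sketch of how training pairs for a DSI-style model might be assembled. It is not the authors' implementation. The names `build_training_pairs`, `toy_describe`, and `toy_features` are hypothetical, and the two toy functions are simple stand-ins (assumptions) for the query generation model and the semantic feature selection that the abstract mentions but does not specify.

```python
from typing import Callable, Dict, List, Tuple


def build_training_pairs(
    docs: Dict[str, str],
    describe: Callable[[str], str],
    extract_features: Callable[[str], List[str]],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return (document key -> identifier, list of (input text, target identifier)).

    Hypothetical helper: `describe` stands in for a query generation model that
    yields a textual identifier, and `extract_features` stands in for the
    selection of fine-grained rehearsal contents.
    """
    identifiers: Dict[str, str] = {}
    pairs: List[Tuple[str, str]] = []
    for key, text in docs.items():
        ident = describe(text)           # textual identifier instead of an integer string
        identifiers[key] = ident
        pairs.append((text, ident))      # full document -> identifier
        for feature in extract_features(text):
            pairs.append((feature, ident))  # rehearsal content -> same identifier
    return identifiers, pairs


def toy_describe(text: str) -> str:
    # Assumption: placeholder for a learned query generator; takes the first five words.
    return " ".join(text.split()[:5]).lower()


def toy_features(text: str) -> List[str]:
    # Assumption: placeholder feature selection; treats each sentence as one feature.
    return [s.strip() for s in text.split(".") if s.strip()]


docs = {
    "d1": "Generative retrieval maps queries to identifiers. "
          "The index lives in model parameters."
}
identifiers, pairs = build_training_pairs(docs, toy_describe, toy_features)
for source, target in pairs:
    print(repr(source), "->", repr(target))
```

Every pair shares the same target string for a given document, so a sequence-to-sequence model trained on such pairs would see the identifier from several views of the document. How the identifiers are kept unique and how decoding is constrained at retrieval time are outside the scope of this sketch.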

research
03/01/2022

DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index

Web search provides a promising way for people to obtain information and...
research
04/09/2023

Learning to Tokenize for Generative Retrieval

Conventional document retrieval techniques are mainly based on the index...
research
06/06/2022

A Neural Corpus Indexer for Document Retrieval

Current state-of-the-art document retrieval solutions mainly follow an i...
research
06/21/2022

Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation

The Differentiable Search Index (DSI) is a new, emerging paradigm for in...
research
08/19/2022

Ultron: An Ultimate Retriever on Corpus with a Model-based Indexer

Document retrieval has been extensively studied within the index-retriev...
research
05/19/2023

How Does Generative Retrieval Scale to Millions of Passages?

Popularized by the Differentiable Search Index, the emerging paradigm of...
research
07/11/2022

Topic-Grained Text Representation-based Model for Document Retrieval

Document retrieval enables users to find their required documents accura...
