Term-Sets Can Be Strong Document Identifiers For Auto-Regressive Search Engines

by Peitian Zhang, et al.

Auto-regressive search engines have emerged as a promising paradigm for next-generation information retrieval systems. These methods work with Seq2Seq models, where each query is mapped directly to the identifier of its relevant document. As such, they are praised for merits like being end-to-end differentiable. However, auto-regressive search engines also face challenges in retrieval quality, because the document identifier must be generated exactly. That is, the target document is missed from the retrieval result if a wrong prediction about its identifier is made at any step of the generation process. In this work, we propose a novel framework, namely AutoTSG (Auto-regressive Search Engine with Term-Set Generation), which features 1) an unordered term-based document identifier and 2) a set-oriented generation pipeline. With AutoTSG, any permutation of the term-set identifier leads to the retrieval of the corresponding document, which largely relaxes the requirement of exact generation. In addition, the Seq2Seq model can flexibly explore the optimal permutation of the document identifier for the presented query, which may further contribute to retrieval quality. AutoTSG is empirically evaluated on Natural Questions and MS MARCO, where it achieves notable improvements over existing auto-regressive search engines.
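The idea of an unordered term-set identifier can be illustrated with a minimal Python sketch. This is not the AutoTSG implementation: the toy corpus, the names `DOC_TERM_SETS`, `lookup`, and `valid_next_terms`, and the subset-based constraint are all assumptions made for illustration. It shows only that a lookup keyed on a set accepts any permutation of the terms, and how the set of still-valid next terms could be restricted during generation.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical toy corpus: each document is identified by an unordered set of terms.
DOC_TERM_SETS: Dict[str, FrozenSet[str]] = {
    "doc_a": frozenset({"neural", "retrieval", "index"}),
    "doc_b": frozenset({"neural", "translation", "attention"}),
}

# Reverse map from term-set identifier to document id.
TERM_SET_TO_DOC: Dict[FrozenSet[str], str] = {
    terms: doc_id for doc_id, terms in DOC_TERM_SETS.items()
}


def lookup(generated_terms: List[str]) -> Optional[str]:
    """Return the document whose term set equals the generated terms, in any order."""
    return TERM_SET_TO_DOC.get(frozenset(generated_terms))


def valid_next_terms(prefix: List[str]) -> Set[str]:
    """Terms that keep the prefix a subset of at least one document's term set.

    This is an assumed, simplified stand-in for a set-oriented generation
    constraint: order is ignored, only set membership matters.
    """
    chosen = set(prefix)
    allowed: Set[str] = set()
    for terms in DOC_TERM_SETS.values():
        if chosen <= terms:  # prefix is still compatible with this document
            allowed |= terms - chosen
    return allowed
```

In this sketch, `lookup(["index", "neural", "retrieval"])` and `lookup(["retrieval", "index", "neural"])` resolve to the same document, whereas an ordered-sequence identifier would accept only one of the two orderings.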



Learning to Tokenize for Generative Retrieval

Conventional document retrieval techniques are mainly based on the index...

Design Challenges for a Multi-Perspective Search Engine

Many users turn to document retrieval systems (e.g. search engines) to s...

Large Language Models are Built-in Autoregressive Search Engines

Document retrieval is a key stage of standard Web search engines. Existi...

An Uncertainty Management Calculus for Ordering Searches in Distributed Dynamic Databases

MINDS is a distributed system of cooperating query engines that customiz...

Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets

With the rapid advance of the Internet, search engines (e.g., Google, Bi...

Medical Information Retrieval and Interpretation: A Question-Answer based Interaction Model

The Internet has become a very powerful platform where diverse medical i...

Histopathology Slide Indexing and Search: Are We There Yet?

The search and retrieval of digital histopathology slides is an importan...
