IncDSI: Incrementally Updatable Document Retrieval

07/19/2023
by Varsha Kishore, et al.

Differentiable Search Index (DSI) is a recently proposed paradigm for document retrieval that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performance for document retrieval across many benchmarks. However, they have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document) without retraining the model on the entire dataset (or even parts thereof). Instead, we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with retraining the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
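
The abstract does not spell out the optimization problem, so the following is only an illustrative sketch of what "adding a document as a constrained optimization with minimal parameter changes" could look like, not the authors' formulation. It assumes that retrieval is an argmax over a linear layer `W` with one row per document, and that a new document is added by appending a single row while leaving every existing row untouched. The function `add_document`, its parameters (`margin`, `lr`, `steps`), and the toy data are all hypothetical.

```python
import numpy as np

def add_document(W, q_new, old_queries, old_labels, margin=0.5, lr=0.1, steps=500):
    """Append one row to W for a new document (illustrative sketch only).

    Assumed constraints, enforced with a hinge-style subgradient loop:
      1. the new query scores the new row at least `margin` above every old row;
      2. each old query scores its own document at least `margin` above the new row.
    Existing rows of W are never modified.
    """
    v = q_new.astype(float).copy()          # start from the new query's embedding
    best_old = (W @ q_new).max()            # best existing score for the new query
    own = np.einsum("ij,ij->i", W[old_labels], old_queries)  # old queries on their own docs
    for _ in range(steps):
        new_violated = v @ q_new < best_old + margin
        old_violated = old_queries @ v > own - margin
        if not new_violated and not old_violated.any():
            break                           # all assumed constraints are satisfied
        grad = np.zeros_like(v)
        if new_violated:
            grad -= q_new                   # push the new row toward the new query
        grad += old_queries[old_violated].sum(axis=0)  # pull it away from old queries
        v -= lr * grad
    # If the loop runs out of steps, v is returned as-is and may violate constraints.
    return np.vstack([W, v])

# Toy example: three existing documents in a 4-dimensional embedding space.
W = np.eye(3, 4)
old_queries = np.eye(3, 4)                  # each old query matches its document exactly
old_labels = np.array([0, 1, 2])
q_new = np.array([0.6, 0.0, 0.0, 0.8])      # query for the document being added

W2 = add_document(W, q_new, old_queries, old_labels)
new_doc_id = int(np.argmax(W2 @ q_new))
old_doc_ids = [int(np.argmax(W2 @ q)) for q in old_queries]
```

Because only the appended row is optimized, the cost of an update in this sketch depends on the number of constraint checks rather than on a full training pass, which is the kind of trade-off the abstract describes.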

