L^2R: Lifelong Learning for First-stage Retrieval with Backward-Compatible Representations

08/22/2023
by Yinqiong Cai, et al.

First-stage retrieval is a critical task that aims to retrieve relevant document candidates from a large-scale collection. While existing retrieval models have achieved impressive performance, they are mostly studied on static datasets, ignoring that in the real world the data on the Web grows continuously and its distribution may drift. Consequently, retrievers trained on static old data may not suit newly arriving data and inevitably produce sub-optimal results. In this work, we study lifelong learning for first-stage retrieval, focusing on the setting where emerging documents are unlabeled, since relevance annotation is expensive and may not keep up with the arrival of new data. Under this setting, we aim to update the model with two goals: (1) to adapt effectively to the evolving distribution using the unlabeled new data, and (2) to avoid re-inferring the embeddings of all old documents, so that the index can be updated efficiently each time the model is updated. We first formalize the task and then propose a novel Lifelong Learning method for first-stage Retrieval, namely L^2R. L^2R adopts the typical memory mechanism for lifelong learning and incorporates two crucial components: (1) selecting diverse support negatives for model training and memory updating, for effective model adaptation, and (2) a ranking alignment objective that ensures the backward compatibility of representations, saving the cost of index rebuilding without hurting model performance. For evaluation, we construct two new benchmarks from the LoTTE and Multi-CPR datasets to simulate document distribution drift in realistic retrieval scenarios. Extensive experiments show that L^2R significantly outperforms competitive lifelong learning baselines.
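
Goal (2) in the abstract, updating the index without re-encoding old documents, can be illustrated with a toy sketch. This is not the L^2R implementation: `AppendOnlyIndex`, `encoder_v1`, `encoder_v2`, and the keyword-count vectors are hypothetical stand-ins chosen for illustration, and the "updated" encoder is simply assumed to stay in the same embedding space as the old one. The point is only that old embeddings stay untouched while new documents and queries use the updated encoder.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[str], Vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product relevance score between two dense vectors."""
    return sum(x * y for x, y in zip(a, b))


class AppendOnlyIndex:
    """Toy dense index that never re-encodes documents already stored."""

    def __init__(self) -> None:
        self.doc_ids: List[str] = []
        self.embeddings: List[Vector] = []

    def add(self, docs: Sequence[Tuple[str, str]], encoder: Encoder) -> None:
        # Only the documents passed in are encoded; stored vectors are untouched.
        for doc_id, text in docs:
            self.doc_ids.append(doc_id)
            self.embeddings.append(encoder(text))

    def search(self, query_vec: Vector, k: int) -> List[str]:
        # Brute-force scoring over old and new embeddings alike.
        scored = sorted(
            zip(self.doc_ids, self.embeddings),
            key=lambda pair: dot(query_vec, pair[1]),
            reverse=True,
        )
        return [doc_id for doc_id, _ in scored[:k]]


# Hypothetical stand-in encoders: keyword counts over a tiny vocabulary.
VOCAB = ["python", "cooking", "travel"]


def encoder_v1(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def encoder_v2(text: str) -> Vector:
    # Assumed "updated" model: same space as v1, only rescaled, so its
    # query vectors remain comparable with embeddings produced by v1.
    return [1.1 * x for x in encoder_v1(text)]


index = AppendOnlyIndex()
# Session 1: old documents are encoded once with the old model.
index.add([("d1", "python tips"), ("d2", "cooking pasta")], encoder_v1)
old_vectors = [list(v) for v in index.embeddings]

# Session 2: the model is updated; only the new document is encoded.
index.add([("d3", "travel python meetup")], encoder_v2)

# Queries use the updated encoder against the mixed old/new index.
top = index.search(encoder_v2("python"), k=2)
print(top)
```

In a real system the two encoders would be neural models, and keeping the new query encoder compatible with old document embeddings is exactly what a backward-compatibility objective has to enforce; the rescaling above merely assumes that property holds.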


