Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval

08/11/2022
by   Jingtao Zhan, et al.
6

Recent advance in Dense Retrieval (DR) techniques has significantly improved the effectiveness of first-stage retrieval. Trained with large-scale supervised data, DR models can encode queries and documents into a low-dimensional dense space and conduct effective semantic matching. However, previous studies have shown that the effectiveness of DR models would drop by a large margin when the trained DR models are adopted in a target domain that is different from the domain of the labeled data. One of the possible reasons is that the DR model has never seen the target corpus and thus might be incapable of mitigating the difference between the training and target domains. In practice, unfortunately, training a DR model for each target domain to avoid domain shift is often a difficult task as it requires additional time, storage, and domain-specific data labeling, which are not always available. To address this problem, in this paper, we propose a novel DR framework named Disentangled Dense Retrieval (DDR) to support effective and flexible domain adaptation for DR models. DDR consists of a Relevance Estimation Module (REM) for modeling domain-invariant matching patterns and several Domain Adaption Modules (DAMs) for modeling domain-specific features of multiple target corpora. By making the REM and DAMs disentangled, DDR enables a flexible training paradigm in which REM is trained with supervision once and DAMs are trained with unsupervised data. Comprehensive experiments in different domains and languages show that DDR significantly improves ranking performance compared to strong DR baselines and substantially outperforms traditional retrieval methods in most scenarios.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/25/2022

Evaluating Extrapolation Performance of Dense Retrieval

A retrieval model should not only interpolate the training data but also...
research
10/14/2021

Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations

Dense retrieval (DR) methods conduct text retrieval by first encoding te...
research
09/13/2022

HEARTS: Multi-task Fusion of Dense Retrieval and Non-autoregressive Generation for Sponsored Search

Matching user search queries with relevant keywords bid by advertisers i...
research
08/27/2021

Dealing with Typos for BERT-based Passage Retrieval and Ranking

Passage retrieval and ranking is a key task in open-domain question answ...
research
06/26/2022

Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems

Recently, several dense retrieval (DR) models have demonstrated competit...
research
05/18/2023

BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval

Dense retrieval has shown promise in the first-stage retrieval process w...
research
02/25/2022

Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training

The process of model checkpoint validation refers to the evaluation of t...

Please sign up or login with your details

Forgot password? Click here to reset