Domain Adaptation for Dense Retrieval through Self-Supervision by Pseudo-Relevance Labeling

12/13/2022
by Minghan Li, et al.

Although neural information retrieval has improved markedly, recent work shows that dense retrieval models generalize poorly to target domains whose distributions differ from the source domain, in contrast to interaction-based models. Adversarial learning and query generation have been proposed to address this issue, but both have yielded only limited improvements. In this paper, we propose a self-supervision approach in which pseudo-relevance labels are generated automatically on the target domain. To do so, we first use the standard BM25 model on the target domain to obtain an initial ranking of documents, and then use the interaction-based model T5-3B to re-rank the top documents. We further combine this approach with knowledge distillation relying on an interaction-based teacher model trained on the source domain. Our experiments reveal that pseudo-relevance labeling using T5-3B and the MiniLM teacher performs better on average than other approaches, and that it improves the state-of-the-art query generation approach GPL when GPL is fine-tuned on the pseudo-relevance labeled data.
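
As a rough illustration of the two-stage labeling idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It ranks documents with a small pure-Python BM25, re-ranks the top candidates, and keeps the best as pseudo-positives and the worst as pseudo-negatives. Everything named here is an assumption made for illustration: the functions `bm25_rank`, `toy_rerank_score`, and `pseudo_label`, the parameters `top_k`, `n_pos`, and `n_neg`, and the rule for choosing positives and negatives. In particular, `toy_rerank_score` is a term-overlap placeholder standing in for the interaction-based re-ranker, which in practice would be a large neural model.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer; a real system would use a proper analyzer."""
    return text.lower().split()


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return document indices sorted by BM25 score, highest first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def toy_rerank_score(query, doc):
    """Hypothetical placeholder for a neural re-ranker: query-term coverage."""
    q_terms = set(tokenize(query))
    return len(q_terms & set(tokenize(doc))) / len(q_terms)


def pseudo_label(query, docs, top_k=3, n_pos=1, n_neg=1, scorer=toy_rerank_score):
    """BM25 first stage, then re-rank the top_k and pick pseudo-labels.

    Assumption: the highest re-ranked documents become pseudo-positives and
    the lowest remaining ones become pseudo-negatives.
    """
    candidates = bm25_rank(query, docs)[:top_k]
    reranked = sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)
    positives = reranked[:n_pos]
    rest = reranked[n_pos:]
    negatives = rest[-n_neg:] if n_neg > 0 else []
    return positives, negatives


docs = [
    "dense retrieval with neural encoders",
    "bm25 is a lexical ranking function",
    "cooking pasta with tomato sauce",
    "domain adaptation for dense retrieval models",
]
query = "dense retrieval domain adaptation"
positives, negatives = pseudo_label(query, docs)
```

The resulting (query, positive, negative) index triples could then serve as training data for a dense retriever on the target domain; how the labels are actually selected and combined with the teacher model is specified in the full paper, not in this sketch.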
