GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

12/14/2021
by Kexin Wang, et al.

Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data, which is not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers degrades severely under a domain shift. This limits the use of dense retrieval approaches to the few domains with large training datasets. In this paper, we propose Generative Pseudo Labeling (GPL), a novel unsupervised domain adaptation method that combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find that GPL outperforms an out-of-the-box state-of-the-art dense retrieval approach by up to 8.9 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of domain adaptation for retrieval tasks, of which only three yield improved results. The best approach, TSDAE (Wang et al., 2021), can be combined with GPL, yielding a further average improvement of 1.0 points nDCG@10 across the six tasks.
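
A minimal sketch of how a GPL-style training signal can be assembled from unlabeled passages. It assumes the pipeline described in the full paper rather than in this abstract: each generated query is paired with its source passage (positive) and with mined negative passages, the cross-encoder's score margin between positive and negative serves as the pseudo label, and the dense retriever is trained to reproduce that margin with a MarginMSE loss. The query generator, negative miner, cross-encoder, and dense scorer are passed in as callables; toy_generate_queries, toy_cross_encoder_score, and the corpus are hypothetical stand-ins so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical callable types standing in for real models (assumptions, not GPL's API).
QueryGenerator = Callable[[str], List[str]]      # passage -> generated queries
NegativeMiner = Callable[[str, str], List[str]]  # (query, positive) -> negative passages
PairScorer = Callable[[str, str], float]         # (query, passage) -> relevance score


@dataclass
class GPLExample:
    query: str
    positive: str
    negative: str
    margin: float  # pseudo label: CE(query, positive) - CE(query, negative)


def build_gpl_examples(passages: List[str], generate_queries: QueryGenerator,
                       mine_negatives: NegativeMiner,
                       cross_encoder: PairScorer) -> List[GPLExample]:
    """Turn unlabeled target-domain passages into (query, pos, neg, margin) tuples."""
    examples = []
    for positive in passages:
        for query in generate_queries(positive):
            pos_score = cross_encoder(query, positive)
            for negative in mine_negatives(query, positive):
                margin = pos_score - cross_encoder(query, negative)
                examples.append(GPLExample(query, positive, negative, margin))
    return examples


def margin_mse_loss(examples: List[GPLExample], dense_score: PairScorer) -> float:
    """Mean squared error between the dense retriever's margin and the pseudo-label margin."""
    if not examples:
        raise ValueError("need at least one training example")
    total = 0.0
    for ex in examples:
        student_margin = dense_score(ex.query, ex.positive) - dense_score(ex.query, ex.negative)
        total += (student_margin - ex.margin) ** 2
    return total / len(examples)


# --- Toy stand-ins (hypothetical, for illustration only) ---
def toy_generate_queries(passage: str) -> List[str]:
    # Pretend the "generated query" is the passage's first two words.
    return [" ".join(passage.split()[:2])]


def toy_cross_encoder_score(query: str, passage: str) -> float:
    # Fraction of query tokens that also appear in the passage.
    q_tokens = set(query.split())
    return len(q_tokens & set(passage.split())) / len(q_tokens)


corpus = [
    "dense retrieval needs training data",
    "cross encoders rerank passages",
    "domain shift hurts retrievers",
]
examples = build_gpl_examples(
    corpus,
    toy_generate_queries,
    lambda query, positive: [p for p in corpus if p != positive],  # every other passage is a negative
    toy_cross_encoder_score,
)
print(len(examples), margin_mse_loss(examples, lambda q, p: 0.0))  # prints: 6 1.0
```

In a real setup, dense_score would be a similarity (e.g. a dot product) between the retriever's query and passage embeddings, and the loss would be minimized by gradient descent; the sketch only evaluates the objective. Training on score margins rather than hard relevant/irrelevant labels lets the cross-encoder's graded judgments soften the effect of badly generated queries.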
