Adam: Dense Retrieval Distillation with Adaptive Dark Examples

12/20/2022
by Chang Liu, et al.

Knowledge distillation from a cross-encoder ranker is an effective way to improve the performance of a dual-encoder retriever. Existing works construct the candidate passages following the supervised learning setting, where a query is paired with one positive passage and a batch of negatives. However, we observe empirically that even the hard negatives mined by advanced methods are still too easy for the teacher to distinguish, which prevents the teacher from transferring abundant dark knowledge to the student through its soft labels. To alleviate this issue, we propose ADAM, a knowledge distillation framework that better transfers the dark knowledge held in the teacher via Adaptive Dark exAMples. Unlike previous works that rely only on one positive and several hard negatives as candidate passages, we create dark examples, all of which have moderate relevance to the query, by mixing up and masking passages in discrete token space. Furthermore, since the quality of knowledge carried by different training instances varies, as measured by the teacher's confidence score, we propose a self-paced distillation strategy that adaptively concentrates on a subset of high-quality instances for dark-example-based knowledge distillation, helping the student learn better. Experiments on two widely used benchmarks verify the effectiveness of our method.
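The two ideas in the abstract, dark examples built by discrete mixing and masking, and a self-paced, confidence-gated distillation loss, can be illustrated with a short sketch. The code below is not the authors' implementation: the [MASK] token id, mixing ratio, mask probability, the assumption that the positive passage sits at candidate index 0, and the hard confidence threshold are all illustrative choices, meant only to show the general shape of the technique.

```python
import random
import torch
import torch.nn.functional as F

MASK_TOKEN_ID = 103  # assumption: BERT-style [MASK] token id


def make_dark_example(pos_ids, other_ids, mix_ratio=0.5, mask_prob=0.15,
                      mask_id=MASK_TOKEN_ID):
    """Sketch of building one 'dark example' in discrete token space.

    Tokens from a positive passage are mixed with tokens from another
    passage, then a fraction of positions is masked, yielding a passage
    with moderate relevance to the query.
    """
    length = min(len(pos_ids), len(other_ids))
    mixed = [pos_ids[i] if random.random() < mix_ratio else other_ids[i]
             for i in range(length)]
    return [mask_id if random.random() < mask_prob else tok for tok in mixed]


def self_paced_kd_loss(student_logits, teacher_logits, threshold=0.7,
                       temperature=1.0):
    """Confidence-gated KL distillation over candidate passages (sketch).

    Both logit tensors have shape [batch, num_candidates]. Instances whose
    teacher confidence on the (assumed) positive at index 0 falls below
    `threshold` are dropped from the distillation loss.
    """
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    per_instance_kl = F.kl_div(student_log_probs, teacher_probs,
                               reduction="none").sum(-1)
    confidence = teacher_probs[:, 0]               # teacher's prob on the positive
    weights = (confidence >= threshold).float()    # hard gate; could also be soft
    return (weights * per_instance_kl).sum() / weights.sum().clamp(min=1.0)


if __name__ == "__main__":
    # toy usage with random scores for 4 queries and 8 candidate passages
    student, teacher = torch.randn(4, 8), torch.randn(4, 8)
    print(self_paced_kd_loss(student, teacher, threshold=0.2).item())
```

In this reading, the dark examples enlarge the candidate set with passages the teacher cannot trivially separate from the positive, and the gate keeps distillation focused on instances where the teacher's soft label is trustworthy.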


