PROD: Progressive Distillation for Dense Retrieval

09/27/2022
by Zhenghao Lin, et al.

Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, the better the teacher, the better the student. However, this expectation does not always hold: a stronger teacher often yields a worse student after distillation because of the non-negligible capacity gap between the two models. To bridge this gap, we propose PROD, a PROgressive Distillation method for dense retrieval. PROD combines a teacher progressive distillation with a data progressive distillation to gradually improve the student. We conduct extensive experiments on five widely used benchmarks, MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document, and Natural Questions, where PROD achieves state-of-the-art results among distillation methods for dense retrieval. The code and models will be released.
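The abstract only names the two progressions, so the following is a minimal sketch of how a teacher-progressive distillation loop for a dense retriever might look, assuming a listwise KL distillation loss and hypothetical `student`/`teacher` scoring callables; it is an illustration of the general idea, not the authors' released PROD implementation.

```python
# Illustrative sketch of progressive distillation for dense retrieval.
# All names, losses, and the staging scheme below are assumptions made
# for illustration, not the paper's released code.
import torch
import torch.nn.functional as F


def distill_loss(student_scores, teacher_scores, temperature=1.0):
    """KL divergence between the teacher's and student's relevance-score
    distributions over one query's candidate passages (listwise distillation)."""
    student_logp = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_p = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * temperature ** 2


def progressive_distillation(student, teachers, stage_loaders, optimizer):
    """Teacher progression: distill from gradually stronger teachers, one
    stage per teacher, so the teacher-student gap stays small at every step.

    `teachers` is assumed to be ordered from weakest to strongest (e.g. a
    small dual-encoder, then a larger dual-encoder, then a cross-encoder).
    Each stage's data loader is assumed to mix in progressively harder
    negatives, standing in for the data progressive distillation.
    """
    for teacher, loader in zip(teachers, stage_loaders):
        teacher.eval()
        for queries, passages in loader:
            with torch.no_grad():
                t_scores = teacher(queries, passages)   # [batch, n_candidates]
            s_scores = student(queries, passages)       # [batch, n_candidates]
            loss = distill_loss(s_scores, t_scores)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

In this sketch the student never jumps straight to the strongest teacher: each stage narrows the gap a little, which is the core intuition the abstract describes.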

Related research
12/10/2022

LEAD: Liberal Feature-based Distillation for Dense Retrieval

Knowledge distillation is often used to transfer knowledge from a strong...
04/28/2022

Curriculum Learning for Dense Retrieval Distillation

Recent work has shown that more effective dense retrieval models can be ...
01/26/2019

Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks

Much of the focus in the area of knowledge distillation has been on dist...
01/27/2023

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Large neural models (such as Transformers) achieve state-of-the-art perf...
03/09/2023

Learn More for Food Recognition via Progressive Self-Distillation

Food recognition has a wide range of applications, such as health-aware ...
10/16/2021

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher

With ever growing scale of neural models, knowledge distillation (KD) at...
06/13/2022

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation

Knowledge distillation (KD) has shown very promising capabilities in tra...
