LEAD: Liberal Feature-based Distillation for Dense Retrieval

12/10/2022
by Hao Sun, et al.

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional knowledge distillation methods fall into two groups: response-based methods and feature-based methods. Response-based methods are the most widely used but impose a lower ceiling on student performance, while feature-based methods are constrained by the models' vocabularies and tokenizers. In this paper, we propose LEAD, a tokenizer-free liberal feature-based distillation method. LEAD aligns the distributions produced by the teacher and student models; it is effective, extendable, and portable, and places no requirements on vocabularies, tokenizers, or model architectures. Extensive experiments show the effectiveness of LEAD on several widely used benchmarks, including MS MARCO Passage, TREC Passage 19, TREC Passage 20, MS MARCO Document, TREC Document 19, and TREC Document 20.
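For intuition, the sketch below shows one way a tokenizer-free, distribution-level distillation objective can be written for dense retrieval: teacher and student each score in-batch passages against queries, and the student's relevance-score distribution is pulled toward the teacher's with a KL divergence. Because only score distributions are compared, the two models may use different hidden sizes, vocabularies, and tokenizers. This is an illustrative assumption-laden sketch, not the exact LEAD objective; the function name, the temperature, and the use of in-batch passages as the candidate set are choices made for this example.

```python
# Minimal sketch of tokenizer-free distribution alignment for dense retrieval
# distillation. NOT the exact LEAD objective; layer choice, temperature, and
# the in-batch candidate set are assumptions made for illustration.
import torch
import torch.nn.functional as F


def distribution_alignment_loss(teacher_q, teacher_p, student_q, student_p, tau=1.0):
    """KL divergence between teacher and student relevance-score distributions.

    teacher_q, teacher_p: [batch, dim_t] query / passage embeddings (teacher)
    student_q, student_p: [batch, dim_s] query / passage embeddings (student)
    dim_t and dim_s may differ, since only score distributions are compared.
    """
    # Relevance scores of every query against every in-batch passage.
    t_scores = teacher_q @ teacher_p.t() / tau  # [batch, batch]
    s_scores = student_q @ student_p.t() / tau  # [batch, batch]

    # Pull the student's distribution toward the teacher's.
    t_dist = F.softmax(t_scores, dim=-1)
    s_log_dist = F.log_softmax(s_scores, dim=-1)
    return F.kl_div(s_log_dist, t_dist, reduction="batchmean")


if __name__ == "__main__":
    # Teacher and student with different hidden sizes (e.g., 1024 vs. 768).
    tq, tp = torch.randn(8, 1024), torch.randn(8, 1024)
    sq, sp = torch.randn(8, 768), torch.randn(8, 768)
    print(distribution_alignment_loss(tq, tp, sq, sp).item())
```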


