Adaptive Multi-Teacher Multi-level Knowledge Distillation

03/06/2021
by Yuang Liu, et al.

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of lightweight student networks by exploiting additional supervision distilled from teacher networks. Most pioneering studies either learn from only a single teacher, neglecting the potential for a student to learn from multiple teachers simultaneously, or simply treat all teachers as equally important, failing to capture how much each teacher matters for a specific example. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights, which are leveraged to acquire integrated soft targets (high-level knowledge), and (ii) enabling intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers through the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate that the proposed learning framework enables the student to achieve better performance than strong competitors.
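The abstract only outlines the mechanism, but the first insight (instance-level teacher weighting used to fuse soft targets) can be illustrated with a minimal PyTorch-style sketch. Everything below is an assumption made for illustration, not the paper's actual formulation: the module and function names (AdaptiveTeacherWeighting, distillation_loss), the dot-product scoring between a learnable per-teacher latent vector and a projected student feature, the tensor shapes, and the temperature value. The multi-group hint strategy for intermediate-level knowledge is not shown.

```python
# Hypothetical sketch of adaptive multi-teacher soft-target fusion.
# Names, shapes, and the scoring mechanism are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveTeacherWeighting(nn.Module):
    """Learns a latent vector per teacher and produces instance-level
    importance weights used to fuse the teachers' soft targets."""

    def __init__(self, num_teachers: int, feat_dim: int, latent_dim: int = 64):
        super().__init__()
        # One learnable latent representation per teacher (assumption).
        self.teacher_latents = nn.Parameter(torch.randn(num_teachers, latent_dim))
        # Projects the student's instance feature into the same latent space.
        self.instance_proj = nn.Linear(feat_dim, latent_dim)

    def forward(self, student_feat, teacher_logits, temperature: float = 4.0):
        # student_feat:   (batch, feat_dim)
        # teacher_logits: (num_teachers, batch, num_classes)
        query = self.instance_proj(student_feat)           # (batch, latent_dim)
        scores = query @ self.teacher_latents.t()          # (batch, num_teachers)
        weights = F.softmax(scores, dim=-1)                # instance-level teacher weights
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        # Weighted combination of teacher soft targets -> integrated soft targets.
        fused = torch.einsum('bt,tbc->bc', weights, soft_targets)
        return fused, weights


def distillation_loss(student_logits, fused_soft_targets, temperature: float = 4.0):
    """KL divergence between the student's tempered predictions and the
    weight-fused teacher soft targets (high-level knowledge)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, fused_soft_targets,
                    reduction='batchmean') * temperature ** 2
```

In this reading, the per-instance weights let the student rely more on whichever teacher is most informative for a given example, rather than averaging all teachers uniformly.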

