Confidence-Aware Multi-Teacher Knowledge Distillation

12/30/2021
by Hailin Zhang, et al.

Knowledge distillation was originally introduced to provide additional supervision from a single teacher model during student model training. To boost student performance, some recent variants attempt to exploit diverse knowledge sources from multiple teachers. However, existing studies mainly integrate knowledge from these sources by averaging multiple teacher predictions or combining them with other label-free strategies, which may mislead the student in the presence of low-quality teacher predictions. To tackle this problem, we propose Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD), which adaptively assigns a sample-wise reliability weight to each teacher prediction with the help of ground-truth labels, so that teacher predictions close to the one-hot labels receive large weights. Besides, CA-MKD incorporates intermediate layers to further improve student performance. Extensive experiments show that CA-MKD consistently outperforms all compared state-of-the-art methods across various teacher-student architectures.
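
As a rough illustration of the weighting idea described above, the sketch below scores each teacher per sample by its cross-entropy against the ground-truth label and turns those scores into distillation weights, so teachers whose predictions are close to the one-hot label dominate. This is a minimal PyTorch sketch under assumed tensor shapes; the temperature T, the softmax-based weighting function, and the omission of the intermediate-layer term are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_kd_loss(student_logits, teacher_logits_list, labels, T=4.0):
    """Hypothetical helper: student_logits is (B, C), teacher_logits_list is a list
    of (B, C) tensors (one per teacher), labels is (B,) integer class indices."""
    # 1) Score each teacher per sample by cross-entropy against the ground truth;
    #    a lower value means the prediction is closer to the one-hot label.
    ce_scores = torch.stack(
        [F.cross_entropy(t, labels, reduction="none") for t in teacher_logits_list]
    )  # (num_teachers, B)

    # 2) Turn scores into sample-wise weights: low-CE (i.e. more reliable)
    #    teachers receive larger weights via a softmax over the negated scores.
    weights = F.softmax(-ce_scores, dim=0)  # (num_teachers, B)

    # 3) Distill with a weighted sum of per-teacher, per-sample KL divergences.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    kd_loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=1)
        kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)  # (B,)
        kd_loss = kd_loss + (w * kl).mean()
    return kd_loss * (T * T)
```

Because the weights are computed per sample, a teacher that is unreliable on some inputs is down-weighted only on those inputs. The intermediate-layer supervision mentioned in the abstract would add a separate feature-matching term on top of this logit-level loss; it is omitted here for brevity.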


Related research:

- 02/16/2022 · Deeply-Supervised Knowledge Distillation · Knowledge distillation aims to enhance the performance of a lightweight ...
- 11/18/2019 · Preparing Lessons: Improve Knowledge Distillation with Better Supervision · Knowledge distillation (KD) is widely used for training a compact model ...
- 08/07/2023 · Efficient Temporal Sentence Grounding in Videos with Multi-Teacher Knowledge Distillation · Temporal Sentence Grounding in Videos (TSGV) aims to detect the event ti...
- 08/02/2019 · Distilling Knowledge From a Deep Pose Regressor Network · This paper presents a novel method to distill knowledge from a deep pose...
- 02/07/2022 · ALM-KD: Knowledge Distillation with noisy labels via adaptive loss mixing · Knowledge distillation is a technique where the outputs of a pretrained ...
- 09/15/2020 · Noisy Self-Knowledge Distillation for Text Summarization · In this paper we apply self-knowledge distillation to text summarization...
- 03/30/2022 · Monitored Distillation for Positive Congruent Depth Completion · We propose a method to infer a dense depth map from a single image, its ...
