Prime-Aware Adaptive Distillation

08/04/2020
by Youcai Zhang, et al.
Megvii Technology Limited
Tongji University
USTC
FUDAN University

Knowledge distillation (KD) aims to improve the performance of a student network by mimicking the knowledge of a powerful teacher network. Existing methods focus on what knowledge should be transferred and treat all samples equally during training. This paper introduces adaptive sample weighting to KD. We find that previously effective hard-mining methods are not appropriate for distillation. We therefore propose Prime-Aware Adaptive Distillation (PAD), which incorporates uncertainty learning. PAD perceives the prime samples in distillation and adaptively emphasizes their effect. With its view of unequal training, PAD is fundamentally different from existing methods and can refine them. For this reason, PAD is versatile and has been applied to various tasks, including classification, metric learning, and object detection. With ten teacher-student combinations on six datasets, PAD improves the performance of existing distillation methods and outperforms recent state-of-the-art methods.
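
The abstract does not spell out the loss, so the following is only a minimal sketch of how uncertainty learning could produce adaptive sample weights in feature distillation. It assumes a common heteroscedastic-style formulation in which each sample carries a learned log-variance, and the per-sample distillation distance is scaled by the inverse variance. The function names and the `log_vars` input are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def squared_distance(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Squared L2 distance between one teacher and one student feature vector."""
    return sum((t - s) ** 2 for t, s in zip(teacher, student))


def sample_weights(log_vars: Sequence[float]) -> List[float]:
    """Per-sample weights exp(-log_var): low uncertainty means high weight."""
    return [math.exp(-lv) for lv in log_vars]


def uncertainty_weighted_kd_loss(
    teacher_feats: Sequence[Sequence[float]],
    student_feats: Sequence[Sequence[float]],
    log_vars: Sequence[float],
) -> float:
    """Mean over samples of d_i * exp(-log_var_i) + log_var_i.

    Assumption: log_vars[i] is a per-sample log-variance, e.g. predicted by
    an extra branch of the student. The additive log_var term penalizes
    large variances, so the model cannot ignore every sample.
    """
    per_sample = []
    for t, s, lv in zip(teacher_feats, student_feats, log_vars):
        d = squared_distance(t, s)
        per_sample.append(d * math.exp(-lv) + lv)
    return sum(per_sample) / len(per_sample)
```

Under this assumed formulation, samples with low predicted variance receive larger weights and so dominate the distillation signal, which is one way to read "emphasizes their effect adaptively". With all log-variances set to zero, the loss reduces to a plain mean squared distance, i.e. the equal-weighting baseline.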

05/05/2022

Spot-adaptive Knowledge Distillation

Knowledge distillation (KD) has become a well established paradigm for c...
04/09/2019

Prime Sample Attention in Object Detection

It is a common paradigm in object detection frameworks to treat all samp...
01/26/2022

Adaptive Instance Distillation for Object Detection in Autonomous Driving

In recent years, knowledge distillation (KD) has been widely used as an ...
01/31/2023

AMD: Adaptive Masked Distillation for Object

As a general model compression paradigm, feature-based knowledge distill...
10/19/2021

Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation

Knowledge Distillation is becoming one of the primary trends among neura...
06/12/2019

Efficient Evaluation-Time Uncertainty Estimation by Improved Distillation

In this work we aim to obtain computationally-efficient uncertainty esti...
02/14/2022

What is Next when Sequential Prediction Meets Implicitly Hard Interaction?

Hard interaction learning between source sequences and their next target...