Local Region Knowledge Distillation

10/09/2020
by Xiang Deng, et al.

Knowledge distillation (KD) is an effective technique for transferring knowledge from one neural network (the teacher) to another (the student), thereby improving the student's performance. Existing work trains the student to mimic the teacher's outputs on the training data. We argue that transferring knowledge only at sparse training data points cannot enable the student to capture the local shape of the teacher's function well. To address this issue, we propose locally linear region knowledge distillation (L^2RKD), which transfers knowledge in local, linear regions from a teacher to a student, enforcing the student to mimic the local shape of the teacher's function within each linear region. Extensive experiments with various network architectures demonstrate that L^2RKD outperforms state-of-the-art approaches by a large margin and is more data-efficient. Moreover, L^2RKD is compatible with existing distillation methods and further improves their performance significantly.
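Since the abstract does not spell out the loss, the sketch below illustrates one plausible reading of region-based distillation in PyTorch: alongside the standard pointwise KD loss of Hinton et al., the student is also matched to the teacher at convex combinations of paired training inputs, so it tracks the teacher's function between data points rather than only at them. The helper names (pointwise_kd_loss, region_kd_loss) and the interpolation scheme are illustrative assumptions, not the paper's actual formulation.

    import torch
    import torch.nn.functional as F

    def pointwise_kd_loss(student_logits, teacher_logits, T=4.0):
        # Standard Hinton-style KD: KL divergence between softened
        # teacher and student output distributions.
        p_teacher = F.softmax(teacher_logits / T, dim=1)
        log_p_student = F.log_softmax(student_logits / T, dim=1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

    def region_kd_loss(student, teacher, x, T=4.0):
        # ASSUMED region term: distill at random convex combinations of
        # paired inputs so the student matches the teacher's local shape
        # between training points, not just at them.
        perm = torch.randperm(x.size(0), device=x.device)
        lam = torch.rand(x.size(0), 1, 1, 1, device=x.device)  # assumes NCHW image inputs
        x_mix = lam * x + (1.0 - lam) * x[perm]
        with torch.no_grad():
            teacher_logits = teacher(x_mix)
        student_logits = student(x_mix)
        return pointwise_kd_loss(student_logits, teacher_logits, T)

A training step would then combine the usual cross-entropy on labels with both terms, e.g. loss = ce + alpha * pointwise_kd + beta * region_kd, with alpha and beta as tuning weights; the actual weighting and region construction in L^2RKD may differ.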

