Knowledge Distillation Layer that Lets the Student Decide

09/06/2023
by Ada Gorgun, et al.

A typical technique in knowledge distillation (KD) is to regularize the learning of a limited-capacity model (the student) by pushing its responses to match those of a powerful model (the teacher). Albeit useful, especially in the penultimate layer and beyond, its action on the student's feature transform is rather implicit, limiting its use in the intermediate layers. To explicitly embed the teacher's knowledge in the feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling the student to discard nuisance information, and ii) feeding the transferred knowledge forward to deeper layers. Thus, the student enjoys the teacher's knowledge during inference as well as training. Formally, we repurpose the 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on three popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
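To make the repurposed block concrete, here is a minimal PyTorch sketch of a 1x1-BN-ReLU-1x1 convolution layer used as a learnable KD layer: the first 1x1 convolution scores each local region against a bank of learned templates, and the second 1x1 convolution maps the rectified template responses to a semantic vector that is fed forward. The class name `LetKDLayer` and the parameter `num_templates` are illustrative assumptions, not the authors' implementation (see the linked repository for that); the teacher-based supervision of the templates is omitted here.

```python
import torch
import torch.nn as nn

class LetKDLayer(nn.Module):
    """Sketch of a learnable KD layer built from a 1x1-BN-ReLU-1x1 block.

    Hypothetical names; this is not the official letKD implementation.
    """

    def __init__(self, in_channels: int, num_templates: int, out_channels: int):
        super().__init__()
        # 1x1 conv: per-region matching scores against the learned templates
        self.match = nn.Conv2d(in_channels, num_templates, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(num_templates)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 conv: assign a semantic vector according to the matched template
        self.embed = nn.Conv2d(num_templates, out_channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.relu(self.bn(self.match(x)))  # rectified template responses
        return self.embed(scores)                   # semantic vectors, fed deeper


# Usage: insert at an intermediate stage of the student network.
layer = LetKDLayer(in_channels=256, num_templates=64, out_channels=256)
features = torch.randn(8, 256, 14, 14)  # a batch of intermediate student features
out = layer(features)                   # shape: (8, 256, 14, 14)
```

Because the block is part of the student itself, the transferred knowledge remains active at inference time, unlike a distillation loss that is dropped after training.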

