A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

03/23/2023
by Ziwei Liu, et al.

Knowledge distillation is a popular technique for transferring knowledge from a large teacher model to a smaller student model by having the student mimic the teacher. However, distillation that directly aligns the feature maps of the teacher and the student may impose overly strict constraints on the student and thus degrade its performance. To alleviate this feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student with pixel-wise transformations. In this paper, we newly find that aligning the feature maps of the teacher and the student along the channel dimension is also effective for addressing the feature misalignment issue. Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student with those of the teacher. Building on it, we further propose a simple and generic framework for feature distillation, with only one hyper-parameter to balance the distillation loss and the task-specific loss. Extensive experimental results show that our method achieves significant performance improvements on various computer vision tasks, including image classification (+3.28 on ImageNet-1K), object detection (+3.9 on MS COCO), instance segmentation (+2.8 with Mask R-CNN), and semantic segmentation (+4.66 on Cityscapes), demonstrating the effectiveness and versatility of the proposed method. The code will be made publicly available.
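To make the idea concrete, below is a minimal PyTorch sketch of feature distillation with a learnable nonlinear channel-wise transformation, assuming the transformation is a small 1x1-convolution MLP that maps the student's channels into the teacher's channel space. The names (ChannelTransform, total_loss) and the hyper-parameter alpha are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelTransform(nn.Module):
    """Learnable nonlinear channel-wise transformation (illustrative sketch).

    Maps student features of shape (B, C_s, H, W) into the teacher's
    channel space (B, C_t, H, W) using 1x1 convolutions, which mix only
    the channel dimension at each spatial location.
    """
    def __init__(self, c_student, c_teacher, hidden=None):
        super().__init__()
        hidden = hidden or c_teacher
        self.proj = nn.Sequential(
            nn.Conv2d(c_student, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, c_teacher, kernel_size=1),
        )

    def forward(self, feat_student):
        return self.proj(feat_student)


def distillation_loss(feat_student, feat_teacher, transform):
    """Feature distillation loss after channel-wise alignment.

    The teacher's features are detached so that gradients flow only
    through the student and the transformation module.
    """
    aligned = transform(feat_student)
    return F.mse_loss(aligned, feat_teacher.detach())


def total_loss(task_loss, feat_student, feat_teacher, transform, alpha=1.0):
    """Single balancing hyper-parameter `alpha`, as described in the abstract."""
    return task_loss + alpha * distillation_loss(feat_student, feat_teacher, transform)


if __name__ == "__main__":
    transform = ChannelTransform(c_student=256, c_teacher=512)
    fs = torch.randn(2, 256, 32, 32)  # student feature map
    ft = torch.randn(2, 512, 32, 32)  # teacher feature map
    loss = total_loss(torch.tensor(0.5), fs, ft, transform, alpha=0.7)
    print(loss.item())
```

Because 1x1 convolutions mix only the channel dimension at each spatial location, the student's spatial layout is left untouched, in contrast to the pixel-wise (spatial) transformations used by prior work.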

Related research

04/03/2019
A Comprehensive Overhaul of Feature Distillation
We investigate the design aspects of feature distillation methods achiev...

11/01/2022
Pixel-Wise Contrastive Distillation
We present the first pixel-level self-supervised distillation framework ...

07/12/2022
Normalized Feature Distillation for Semantic Segmentation
As a promising approach in model compression, knowledge distillation imp...

02/16/2021
Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
Teacher-student models provide a powerful framework in which the typical...

10/31/2019
Distilling Pixel-Wise Feature Similarities for Semantic Segmentation
Among the neural network compression techniques, knowledge distillation ...

07/10/2020
Distillation Guided Residual Learning for Binary Convolutional Neural Networks
It is challenging to bridge the performance gap between Binary CNN (BCNN...

03/12/2019
Knowledge Adaptation for Efficient Semantic Segmentation
Both accuracy and efficiency are of significant importance to the task o...
