Matching Guided Distillation

08/23/2020
by Kaiyu Yue, et al.

Feature distillation is an effective way to improve the performance of a smaller student model, which has fewer parameters and lower computation cost than the larger teacher model. Unfortunately, there is a common obstacle: the gap in semantic feature structure between the intermediate features of the teacher and the student. The classic scheme transforms intermediate features by adding an adaptation module, such as a naive convolutional layer, an attention-based block, or a more complicated design. However, this introduces two problems: a) the adaptation module brings more parameters into training; b) an adaptation module with random initialization or a special transformation is not well suited to distilling a pre-trained student. In this paper, we present Matching Guided Distillation (MGD), an efficient and parameter-free way to solve these problems. The key idea of MGD is to pose the matching of teacher channels to student channels as an assignment problem. We compare three solutions to the assignment problem for reducing the channels of teacher features under a partial distillation loss. The overall training takes a coordinate-descent approach that alternates between two optimization objectives: updating the assignments and updating the parameters. Since MGD contains only normalization and pooling operations with negligible computation cost, it can be flexibly plugged into networks together with other distillation methods.
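The channel-matching step can be viewed as a standard assignment problem. Below is a minimal, illustrative sketch (not the authors' released implementation) of one way to set it up in PyTorch: channels are compared with a cosine-similarity cost, matched with the Hungarian solver from SciPy, and a partial distillation loss is computed on the matched teacher channels. The cost function, the one-to-one matching, and the MSE loss are simplifying assumptions; MGD itself reduces groups of matched teacher channels (e.g., by pooling) and compares several matching solutions.

```python
# Minimal sketch of channel matching for feature distillation (illustrative,
# not the exact MGD implementation). Assumes intermediate feature maps of
# shape (batch, channels, H, W); cost, matching rule, and loss are assumptions.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment


def match_channels(teacher_feat, student_feat):
    """Pose teacher-to-student channel matching as an assignment problem.

    Returns, for each student channel, the index of its matched teacher channel.
    """
    # Flatten spatial dims: (C, B*H*W) per network, then L2-normalize each channel.
    t = F.normalize(teacher_feat.transpose(0, 1).flatten(1), dim=1)  # (Ct, N)
    s = F.normalize(student_feat.transpose(0, 1).flatten(1), dim=1)  # (Cs, N)
    # Cost = 1 - cosine similarity between every (student, teacher) channel pair.
    cost = 1.0 - s @ t.T                                             # (Cs, Ct)
    # Hungarian assignment (one of several possible solvers for this step).
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return torch.as_tensor(cols, device=student_feat.device)         # (Cs,)


def distill_loss(teacher_feat, student_feat, matched_idx):
    """Partial distillation loss on the matched (reduced) teacher channels."""
    reduced_teacher = teacher_feat[:, matched_idx]  # keep only matched channels
    return F.mse_loss(student_feat, reduced_teacher.detach())


if __name__ == "__main__":
    t_feat = torch.randn(4, 256, 14, 14)      # teacher features: more channels
    s_feat = torch.randn(4, 128, 14, 14)      # student features: fewer channels
    idx = match_channels(t_feat, s_feat)      # assignments update (held fixed ...)
    loss = distill_loss(t_feat, s_feat, idx)  # ... during the parameter update)
    print(loss.item())
```

In a coordinate-descent training loop, the assignment step would be rerun periodically while the distillation loss is minimized with the assignments held fixed in between.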

Related research

05/28/2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
Knowledge distillation (KD) has been extensively employed to transfer th...

11/07/2017
Moonshine: Distilling with Cheap Convolutions
Model distillation compresses a trained machine learning model, such as ...

05/23/2023
NORM: Knowledge Distillation via N-to-One Representation Matching
Existing feature distillation methods commonly adopt the One-to-one Repr...

11/25/2022
Privileged Prior Information Distillation for Image Matting
Performance of trimap-free image matting methods is limited when trying ...

02/05/2021
Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching
Knowledge distillation extracts general knowledge from a pre-trained tea...

04/03/2019
A Comprehensive Overhaul of Feature Distillation
We investigate the design aspects of feature distillation methods achiev...

12/31/2022
Guided Hybrid Quantization for Object Detection in Multimodal Remote Sensing Imagery via One-to-one Self-teaching
Considering the computation complexity, we propose a Guided Hybrid Quant...