Lipschitz Continuity Guided Knowledge Distillation

08/29/2021
by Yuzhang Shang, et al.

Knowledge distillation has become one of the most important model compression techniques, transferring knowledge from larger teacher networks to smaller student networks. Although prior distillation methods have achieved great success by carefully designing various types of knowledge, they overlook the functional properties of neural networks, which makes applying those techniques to new tasks unreliable and non-trivial. To alleviate this problem, we leverage Lipschitz continuity to better characterize the functional behavior of neural networks and to guide the knowledge distillation process. In particular, we propose a novel Lipschitz Continuity Guided Knowledge Distillation framework that faithfully distills knowledge by minimizing the distance between the two networks' Lipschitz constants, which enables the teacher network to better regularize the student network and improves the resulting performance. Because computing the exact Lipschitz constant is NP-hard, we derive an explainable approximation algorithm with an explicit theoretical derivation. Experimental results show that our method outperforms other benchmarks on several knowledge distillation tasks (e.g., classification, segmentation, and object detection) on the CIFAR-100, ImageNet, and PASCAL VOC datasets.
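The paper's exact approximation algorithm is not reproduced here. The sketch below only illustrates the general idea in PyTorch, under the common assumption that a network's Lipschitz constant can be upper-bounded by the product of its layers' spectral norms (estimated by power iteration), and that the gap between the teacher's and student's estimates is added to a standard distillation loss. The function names, the weighting factor alpha, and the temperature are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def spectral_norm_power_iteration(weight: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    """Estimate the largest singular value of a weight matrix by power iteration.
    For a dense layer this equals its L2 Lipschitz constant; for conv layers,
    flattening the kernel to 2-D is a common surrogate, not the exact operator norm."""
    w = weight.reshape(weight.shape[0], -1)
    v = torch.randn(w.shape[1], device=w.device)
    v = v / v.norm()
    for _ in range(n_iters):
        u = w @ v
        u = u / (u.norm() + 1e-12)
        v = w.t() @ u
        v = v / (v.norm() + 1e-12)
    # Rayleigh-quotient estimate of the top singular value
    return u @ (w @ v)


def network_lipschitz_estimate(model: nn.Module) -> torch.Tensor:
    """Upper-bound the network's Lipschitz constant by the product of per-layer
    spectral norms (a tractable surrogate; the exact constant is NP-hard)."""
    lip = torch.ones((), device=next(model.parameters()).device)
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            lip = lip * spectral_norm_power_iteration(m.weight)
    return lip


def lipschitz_distillation_loss(student: nn.Module, teacher: nn.Module,
                                logits_s: torch.Tensor, logits_t: torch.Tensor,
                                labels: torch.Tensor,
                                alpha: float = 0.1, temperature: float = 4.0) -> torch.Tensor:
    """Task loss + vanilla KD loss + a term penalizing the gap between the
    student's and teacher's Lipschitz estimates (hypothetical weighting)."""
    ce = F.cross_entropy(logits_s, labels)
    kd = F.kl_div(F.log_softmax(logits_s / temperature, dim=1),
                  F.softmax(logits_t / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    lip_s = network_lipschitz_estimate(student)
    with torch.no_grad():  # teacher is frozen during distillation
        lip_t = network_lipschitz_estimate(teacher)
    return ce + kd + alpha * (lip_s - lip_t).abs()
```

In this reading, the Lipschitz term acts as a regularizer: the student is pushed toward the teacher's smoothness rather than only toward its output distribution.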

