Variational Information Distillation for Knowledge Transfer

04/11/2019
by Sungsoo Ahn, et al.

Transferring knowledge from a teacher neural network pretrained on the same or a similar task can significantly improve the performance of a student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework that formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that it consistently outperforms them. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network (CNN) to a multi-layer perceptron (MLP) on CIFAR-10. The resulting MLP significantly outperforms state-of-the-art methods and achieves performance comparable to a CNN with a single convolutional layer.
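
To make the mutual-information objective concrete, below is a minimal PyTorch sketch of one way such a variational bound can be implemented: a Gaussian variational distribution q(t | s), with its mean predicted from the student's features and a learned per-channel variance, is fit so that minimizing its negative log-likelihood on the teacher's features tightens a lower bound on the mutual information. The 1x1-convolution regressor, the softplus variance parameterization, and the names VIDLoss, s_channels, and t_channels are illustrative assumptions, not details taken from the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VIDLoss(nn.Module):
    # Variational lower bound on the mutual information I(t; s) between
    # teacher features t and student features s. A Gaussian variational
    # distribution q(t | s) = N(mu(s), diag(sigma^2)) is trained alongside
    # the student; minimizing -log q(t | s) tightens the bound.
    def __init__(self, s_channels, t_channels):
        super().__init__()
        # Hypothetical 1x1-conv regressor mapping student features to the
        # teacher's feature space; the paper's exact regressor may differ.
        self.regressor = nn.Conv2d(s_channels, t_channels, kernel_size=1)
        # Learned per-channel variance parameter, kept positive via softplus.
        self.log_scale = nn.Parameter(torch.zeros(t_channels))

    def forward(self, s_feat, t_feat):
        mu = self.regressor(s_feat)                        # predicted teacher mean
        var = F.softplus(self.log_scale).view(1, -1, 1, 1) + 1e-6
        # Negative Gaussian log-likelihood (additive constants dropped),
        # averaged over batch, channels, and spatial positions.
        nll = 0.5 * (torch.log(var) + (t_feat - mu) ** 2 / var)
        return nll.mean()

In training, this term would be added to the student's usual task loss with some weight, and the teacher's features detached so that gradients flow only into the student and the variational parameters.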

Related research

Local Region Knowledge Distillation (10/09/2020)
Knowledge distillation (KD) is an effective technique to transfer knowle...

Knowledge Transfer with Jacobian Matching (03/01/2018)
Classical distillation methods transfer representations from a "teacher"...

Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer (12/12/2016)
Attention plays a critical role in human visual experience. Furthermore,...

MUSE: Feature Self-Distillation with Mutual Information and Self-Information (10/25/2021)
We present a novel information-theoretic approach to introduce dependenc...

Estimating and Maximizing Mutual Information for Knowledge Distillation (10/29/2021)
In this work, we propose Mutual Information Maximization Knowledge Disti...

Probabilistic Knowledge Transfer for Deep Representation Learning (03/28/2018)
Knowledge Transfer (KT) techniques tackle the problem of transferring th...

Heterogeneous Knowledge Distillation using Information Flow Modeling (05/02/2020)
Knowledge Distillation (KD) methods are capable of transferring the know...
