Information Theoretic Representation Distillation

12/01/2021
by Roy Miles, et al.

Despite the empirical success of knowledge distillation, it still lacks a theoretical foundation that naturally leads to computationally inexpensive implementations. To address this concern, we forge an alternative connection between information theory and knowledge distillation using a recently proposed entropy-like functional. In doing so, we introduce two distinct, complementary losses that aim to maximise the correlation and mutual information between the student and teacher representations. Our method achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks, while incurring significantly lower training overhead than closely related and similarly performing approaches. We further demonstrate the effectiveness of our method on a binary distillation task, whereby we establish a new state of the art for binary quantisation. The code, evaluation protocols, and trained models will be made publicly available.
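For readers who want a concrete sense of what a correlation-maximising feature-distillation loss can look like, the sketch below is a generic illustration rather than the paper's actual formulation: the linear projection head, the per-dimension standardisation over the batch, and the class name FeatureCorrelationDistillLoss are all assumptions made for this example.

```python
# Hedged sketch: a generic correlation-based feature-distillation loss.
# This is NOT the paper's exact method; the projection head, the
# standardisation scheme, and the loss sign/weighting are assumptions.
import torch
import torch.nn as nn


class FeatureCorrelationDistillLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student features to the teacher's width
        # (a common choice in feature distillation; assumed here).
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        # f_s: (batch, student_dim), f_t: (batch, teacher_dim)
        z_s = self.proj(f_s)
        # Standardise each feature dimension across the batch so the
        # cross-correlation is invariant to scale and offset.
        z_s = (z_s - z_s.mean(0)) / (z_s.std(0) + 1e-6)
        z_t = (f_t - f_t.mean(0)) / (f_t.std(0) + 1e-6)
        # Mean per-dimension correlation between student and teacher;
        # maximising it corresponds to minimising its negative.
        corr = (z_s * z_t).mean(0)
        return -corr.mean()


# Usage sketch (shapes are illustrative):
# loss_fn = FeatureCorrelationDistillLoss(student_dim=128, teacher_dim=256)
# distill_loss = loss_fn(student_features, teacher_features)
```

In practice, a term of this kind would typically be added to the standard task loss (for example cross-entropy on the labels) with a tunable weight.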


Related research

04/18/2023 · Deep Collective Knowledge Distillation
Many existing studies on knowledge distillation have focused on methods ...

10/23/2019 · Contrastive Representation Distillation
Often we wish to transfer representational knowledge from one neural net...

10/25/2021 · MUSE: Feature Self-Distillation with Mutual Information and Self-Information
We present a novel information-theoretic approach to introduce dependenc...

05/15/2018 · Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
Many recent works on knowledge distillation have provided ways to transf...

10/29/2018 · A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
The convergence rate and final performance of common deep learning model...

06/12/2019 · Efficient Evaluation-Time Uncertainty Estimation by Improved Distillation
In this work we aim to obtain computationally-efficient uncertainty esti...

09/17/2021 · Distilling Linguistic Context for Language Model Compression
A computationally expensive and memory intensive neural network lies beh...
