Recurrent knowledge distillation

05/18/2018
by Silvia L. Pintea, et al.

Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers of the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes, and MiniPlaces that we can reduce the number of parameters with little loss in accuracy.
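
The core idea, replacing a stack of residual layers with one residual layer whose weights are reused recurrently, can be illustrated with a short sketch. The PyTorch code below is our own minimal illustration, not the authors' implementation: it assumes a standard two-convolution residual block and a fixed unrolling depth, and it does not reproduce the paper's three specific recurrent variants.

```python
# Minimal sketch (illustrative, not the authors' architecture): a teacher stage
# built from several distinct residual blocks vs. a recurrent student stage that
# reuses a single block's weights at every step.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Standard residual block: y = ReLU(x + F(x))."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))


class TeacherStage(nn.Module):
    """Teacher: `depth` distinct residual blocks, `depth` sets of parameters."""
    def __init__(self, channels: int, depth: int):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(depth)])

    def forward(self, x):
        return self.blocks(x)


class RecurrentStudentStage(nn.Module):
    """Student: one residual block unrolled `depth` times with shared weights."""
    def __init__(self, channels: int, depth: int):
        super().__init__()
        self.block = ResidualBlock(channels)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            x = self.block(x)  # the same weights are reused at every step
        return x


def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    teacher = TeacherStage(channels=64, depth=4)
    student = RecurrentStudentStage(channels=64, depth=4)
    print("teacher params:", n_params(teacher))   # roughly 4x the student
    print("student params:", n_params(student))
    print(teacher(x).shape, student(x).shape)     # identical output shapes
```

With an unrolling depth of 4, the student stage holds the parameters of a single block while still applying four residual transformations, which is the parameter reduction the abstract refers to; the distillation loss that transfers the teacher's knowledge to this student is a separate training component not shown here.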


Related research

ResKD: Residual-Guided Knowledge Distillation (06/08/2020)
Knowledge distillation has emerged as a promising technique for compressi...

PURSUhInT: In Search of Informative Hint Points Based on Layer Clustering for Knowledge Distillation (02/26/2021)
We propose a novel knowledge distillation methodology for compressing de...

Knowledge Distillation from Few Samples (12/05/2018)
Current knowledge distillation methods require full training data to dis...

[Re] Distilling Knowledge via Knowledge Review (05/18/2022)
This effort aims to reproduce the results of experiments and analyze the...

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling (12/03/2018)
Knowledge distillation is an effective technique that transfers knowledg...

Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework (10/26/2019)
The holy grail in deep neural network research is porting the memory- an...

Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation (11/13/2019)
Although recent works on knowledge distillation (KD) have achieved a ...
