Learning Student Networks via Feature Embedding

12/17/2018
by Hanting Chen, et al.

Deep convolutional neural networks have been widely used in numerous applications, but their demanding storage and computational requirements prevent their deployment on mobile devices. Knowledge distillation aims to optimize a portable student network by transferring knowledge from a well-trained, heavy teacher network. Traditional teacher-student methods rely on additional fully-connected layers to bridge intermediate layers of the teacher and student networks, which introduces a large number of auxiliary parameters. In contrast, this paper aims to propagate information from teacher to student without introducing new variables that need to be optimized. We regard the teacher-student paradigm from a new perspective of feature embedding. By introducing a locality preserving loss, the student network is encouraged to generate low-dimensional features that inherit the intrinsic properties of their corresponding high-dimensional features from the teacher network. The resulting portable network can thus naturally maintain performance comparable to that of the teacher network. Theoretical analysis is provided to justify the lower computational complexity of the proposed method. Experiments on benchmark datasets and well-trained networks suggest that the proposed algorithm is superior to state-of-the-art teacher-student learning methods in terms of computational and storage complexity.
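To illustrate the idea of a locality preserving distillation loss, the following is a minimal sketch (not the authors' released code), assuming PyTorch, batch-wise teacher and student features, and hypothetical hyper-parameters k (neighborhood size) and sigma (kernel width). It builds neighborhood affinities from the teacher's high-dimensional features and penalizes the student when nearby teacher samples are mapped far apart in its low-dimensional feature space.

```python
# Minimal sketch of a locality-preserving distillation loss (assumed PyTorch setup).
import torch

def locality_preserving_loss(teacher_feat, student_feat, k=5, sigma=1.0):
    """Encourage student features to preserve the local neighborhood
    structure of the teacher's high-dimensional features.

    teacher_feat: (N, Dt) features from the frozen teacher network
    student_feat: (N, Ds) features from the student network (Ds may differ)
    k, sigma: hypothetical hyper-parameters (neighborhood size, kernel width)
    """
    n = teacher_feat.size(0)
    # Pairwise squared distances between teacher features.
    t_dist = torch.cdist(teacher_feat, teacher_feat, p=2).pow(2)
    # Heat-kernel affinities, kept only for each sample's k nearest teacher neighbors.
    w = torch.exp(-t_dist / (2 * sigma ** 2))
    knn_idx = t_dist.topk(k + 1, largest=False).indices  # includes self
    mask = torch.zeros_like(w)
    mask.scatter_(1, knn_idx, 1.0)
    mask.fill_diagonal_(0.0)
    w = w * mask
    # Student pairwise squared distances, weighted by teacher affinities:
    # neighbors in the teacher space should stay close in the student space.
    s_dist = torch.cdist(student_feat, student_feat, p=2).pow(2)
    return (w * s_dist).sum() / n

# Usage: add this term to the student's usual classification loss.
if __name__ == "__main__":
    t = torch.randn(32, 512)   # e.g. teacher penultimate-layer features
    s = torch.randn(32, 128)   # lower-dimensional student features
    print(locality_preserving_loss(t, s).item())
```

Because the affinities are computed directly from the teacher's features, no auxiliary bridging layers or extra trainable parameters are introduced, which is the property the paper emphasizes.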


