Network Recasting: A Universal Method for Network Architecture Transformation

09/14/2018
by Joonsang Yu, et al.

This paper proposes network recasting as a general method for network architecture transformation. The primary goal of the method is to accelerate inference through this transformation, though it has other practical applications as well. The method is based on block-wise recasting: each source block in a pre-trained teacher network is recast into a target block in a student network. For the recasting, the target block is trained so that its output activation approximates that of the source block. Recasting the blocks sequentially in this manner transforms the network architecture while preserving accuracy. The method can transform an arbitrary teacher network type into an arbitrary student network type, and it can even generate a mixed-architecture network consisting of two or more block types. Network recasting can produce a network with fewer parameters and/or activations, which significantly reduces inference time. Naturally, it can also be used for network compression by recasting a trained network into a smaller network of the same type. Our experiments show that it outperforms previous compression approaches in terms of actual speedup on a GPU.
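As a rough illustration of the block-wise recasting step described above, the sketch below trains a target block to approximate the output activations of a source block. It assumes a PyTorch setting; the function and variable names (recast_block, source_block, target_block, block_inputs) are illustrative and not taken from the paper.

```python
# Minimal sketch of one block-wise recasting step (assumed PyTorch setting).
import torch
import torch.nn as nn


def recast_block(source_block: nn.Module,
                 target_block: nn.Module,
                 data_loader,
                 num_epochs: int = 10,
                 lr: float = 1e-3,
                 device: str = "cuda") -> nn.Module:
    """Train target_block so its output activation approximates source_block's."""
    source_block.to(device).eval()    # frozen teacher block
    target_block.to(device).train()   # trainable student block
    optimizer = torch.optim.Adam(target_block.parameters(), lr=lr)
    criterion = nn.MSELoss()

    for _ in range(num_epochs):
        for block_inputs in data_loader:  # activations that feed this block
            block_inputs = block_inputs.to(device)
            with torch.no_grad():
                teacher_out = source_block(block_inputs)  # target activation
            student_out = target_block(block_inputs)
            loss = criterion(student_out, teacher_out)    # match activations
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return target_block
```

Applying such a step block by block, with each recast block substituted into the network before the next one is trained, would transform the full architecture sequentially as the abstract describes.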
