Light Multi-segment Activation for Model Compression

07/16/2019
by   Zhenhui Xu, et al.
0

Model compression has become necessary when applying neural networks (NN) into many real application tasks that can accept slightly-reduced model accuracy with strict tolerance to model complexity. Recently, Knowledge Distillation, which distills the knowledge from well-trained and highly complex teacher model into a compact student model, has been widely used for model compression. However, under the strict requirement on the resource cost, it is quite challenging to achieve comparable performance with the teacher model, essentially due to the drastically-reduced expressiveness ability of the compact student model. Inspired by the nature of the expressiveness ability in Neural Networks, we propose to use multi-segment activation, which can significantly improve the expressiveness ability with very little cost, in the compact student model. Specifically, we propose a highly efficient multi-segment activation, called Light Multi-segment Activation (LMA), which can rapidly produce multiple linear regions with very few parameters by leveraging the statistical information. With using LMA, the compact student model is capable of achieving much better performance effectively and efficiently, than the ReLU-equipped one with same model scale. Furthermore, the proposed method is compatible with other model compression techniques, such as quantization, which means they can be used jointly for better compression performance. Experiments on state-of-the-art NN architectures over the real-world tasks demonstrate the effectiveness and extensibility of the LMA.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/08/2020

DE-RRD: A Knowledge Distillation Framework for Recommender System

Recent recommender systems have started to employ knowledge distillation...
research
08/08/2023

Teacher-Student Architecture for Knowledge Distillation: A Survey

Although Deep neural networks (DNNs) have shown a strong capacity to sol...
research
01/31/2018

Model compression for faster structural separation of macromolecules captured by Cellular Electron Cryo-Tomography

Electron Cryo-Tomography (ECT) enables 3D visualization of macromolecule...
research
03/16/2023

Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on Limit-Trained Multi-Teacher Models

With numerous medical tasks, the performance of deep models has recently...
research
11/21/2019

Few Shot Network Compression via Cross Distillation

Model compression has been widely adopted to obtain light-weighted deep ...
research
10/26/2019

Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework

The holy grail in deep neural network research is porting the memory- an...

Please sign up or login with your details

Forgot password? Click here to reset