Knowledge Squeezed Adversarial Network Compression

04/10/2019
by Shu Changyong, et al.

Deep network compression has achieved notable progress via knowledge distillation, in which a teacher-student learning scheme is adopted with a predetermined loss. Recently, attention has shifted toward adversarial training that minimizes the discrepancy between the output distributions of the two networks. However, these approaches emphasize result-oriented learning while neglecting process-oriented learning, and therefore lose the rich information contained in the whole network pipeline. Motivated by the observation that a small network cannot perfectly mimic a large one due to the huge gap in network scale, we propose a knowledge transfer method with effective intermediate supervision, under the adversarial training framework, to learn the student network. To obtain a powerful yet highly compact intermediate representation, the knowledge is squeezed by a task-driven attention mechanism, so that the knowledge transferred from the teacher network accommodates the size of the student network. As a result, the proposed method integrates the merits of both process-oriented and result-oriented learning. Extensive experiments on three typical benchmark datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, demonstrate that our method achieves superior performance compared with other state-of-the-art methods.
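To make the described objective concrete, below is a minimal PyTorch-style sketch of the kind of combined loss the abstract outlines: a result-oriented adversarial term on the output logits plus a process-oriented term on attention-squeezed intermediate features. All module names, hyper-parameters, and the particular attention squeeze used here are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumed, not the authors' code) of a combined distillation
# objective: adversarial matching of output logits (result-oriented) plus
# attention-squeezed intermediate supervision (process-oriented).
import torch
import torch.nn as nn
import torch.nn.functional as F

def spatial_attention(feat):
    """Squeeze a feature map (N, C, H, W) into a normalized (N, H*W) spatial
    attention vector by summing squared activations over the channel axis."""
    att = feat.pow(2).sum(dim=1).flatten(1)
    return F.normalize(att, p=2, dim=1)

class LogitDiscriminator(nn.Module):
    """Small MLP that tries to tell teacher logits from student logits."""
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1))

    def forward(self, logits):
        return self.net(logits)

def student_loss(s_logits, s_feats, t_feats, labels, disc, beta=1e3):
    """Task loss + adversarial loss + attention-transfer loss.
    s_feats / t_feats are lists of intermediate feature maps taken at matching
    depths; spatial sizes are assumed to agree at each pair (an assumption of
    this sketch, not a claim about the paper's architecture)."""
    ce = F.cross_entropy(s_logits, labels)
    # The student tries to make its logits look teacher-like to the discriminator.
    real = torch.ones(s_logits.size(0), 1, device=s_logits.device)
    adv = F.binary_cross_entropy_with_logits(disc(s_logits), real)
    # Process-oriented supervision on the squeezed (attention) representations.
    att = sum(F.mse_loss(spatial_attention(s), spatial_attention(t.detach()))
              for s, t in zip(s_feats, t_feats))
    return ce + adv + beta * att
```

In such a setup the discriminator would be updated in alternation, learning to label teacher logits as real and student logits as fake, while the student minimizes the combined loss above.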

Related research

05/18/2023  Student-friendly Knowledge Distillation
In knowledge distillation, the knowledge from the teacher model is often...

10/18/2018  KTAN: Knowledge Transfer Adversarial Network
To reduce the large computation and storage cost of a deep convolutional...

03/28/2018  Adversarial Network Compression
Neural network compression has recently received much attention due to t...

06/05/2021  Bidirectional Distillation for Top-K Recommender System
Recommender systems (RS) have started to employ knowledge distillation, ...

12/25/2022  BD-KD: Balancing the Divergences for Online Knowledge Distillation
Knowledge distillation (KD) has gained a lot of attention in the field o...

10/02/2018  LIT: Block-wise Intermediate Representation Training for Model Compression
Knowledge distillation (KD) is a popular method for reducing the computa...

10/25/2022  Accelerating Certified Robustness Training via Knowledge Transfer
Training deep neural network classifiers that are certifiably robust aga...
