PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation

06/25/2021
by Jangho Kim, et al.

As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on them has become a critical issue. However, DNNs require high computational resources that are rarely available on edge devices. To address this, we propose a novel model compression method for devices with limited computational resources, called PQK, consisting of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of the unimportant weights removed in the pruning process to build a teacher network for training a better student network, without pre-training a separate teacher model. PQK has two phases. Phase 1 exploits iterative pruning and quantization-aware training to produce a lightweight and power-efficient model. In phase 2, we construct a teacher network by adding the unimportant weights unused in phase 1 back to the pruned network. Using this teacher network, we train the pruned network as a student network. In doing so, we do not need a pre-trained teacher network for the KD framework, because the teacher and student networks coexist within the same network. We apply our method to recognition models and verify the effectiveness of PQK on keyword spotting (KWS) and image recognition.
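To make the two-phase idea concrete, the following is a minimal PyTorch sketch of how pruning, quantization-aware training, and self-distillation could coexist in one set of weights. It is an illustrative reconstruction, not the authors' implementation: the layer type, pruning ratio, fake-quantization scheme, and loss weighting (`sparsity`, `bits`, `T`, `alpha`) are all assumptions made for the example.

```python
# Minimal sketch of the PQK idea (assumed PyTorch implementation, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude weights; zero out the smallest `sparsity` fraction."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


class FakeQuant(torch.autograd.Function):
    """Uniform fake quantization with a straight-through estimator (quantization-aware training)."""

    @staticmethod
    def forward(ctx, x, bits=8):
        scale = x.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
        return torch.round(x / scale) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None


class PQKLinear(nn.Module):
    """One layer holding both networks: student = kept weights, teacher = kept + pruned weights."""

    def __init__(self, in_features, out_features, sparsity=0.5, bits=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.register_buffer("mask", torch.ones(out_features, in_features))
        self.sparsity, self.bits = sparsity, bits

    def update_mask(self):
        # Phase 1: iterative magnitude pruning updates the importance mask.
        self.mask = magnitude_mask(self.weight.data, self.sparsity)

    def forward(self, x, use_teacher=False):
        # Student path uses only the important (unpruned) weights;
        # teacher path adds the unimportant weights back in (phase 2).
        w = self.weight if use_teacher else self.weight * self.mask
        w = FakeQuant.apply(w, self.bits)
        return F.linear(x, w, self.bias)


def pqk_phase2_loss(layer, x, labels, T=4.0, alpha=0.5):
    """Phase 2: distill the in-place teacher into the pruned student (no pre-trained teacher)."""
    student_logits = layer(x, use_teacher=False)
    with torch.no_grad():
        teacher_logits = layer(x, use_teacher=True)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

The key design point the sketch illustrates is that the teacher costs no extra parameters or pre-training: it is simply the same quantized network evaluated without the pruning mask, so teacher and student coexist within one model.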


