Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression

12/05/2020
by Cody Blakeney, et al.

Deep neural networks (DNNs) have been extremely successful in solving many challenging AI tasks in natural language processing, speech recognition, and computer vision. However, DNNs are typically computation intensive, memory demanding, and power hungry, which significantly limits their usage on platforms with constrained resources. Therefore, a variety of compression techniques (e.g., quantization, pruning, and knowledge distillation) have been proposed to reduce the size and power consumption of DNNs. Blockwise knowledge distillation is one such technique that can effectively reduce the size of a highly complex DNN, but it is not widely adopted due to its long training time. In this paper, we propose a novel parallel blockwise distillation algorithm to accelerate the distillation process of sophisticated DNNs. Our algorithm leverages local information to conduct independent blockwise distillation, utilizes depthwise separable layers as the efficient replacement block architecture, and properly addresses limiting factors (e.g., dependency, synchronization, and load balancing) that affect parallelism. Experimental results on an AMD server with four GeForce RTX 2080Ti GPUs show that our algorithm achieves a 3x speedup with 19% energy savings on VGG distillation, and a 3.5x speedup with 29% energy savings on ResNet distillation, both with negligible accuracy loss. The speedup of ResNet distillation can be further improved to 3.87x when using four RTX6000 GPUs in a distributed cluster.
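The core idea described in the abstract can be sketched as follows: each teacher block is replaced by a cheaper depthwise-separable block, and the replacement is trained to reproduce the teacher block's output from the same local input activations. Because the loss depends only on that block's local input/output pair, each block can be distilled independently, which is what enables the parallelism. The sketch below is a minimal illustration in PyTorch, assuming a hypothetical `DepthwiseSeparableBlock` replacement and a `distill_block` helper; the names, shapes, and training loop are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Illustrative depthwise-separable replacement for one teacher block:
    a depthwise 3x3 convolution (groups=in_ch) followed by a pointwise 1x1
    convolution, which is much cheaper than a dense convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.pointwise(self.depthwise(x)))

def distill_block(teacher_block, student_block, activations, epochs=1, lr=1e-3):
    """Train student_block to mimic teacher_block using only the block's
    local input activations. Since no other block is involved, many such
    calls can run in parallel, one per block, on separate GPUs."""
    opt = torch.optim.Adam(student_block.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    teacher_block.eval()
    for _ in range(epochs):
        for x in activations:
            with torch.no_grad():
                target = teacher_block(x)  # local teacher output
            loss = loss_fn(student_block(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student_block
```

In a parallel setting, each worker would be handed one (teacher block, activation batch) pair, run `distill_block`, and return the trained replacement; the distilled blocks are then stitched back together into the compressed network.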


Related research

Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks (08/20/2022)
  Quantization, knowledge distillation, and magnitude pruning are among th...

A Survey of Methods for Low-Power Deep Learning and Computer Vision (03/24/2020)
  Deep neural networks (DNNs) are successful in many computer vision tasks...

Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy (11/15/2017)
  Deep learning networks have achieved state-of-the-art accuracies on comp...

Analyzing Compression Techniques for Computer Vision (05/14/2023)
  Compressing deep networks is highly desirable for practical use-cases in...

Accurate and Structured Pruning for Efficient Automatic Speech Recognition (05/31/2023)
  Automatic Speech Recognition (ASR) has seen remarkable advancements with...

Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition (12/17/2019)
  The technique of distillation helps transform cumbersome neural network ...

Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation (04/21/2022)
  Due to the prosperity of Artificial Intelligence (AI) techniques, more a...
