Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations

10/15/2021
by Xinyu Zhang, et al.

Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often applied only to the weights of the network. In this work, we jointly apply novel uniform quantization and unstructured pruning methods to both the weights and activations of deep neural networks during training. Using our methods, we empirically evaluate the currently accepted prune-then-quantize paradigm across a wide range of computer vision tasks and observe that the two operations do not commute when applied to both the weights and activations of deep neural networks. Informed by these observations, we articulate the non-commutativity hypothesis: for a given deep neural network being trained for a specific task, there exists an exact training schedule in which quantization and pruning can be introduced to optimize network performance. We identify that this optimal ordering not only exists, but also varies across discriminative and generative tasks. Using the optimal training schedule within our training framework, we demonstrate increased performance per memory footprint over existing solutions.
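To make the idea concrete, below is a minimal, illustrative PyTorch sketch of a linear layer whose weights and activations are both uniformly quantized and unstructured-pruned during training, using a prune-then-quantize ordering. The bit-width, sparsity level, magnitude-pruning criterion, and straight-through estimator are generic assumptions for illustration, not the authors' exact formulation.

```python
# Illustrative sketch: joint uniform quantization and unstructured pruning
# of weights and activations in a single linear layer during training.
# Bit-width, sparsity, and the straight-through estimator are assumptions,
# not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UniformQuantizeSTE(torch.autograd.Function):
    """Uniform quantizer with a straight-through gradient estimator."""

    @staticmethod
    def forward(ctx, x, num_bits):
        qmax = 2 ** num_bits - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale) * scale  # snap values onto a uniform grid

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # pass gradients straight through the rounding


def magnitude_prune(x, sparsity):
    """Zero out the smallest-magnitude fraction of entries (unstructured pruning)."""
    if sparsity <= 0.0:
        return x
    k = max(int(sparsity * x.numel()), 1)
    threshold = x.abs().flatten().kthvalue(k).values
    return x * (x.abs() > threshold).to(x.dtype)


class QuantPruneLinear(nn.Module):
    """Linear layer with joint quantization and pruning of weights and activations."""

    def __init__(self, in_features, out_features, num_bits=4, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.num_bits, self.sparsity = num_bits, sparsity

    def forward(self, x):
        # Prune-then-quantize ordering; the paper's observation is that the
        # ordering of these two operations is not interchangeable in general.
        w = UniformQuantizeSTE.apply(magnitude_prune(self.weight, self.sparsity), self.num_bits)
        x = UniformQuantizeSTE.apply(magnitude_prune(x, self.sparsity), self.num_bits)
        return F.linear(x, w, self.bias)


if __name__ == "__main__":
    layer = QuantPruneLinear(16, 8)
    out = layer(torch.randn(4, 16))
    out.sum().backward()  # STE lets gradients reach the full-precision weights
    print(out.shape, layer.weight.grad.shape)
```

Swapping the order of `magnitude_prune` and `UniformQuantizeSTE.apply` in the forward pass gives a quantize-then-prune schedule; the paper's hypothesis is that which ordering performs better depends on whether the task is discriminative or generative.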


