Joint Pruning & Quantization for Extremely Sparse Neural Networks

10/05/2020
by Po-Hsiang Yu, et al.

We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity for quantized networks to enable implementation on low-cost, low-power accelerator hardware. In a practical scenario, dense prediction tasks have particularly many applications; hence we choose stereo depth estimation as the target. We propose a two-stage pruning and quantization pipeline and introduce a Taylor Score alongside a new fine-tuning mode to achieve extreme sparsity without sacrificing performance. Our evaluation not only shows that pruning and quantization should be investigated jointly, but also that almost 99% of the memory demand can be cut while hardware costs can be reduced by up to 99.9%. In comparison with other works, we demonstrate that our pruning stage alone beats the state-of-the-art when applied to ResNet on CIFAR10 and ImageNet.
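The page does not define the paper's Taylor Score, but criteria of this kind typically rank each weight by a first-order Taylor estimate of the loss change caused by removing it, i.e. |w * dL/dw|. The PyTorch sketch below illustrates that family of criteria under stated assumptions; the helper names (taylor_scores, prune_by_score), the single-batch gradient, and the global threshold are illustrative, not the paper's exact method.

```python
import torch
import torch.nn as nn

def taylor_scores(model: nn.Module) -> dict:
    """First-order Taylor importance |w * dL/dw| per weight.
    Call after loss.backward() so gradients are populated."""
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None and p.dim() > 1:  # skip biases and norm parameters
            scores[name] = (p.detach() * p.grad.detach()).abs()
    return scores

def prune_by_score(model: nn.Module, scores: dict, sparsity: float = 0.99) -> None:
    """Zero the globally lowest-scoring weights to reach the target sparsity."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(sparsity * flat.numel()))
    threshold = torch.kthvalue(flat, k).values  # k-th smallest score
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in scores:
                p.mul_((scores[name] > threshold).to(p.dtype))  # keep only high-score weights

# Illustrative usage: one gradient pass, then a 99% global prune.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
nn.functional.cross_entropy(model(x), y).backward()
prune_by_score(model, taylor_scores(model), sparsity=0.99)
```

In the two-stage pipeline the abstract describes, a pruning pass of this kind would be followed by fine-tuning and then quantization of the surviving weights; in practice such a score is usually accumulated over several mini-batches before thresholding.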

Related research

07/23/2021
Pruning Ternary Quantization
We propose pruning ternary quantization (PTQ), a simple, yet effective, ...

02/03/2020
Automatic Pruning for Quantized Neural Networks
Neural network quantization and pruning are two techniques commonly used...

03/13/2023
Bag of Tricks with Quantized Convolutional Neural Networks for image classification
Deep neural networks have been proven effective in a wide range of tasks...

07/20/2020
Differentiable Joint Pruning and Quantization for Hardware Efficiency
We present a differentiable joint pruning and quantization (DJPQ) scheme...

07/06/2023
Pruning vs Quantization: Which is Better?
Neural network pruning and quantization techniques are almost as old as ...

06/12/2019
Parameterized Structured Pruning for Deep Neural Networks
As a result of the growing size of Deep Neural Networks (DNNs), the gap ...

12/28/2021
Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction
This work is focused on the pruning of some convolutional neural network...
