Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks

05/24/2019
by   Se Jung Kwon, et al.
0

Model compression techniques, such as pruning and quantization, are becoming increasingly important to reduce the memory footprints and the amount of computations. Despite model size reduction, achieving performance enhancement on devices is, however, still challenging mainly due to the irregular representations of sparse matrix formats. This paper proposes a new representation to encode the weights of Sparse Quantized Neural Networks, specifically reduced by find-grained and unstructured pruning method. The representation is encoded in a structured regular format, which can be efficiently decoded through XOR gates during inference in a parallel manner. We demonstrate various deep learning models that can be compressed and represented by our proposed format with fixed and high compression ratio. For example, for fully-connected layers of AlexNet on ImageNet dataset, we can represent the sparse weights by only 0.09 bits/weight for 1-bit quantization and 91% pruning rate with a fixed decoding rate and full memory bandwidth usage.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/29/2018

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to re...
research
11/19/2019

DARB: A Density-Aware Regular-Block Pruning for Deep Neural Networks

The rapidly growing parameter volume of deep neural networks (DNNs) hind...
research
05/26/2021

Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities

Unstructured neural network pruning algorithms have achieved impressive ...
research
08/26/2020

SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference

In recent years, there has been a flurry of research in deep neural netw...
research
11/01/2017

Efficient Inferencing of Compressed Deep Neural Networks

Large number of weights in deep neural networks makes the models difficu...
research
08/20/2020

Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks

For many applications, utilizing DNNs (Deep Neural Networks) requires th...
research
03/29/2021

Deep Compression for PyTorch Model Deployment on Microcontrollers

Neural network deployment on low-cost embedded systems, hence on microco...

Please sign up or login with your details

Forgot password? Click here to reset