Efficient Inferencing of Compressed Deep Neural Networks

11/01/2017
by Dharma Teja Vooturi, et al.

The large number of weights in deep neural networks makes the models difficult to deploy in low-memory environments such as mobile phones, IoT edge devices, and "inferencing as a service" environments in the cloud. Prior work has reduced model size through compression techniques such as pruning, quantization, and Huffman encoding. However, efficient inferencing with the compressed models has received little attention, especially when Huffman encoding is in place. In this paper, we propose efficient parallel algorithms for inferencing on single images and on batches under various memory constraints. Our experimental results show that our approach of using a variable batch size for inferencing improves inference throughput by 15-25% for AlexNet, while maintaining memory and latency constraints.
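To make the compression pipeline the abstract references concrete, here is a minimal, hypothetical sketch (not the paper's implementation) of the three stages applied to a flat list of weights: magnitude pruning, uniform quantization, and Huffman coding of the quantized values. All function names and thresholds are illustrative assumptions.

```python
# Hypothetical sketch of the pruning -> quantization -> Huffman pipeline.
# Not the paper's code; thresholds and level counts are made up for illustration.
import heapq
from collections import Counter

def prune(weights, threshold=0.05):
    """Magnitude pruning: zero out weights whose absolute value is small."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, levels=4):
    """Map each weight to the index of the nearest of `levels` uniform values."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1) or 1.0
    return [round((w - lo) / step) for w in weights]  # small integer codes

def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries: [frequency, tiebreak, [symbol, code], ...]
    heap = [[n, i, [s, ""]] for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:                  # prepend a bit on each side
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak] + lo[2:] + hi[2:])
        tiebreak += 1
    return {s: code for s, code in heap[0][2:]}

# Toy weight vector: frequent quantized values get shorter bitstrings,
# so the encoded stream is shorter than a fixed 2-bit-per-symbol encoding.
weights = [0.02, 0.9, -0.91, 0.88, 0.01, -0.88, 0.9, 0.0]
q = quantize(prune(weights))
codes = huffman_code(q)
bits = "".join(codes[s] for s in q)
```

The point of the sketch is the interplay the abstract hints at: pruning and quantization shrink the symbol alphabet, which makes the variable-length Huffman stream compact, but that same variable-length stream is exactly what complicates efficient parallel inference, since weight positions can no longer be computed with fixed strides.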


Related research:

- Weightless: Lossy Weight Encoding For Deep Neural Network Compression (11/13/2017)
- Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks (05/24/2019)
- Lightweight Convolutional Representations for On-Device Natural Language Processing (02/04/2020)
- A Mixed Quantization Network for Computationally Efficient Mobile Inverse Tone Mapping (03/12/2022)
- Compressed Learning of Deep Neural Networks for OpenCL-Capable Embedded Systems (05/20/2019)
- GECKO: Reconciling Privacy, Accuracy and Efficiency in Embedded Deep Learning (10/02/2020)
- Characterising Bias in Compressed Models (10/06/2020)
