Ternary MobileNets via Per-Layer Hybrid Filter Banks

11/04/2019
by Dibakar Gope, et al.

The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly constrained devices require further compression of already compute-efficient networks such as MobileNets. Model quantization is a widely used technique to compress and accelerate neural network inference, and prior works have quantized MobileNets to 4-6 bits, albeit with a modest to significant drop in accuracy. While quantization to sub-byte values (i.e. precision less than 8 bits) has been valuable, even further quantization of MobileNets to binary or ternary values is necessary to realize significant energy savings and possibly runtime speedups on specialized hardware, such as ASICs and FPGAs. Under the key observation that convolutional filters at each layer of a deep neural network may respond differently to ternary quantization, we propose a novel quantization method that generates per-layer hybrid filter banks consisting of full-precision and ternary weight filters for MobileNets. The layer-wise hybrid filter banks essentially combine the strengths of full-precision and ternary weight filters to derive a compact, energy-efficient architecture for MobileNets. Using this proposed quantization method, we quantized a substantial portion of the weight filters of MobileNets to ternary values, resulting in 27.98% savings in energy and a 51.07% reduction in model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets.
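To make the idea of a per-layer hybrid filter bank concrete, the sketch below shows one way such a layer could be expressed in PyTorch: a fraction of a convolution's output filters are kept in full precision while the rest are ternarized with a simple threshold-and-scale rule (in the style of Ternary Weight Networks) and trained with a straight-through estimator. This is a minimal illustration only, not the authors' exact method; the names `HybridConv2d`, `ternarize`, and the `ternary_fraction` parameter are hypothetical, and the paper's procedure for deciding which filters remain full-precision is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def ternarize(w):
    """Quantize filters to {-a, 0, +a} with a per-filter scale (TWN-style heuristic)."""
    # Per-filter threshold: 0.7 * mean absolute weight (illustrative choice)
    thresh = 0.7 * w.abs().mean(dim=(1, 2, 3), keepdim=True)
    mask = (w.abs() > thresh).float()
    # Per-filter scale: mean magnitude of the weights that survive the threshold
    scale = (w.abs() * mask).sum(dim=(1, 2, 3), keepdim=True) / \
            mask.sum(dim=(1, 2, 3), keepdim=True).clamp(min=1.0)
    return scale * torch.sign(w) * mask


class HybridConv2d(nn.Module):
    """Convolution whose filter bank mixes ternary and full-precision filters."""

    def __init__(self, in_ch, out_ch, kernel_size, ternary_fraction=0.75, **conv_kwargs):
        super().__init__()
        n_ternary = int(out_ch * ternary_fraction)
        self.ternary_weight = nn.Parameter(
            0.01 * torch.randn(n_ternary, in_ch, kernel_size, kernel_size))
        self.fp_weight = nn.Parameter(
            0.01 * torch.randn(out_ch - n_ternary, in_ch, kernel_size, kernel_size))
        self.conv_kwargs = conv_kwargs

    def forward(self, x):
        # Straight-through estimator: the forward pass uses ternary values,
        # gradients flow to the latent full-precision copy.
        w_t = self.ternary_weight + \
            (ternarize(self.ternary_weight) - self.ternary_weight).detach()
        # Concatenate ternary and full-precision filters into one filter bank.
        w = torch.cat([w_t, self.fp_weight], dim=0)
        return F.conv2d(x, w, **self.conv_kwargs)


# Usage: drop-in replacement for a standard 3x3 convolution.
layer = HybridConv2d(32, 64, 3, ternary_fraction=0.75, stride=1, padding=1)
out = layer(torch.randn(1, 32, 56, 56))
```

In this sketch the split between ternary and full-precision filters is fixed by a single fraction per layer; the paper's contribution is precisely in choosing that split per layer based on how each layer's filters respond to ternary quantization.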


Related research

01/09/2020 · Least squares binary quantization of neural networks
01/30/2023 · Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
08/13/2020 · Weight Equalizing Shift Scaler-Coupled Post-training Quantization
01/12/2021 · Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
06/23/2017 · Further Study on GFR Features for JPEG Steganalysis
02/01/2019 · Efficient Hybrid Network Architectures for Extremely Quantized Neural Networks Enabling Intelligence at the Edge
05/18/2020 · Cross-filter compression for CNN inference acceleration
