Hybrid Binary Networks: Optimizing for Accuracy, Efficiency and Memory

04/11/2018
by Ameya Prabhu, et al.

Binarization is an extreme network compression approach that provides large computational speedups along with energy and memory savings, albeit at significant accuracy costs. We investigate the question of where to binarize inputs at layer-level granularity and show that selectively binarizing the inputs to specific layers in the network could lead to significant improvements in accuracy while preserving most of the advantages of binarization. We analyze the binarization tradeoff using a metric that jointly models the input binarization-error and computational cost and introduce an efficient algorithm to select layers whose inputs are to be binarized. Practical guidelines based on insights obtained from applying the algorithm to a variety of models are discussed. Experiments on the ImageNet dataset using AlexNet and ResNet-18 models show 3-4% improvements in accuracy over fully binarized networks, with minimal impact on compression and computational speed. The improvements are even more substantial on sketch datasets like TU-Berlin, where we match state-of-the-art accuracy as well, getting over 8% increase in accuracies. We further show that our approach can be applied in tandem with other forms of compression that deal with individual layers or overall model compression (e.g., SqueezeNets). Unlike previous quantization approaches, we are able to binarize the weights in the last layers of a network, which often have a large number of parameters, resulting in significant improvement in accuracy over fully binarized models.
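The core idea of selecting which layers' inputs to binarize can be sketched as follows. This is an illustrative heuristic under assumed names (`binarization_error`, `select_layers`, the per-layer error budget), not the paper's exact metric or algorithm: estimate the error of approximating each layer's input activations by a scaled sign function (XNOR-Net-style), weigh it against the compute saved by binarizing that layer, and greedily binarize the layers with the best savings-per-error ratio.

```python
def binarization_error(acts):
    """Mean absolute error of approximating activations by their sign
    times a single per-layer scaling factor (XNOR-Net-style alpha)."""
    alpha = sum(abs(a) for a in acts) / len(acts)
    return sum(abs(a - alpha * (1 if a >= 0 else -1)) for a in acts) / len(acts)

def select_layers(layers, error_budget):
    """Greedily binarize the inputs of layers with the highest
    FLOPs-saved-per-unit-error, until the error budget is spent.
    `layers` maps layer name -> (sample activations, FLOPs of the layer)."""
    scored = []
    for name, (acts, flops) in layers.items():
        err = binarization_error(acts)
        # Higher score = more compute saved per unit of input-binarization error.
        scored.append((flops / (err + 1e-8), err, name))
    scored.sort(reverse=True)
    chosen, spent = [], 0.0
    for _, err, name in scored:
        if spent + err <= error_budget:
            chosen.append(name)
            spent += err
    return chosen

# Toy example: a conv layer whose activations are nearly binary already
# (cheap to binarize) vs. a layer with spread-out activations.
layers = {
    "conv2": ([0.9, -1.1, 1.0, -0.8], 200e6),
    "fc8":   ([0.1, -2.5, 0.2, 0.05], 50e6),
}
print(select_layers(layers, error_budget=0.5))  # → ['conv2']
```

With a tight budget only `conv2` is selected: its activations are well approximated by a scaled sign and it dominates the compute, so binarizing it buys the most speedup per unit of error. The greedy score is a stand-in for the paper's joint error/cost metric.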


