Heterogeneous Bitwidth Binarization in Convolutional Neural Networks

05/25/2018
by Josh Fromm, et al.

Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. These networks use homogeneous binarization: all parameters in each layer or (more commonly) in the whole model have the same low bitwidth (e.g., 2 bits). However, modern hardware allows efficient designs in which each arithmetic instruction can have a custom bitwidth. This motivates heterogeneous binarization, where every parameter in the network may have a different bitwidth. In this paper, we show that it is feasible and useful to select bitwidths at the parameter granularity during training. For instance, heterogeneously quantized versions of modern networks such as AlexNet and MobileNet, with the right mix of 1-, 2-, and 3-bit parameters averaging just 1.4 bits, can equal the accuracy of homogeneous 2-bit versions of these networks. Further, we provide analyses showing that heterogeneously binarized systems yield FPGA- and ASIC-based implementations that are correspondingly more efficient than their homogeneous counterparts in both circuit area and energy.
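As a rough illustration of what "averaging 1.4 bits" can mean, here is one illustrative mix: 70% 1-bit, 20% 2-bit, and 10% 3-bit parameters, which averages 0.7·1 + 0.2·2 + 0.1·3 = 1.4 bits. The sketch below also assumes a residual-binarization scheme. Each extra bit adds one more term alpha_k · sign(residual), and a parameter with a budget of b bits receives b such terms. Several choices here are assumptions rather than anything stated above: the per-parameter budgets are supplied by hand, and alpha_k is taken as the mean absolute residual of the parameters still active at step k. This is not the paper's training procedure or bit-selection rule. The function names `average_bitwidth` and `heterogeneous_binarize` are hypothetical.

```python
from typing import List


def average_bitwidth(bits: List[int]) -> float:
    """Hypothetical helper: mean number of bits per parameter."""
    return sum(bits) / len(bits)


def heterogeneous_binarize(weights: List[float], bits: List[int]) -> List[float]:
    """Hypothetical sketch of residual binarization with per-parameter bitwidths.

    At step k, every parameter whose bit budget is at least k receives one more
    binary term alpha_k * sign(residual). alpha_k (an assumed choice) is the mean
    absolute residual over the parameters active at that step.
    """
    if len(weights) != len(bits):
        raise ValueError("weights and bits must have the same length")
    approx = [0.0] * len(weights)
    residual = list(weights)
    for k in range(1, max(bits, default=0) + 1):
        # Parameters that still have bits left to spend at this step.
        active = [i for i, b in enumerate(bits) if b >= k]
        alpha = sum(abs(residual[i]) for i in active) / len(active)
        for i in active:
            step = alpha if residual[i] >= 0 else -alpha
            approx[i] += step     # add one binary term to the approximation
            residual[i] -= step   # what remains for later bits to capture
    return approx


if __name__ == "__main__":
    mix = [1] * 7 + [2] * 2 + [3] * 1  # 70% 1-bit, 20% 2-bit, 10% 3-bit
    print("average bits:", average_bitwidth(mix))
    print(heterogeneous_binarize([0.9, -0.3, 0.5, -1.2], [1, 1, 2, 3]))
```

With weights [0.9, -0.3, 0.5, -1.2] and budgets [1, 1, 2, 3], the two 1-bit parameters both collapse to magnitude 0.725. The 2-bit parameter is approximated as 0.375. The 3-bit parameter is recovered as -1.2, up to floating-point rounding. Extra bits tighten the approximation only for the parameters that receive them.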

research
05/17/2022

A Silicon Photonic Accelerator for Convolutional Neural Networks with Heterogeneous Quantization

Parameter quantization in convolutional neural networks (CNNs) can help ...
research
09/22/2016

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

We introduce a method to train Quantized Neural Networks (QNNs) --- neur...
research
11/01/2017

Minimum Energy Quantized Neural Networks

This work targets the automated minimum-energy optimization of Quantized...
research
11/19/2019

AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers

Low-precision arithmetic operations to accelerate deep-learning applicat...
research
02/26/2020

Quantized Neural Network Inference with Precision Batching

We present PrecisionBatching, a quantized inference algorithm for speedi...
research
08/26/2016

Scalable Compression of Deep Neural Networks

Deep neural networks generally involve some layers with millions of pa...
research
03/28/2018

An Efficient I/O Architecture for RAM-based Content-Addressable Memory on FPGA

Despite the impressive search rate of one key per clock cycle, the updat...