Optimizing data-flow in Binary Neural Networks

04/03/2023
by L. Vorabbi et al.

Binary Neural Networks (BNNs) can significantly accelerate neural network inference by replacing expensive floating-point arithmetic with bitwise operations. Most existing solutions, however, do not fully optimize the data flow through the BNN layers, and intermediate conversions from 1 to 16/32 bits often further hinder efficiency. We propose a novel training scheme that can increase data flow and parallelism in the BNN pipeline; specifically, we introduce a clipping block that decreases the data width from 32 bits to 8. Furthermore, we reduce the internal accumulator size of the binary layers, usually kept at 32 bits to prevent data overflow, without losing accuracy. Additionally, we provide an optimization of the Batch Normalization layer that both reduces latency and simplifies deployment. Finally, we present an optimized implementation of the Binary Direct Convolution for ARM instruction sets. Our experiments show a consistent improvement of the inference speed (up to 1.91x and 2.73x compared to two state-of-the-art BNN frameworks) with no drop in accuracy for at least one full-precision model.
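The abstract gives no implementation details, but the data-flow ideas it names (XNOR/popcount binary arithmetic, a clipping block that narrows the 32-bit accumulation to 8 bits, and a Batch Normalization layer simplified before the sign() binarization) can be sketched in a few lines. The C++ below is a hypothetical illustration under those assumptions: the function names (binary_dot, clip_to_int8, bn_sign) and the portable __builtin_popcountll path are ours, not the paper's, and the authors' actual ARM NEON kernels are not reproduced here.

#include <cstdint>
#include <cstddef>

// Hypothetical sketch (not the paper's code) of the data flow described
// in the abstract: binary layers compute with XNOR + popcount, the
// 32-bit accumulation is clipped down to 8 bits, and Batch Normalization
// followed by sign() is folded into a single per-channel threshold.

// (1) Dot product of two {-1,+1} vectors packed 64 bits per word:
//     dot = n_bits - 2 * popcount(a XOR b).
//     A NEON version would use veorq/vcntq; we keep the GCC/Clang builtin.
inline int32_t binary_dot(const uint64_t* a, const uint64_t* b,
                          size_t n_words) {
    int32_t mismatches = 0;                  // 32-bit accumulator
    for (size_t i = 0; i < n_words; ++i)
        mismatches += __builtin_popcountll(a[i] ^ b[i]);
    return static_cast<int32_t>(64 * n_words) - 2 * mismatches;
}

// (2) Clipping block: saturate the int32 accumulation into int8 so the
//     rest of the pipeline moves 8-bit data instead of 32-bit. The clip
//     range would be chosen during training in the paper's scheme.
inline int8_t clip_to_int8(int32_t x) {
    if (x < -128) x = -128;
    if (x >  127) x =  127;
    return static_cast<int8_t>(x);
}

// (3) BN + sign() folding: sign(gamma * (x - mu) / sigma + beta) reduces,
//     for gamma > 0, to the comparison x >= mu - beta * sigma / gamma.
//     The threshold is precomputed offline (rounded to int32 here as an
//     approximation), so no floating-point BN runs at inference.
inline uint8_t bn_sign(int32_t x, int32_t threshold, bool gamma_positive) {
    bool bit = (x >= threshold);
    return static_cast<uint8_t>(gamma_positive ? bit : !bit);
}

In a full binary layer, binary_dot would run per output channel, clip_to_int8 would feed an 8-bit downstream path, and bn_sign would produce the packed 1-bit input of the next binary layer.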


research · 06/20/2018
Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?
Binary neural networks (BNN) have been studied extensively since they ru...

research · 07/11/2020
HOBFLOPS CNNs: Hardware Optimized Bitsliced Floating-Point Operations Convolutional Neural Networks
Convolutional neural network (CNN) inference is commonly performed with ...

research · 08/16/2019
daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices
It is always well believed that Binary Neural Networks (BNNs) could dras...

research · 11/18/2020
Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks
We introduce Larq Compute Engine, the world's fastest Binarized Neural N...

research · 04/24/2020
Quantization of Deep Neural Networks for Accumulator-constrained Processors
We introduce an Artificial Neural Network (ANN) quantization methodology...

research · 10/18/2021
Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks
In the low-bit quantization field, training Binary Neural Networks (BNNs...

research · 10/01/2020
BCNN: A Binary CNN with All Matrix Ops Quantized to 1 Bit Precision
This paper describes a CNN where all CNN style 2D convolution operations...
