Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs

06/12/2018
by Philip Colangelo, et al.

CNNs have been shown to maintain reasonable classification accuracy when quantized to lower precisions. However, quantizing to sub-8-bit activations and weights can result in accuracy falling below an acceptable threshold. Techniques exist for closing the accuracy gap of limited numeric precision, typically by increasing computation. This results in a trade-off between throughput and accuracy that can be tailored for different networks through various combinations of activation and weight data widths. Hardware architectures like FPGAs provide the opportunity for data-width-specific computation through unique logic configurations, leading to highly optimized processing that is unattainable by full-precision networks. Ternary and binary weighted networks offer an efficient method of inference for 2-bit and 1-bit data, respectively. Most hardware architectures can take advantage of the memory storage and bandwidth savings that come with smaller datapaths, but very few architectures can take advantage of limited numeric precision at the computation level. In this paper, we present a hardware design for FPGAs that takes advantage of the bandwidth, memory, power, and computation savings of limited numeric precision data. We provide insights into the trade-offs between throughput and accuracy for various networks and how they map to our framework. Further, we show how limited numeric precision computation can be efficiently mapped onto FPGAs for both ternary and binary cases. Starting with Arria 10, we show a 2-bit activation and ternary weighted AlexNet running in hardware that achieves 3,700 images per second on the ImageNet dataset with a top-1 accuracy of 0.49. Using a hardware modeler designed for our low numeric precision framework, we project performance, most notably for a 55.5 TOPS Stratix 10 device running a modified ResNet-34 with only 3.7% accuracy degradation compared with single precision.
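To illustrate why ternary and binary weights reduce work at the computation level, the following is a minimal Python sketch, not the hardware design described in the paper. It assumes a simple fixed-threshold ternarization scheme, and the function names (`ternarize`, `ternary_dot`, `pack_bits`, `binary_dot`) are hypothetical. With weights restricted to {-1, 0, +1}, a dot product needs only additions and subtractions; with both operands restricted to {-1, +1}, it reduces to an XNOR followed by a population count.

```python
def ternarize(weights, threshold):
    """Map real-valued weights to {-1, 0, +1}.

    Assumption: a fixed magnitude threshold; the paper may use another scheme.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(activations, ternary_weights):
    """Dot product with ternary weights: no multiplications are needed."""
    acc = 0
    for a, w in zip(activations, ternary_weights):
        if w == 1:
            acc += a      # +1 weight: add the activation
        elif w == -1:
            acc -= a      # -1 weight: subtract the activation
        # 0 weight: the term is skipped entirely
    return acc


def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, w_word, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_word ^ w_word) & mask).count("1")  # positions where signs agree
    return 2 * matches - n  # each match contributes +1, each mismatch -1


# 2-bit activations take values in 0..3; the weights are ternarized first.
t_weights = ternarize([0.8, -0.05, -0.6, 0.1], threshold=0.2)
t_result = ternary_dot([3, 1, 2, 0], t_weights)

# Binary case: both operands are +1/-1 vectors packed into integers.
b_result = binary_dot(pack_bits([1, -1, 1, 1]), pack_bits([1, 1, -1, 1]), 4)
```

On an FPGA, operations of this kind can map onto small logic elements rather than full multipliers, which is the sort of computation-level saving the abstract refers to.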


