Streamlined Deployment for Quantized Neural Networks

09/12/2017
by Yaman Umuroglu, et al.

Running Deep Neural Network (DNN) models on devices with limited computational capability is a challenge due to large compute and memory requirements. Quantized Neural Networks (QNNs) have emerged as a potential solution to this problem, promising to offer most of the DNN accuracy benefits with much lower computational cost. However, harvesting these benefits on existing mobile CPUs is a challenge since operations on highly quantized datatypes are not natively supported in most instruction set architectures (ISAs). In this work, we first describe a streamlining flow to convert all QNN inference operations to integer ones. Afterwards, we provide techniques based on processing one bit position at a time (bit-serial) to show how QNNs can be efficiently deployed using common bitwise operations. We demonstrate the potential of QNNs on mobile CPUs with microbenchmarks and on a quantized AlexNet, which is 3.5x faster than an optimized 8-bit baseline.
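To make the bit-serial idea concrete, below is a minimal C sketch (not the paper's implementation) of a dot product between two 2-bit quantized vectors stored as bit planes: each pair of bit positions contributes popcount(w_i AND a_j) weighted by 2^(i+j), so the whole computation reduces to AND, popcount, and shift instructions that common mobile ISAs support natively. The 2-bit precision, the bitserial_dot name, and the 64-element vectors are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

#define BITS 2  /* assumed bits per element for this sketch */

/* Bit-serial dot product of two BITS-bit unsigned vectors of 64 elements.
 * w[i] holds bit plane i of the weights, a[j] holds bit plane j of the
 * activations (one bit per element, packed into a 64-bit word).          */
int bitserial_dot(const uint64_t w[BITS], const uint64_t a[BITS])
{
    int acc = 0;
    for (int i = 0; i < BITS; i++) {
        for (int j = 0; j < BITS; j++) {
            /* elements whose i-th weight bit and j-th activation bit are
             * both set each contribute 2^(i+j) to the dot product        */
            acc += __builtin_popcountll(w[i] & a[j]) << (i + j);
        }
    }
    return acc;
}

int main(void)
{
    /* 64 elements: all weights = 3 (binary 11), all activations = 1 (binary 01),
     * so the expected dot product is 64 * 3 * 1 = 192                          */
    uint64_t w[BITS] = { ~0ULL, ~0ULL };  /* weight bit planes 0 and 1     */
    uint64_t a[BITS] = { ~0ULL,  0ULL };  /* activation bit planes 0 and 1 */
    printf("%d\n", bitserial_dot(w, a));  /* prints 192 */
    return 0;
}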


Related research

09/14/2020 - Fast Implementation of 4-bit Convolutional Neural Networks for Mobile Devices
Quantized low-precision neural networks are very popular because they re...

05/09/2021 - RBNN: Memory-Efficient Reconfigurable Deep Binary Neural Network with IP Protection for Internet of Things
Though deep neural network models exhibit outstanding performance for va...

08/23/2021 - On the Acceleration of Deep Neural Network Inference using Quantized Compressed Sensing
Accelerating deep neural network (DNN) inference on resource-limited dev...

05/27/2021 - Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization
Huge computational costs brought by convolution and batch normalization ...

02/12/2023 - Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference
In this paper, we present Quark, an integer RISC-V vector processor spec...

04/23/2020 - Quantaized Winograd/Toom-Cook Convolution for DNNs: Beyond Canonical Polynomials Base
The problem how to speed up the convolution computations in Deep Neural ...

10/01/2019 - NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques
Quantization has emerged to be an effective way to significantly boost t...
