Automating Generation of Low Precision Deep Learning Operators

10/25/2018
by Meghan Cowan, et al.

State-of-the-art deep learning models have made steady progress in computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low-power and mobile devices is challenging because of their limited compute capability and strict energy budgets. One solution that has generated significant research interest is deploying highly quantized models that operate on low-precision inputs and weights of fewer than eight bits, trading accuracy for performance. These models have a significantly reduced memory footprint (up to a 32x reduction) and can replace multiply-accumulates with bitwise operations in compute-intensive convolution and fully connected layers. Most deep learning frameworks rely on highly engineered linear algebra libraries such as ATLAS or Intel's MKL to implement efficient deep learning operators. To date, none of the popular deep learning frameworks directly support low-precision operators, partly due to a lack of optimized low-precision libraries. In this paper we introduce a workflow to quickly generate high-performance low-precision deep learning operators for arbitrary precision that target multiple CPU architectures and include optimizations such as memory tiling and vectorization. We present an extensive case study on a low-power ARM Cortex-A53 CPU and show how we can generate 1-bit and 2-bit convolutions with speedups of up to 16x over an optimized 16-bit integer baseline and 2.3x over handwritten implementations.
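To make the idea of replacing multiply-accumulates with bitwise operations concrete, here is a minimal sketch (not the paper's implementation, and independent of any particular framework) of a 1-bit dot product: {-1, +1} values are packed into integer bitmasks, and the multiply-accumulate reduces to an XOR followed by a popcount.

```python
def binarize(values):
    """Pack a list of +1/-1 values into an integer bitmask, one bit per value."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors packed as bitmasks.

    Matching bit positions contribute +1 and mismatches contribute -1, so
    dot = matches - mismatches = n - 2 * popcount(a XOR b).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")  # popcount of the XOR
    return n - 2 * mismatches

# The bitwise result agrees with the ordinary multiply-accumulate:
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
assert binary_dot(binarize(a), binarize(b), len(a)) == sum(x * y for x, y in zip(a, b))
```

On real hardware a single XOR + popcount processes an entire machine word (32 or 64 elements) at once, which is the source of the large speedups over integer multiply-accumulate baselines; extending the scheme to 2-bit operands uses one bit-plane per bit of precision.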

