
Pre-Quantized Deep Learning Models Codified in ONNX to Enable Hardware/Software Co-Design

10/04/2021
by   Ulf Hanebutte, et al.

This paper presents a methodology to separate the quantization process from the hardware-specific model compilation stage via a pre-quantized deep learning model description in standard ONNX format. This separation allows the two stages to be developed independently. The methodology is expressive enough to convey hardware-specific operations and to embed key quantization parameters into an ONNX model, which enables hardware/software co-design. Detailed examples are given for both MLP- and CNN-based networks, and the approach extends to other networks in a straightforward fashion.

