Pre-Quantized Deep Learning Models Codified in ONNX to Enable Hardware/Software Co-Design

by Ulf Hanebutte, et al.

This paper presents a methodology for separating the quantization process from the hardware-specific model compilation stage by means of a pre-quantized deep learning model description in the standard ONNX format. This separation allows the two stages to be developed independently. The methodology is expressive enough to convey hardware-specific operations and to embed key quantization parameters in an ONNX model, which enables hardware/software co-design. Detailed examples are given for both MLP- and CNN-based networks and can be extended to other networks in a straightforward fashion.
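To make the idea of separating the two stages concrete, here is a minimal sketch in plain Python. It is not the paper's ONNX codification: the names `QuantParams`, `quantize`, `dequantize`, and `quantized_linear` are hypothetical, and the symmetric int8 scheme with power-of-two scales is an assumption chosen so the arithmetic is exact. The quantization stage produces integer tensors plus their parameters; a separate execution stage consumes only those artifacts and never sees the original floating-point weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuantParams:
    """Hypothetical container for the parameters a pre-quantized model carries."""
    scale: float
    zero_point: int = 0
    qmin: int = -128  # int8 range, assumed for this sketch
    qmax: int = 127


def quantize(values: List[float], p: QuantParams) -> List[int]:
    """Quantization stage: map floats to clamped integers."""
    out = []
    for v in values:
        # Python's round() rounds halves to even; hardware may round differently.
        q = round(v / p.scale) + p.zero_point
        out.append(max(p.qmin, min(p.qmax, q)))
    return out


def dequantize(q_values: List[int], p: QuantParams) -> List[float]:
    """Map integers back to floats for inspection."""
    return [(q - p.zero_point) * p.scale for q in q_values]


def quantized_linear(x_q: List[int], w_q: List[List[int]],
                     x_p: QuantParams, w_p: QuantParams,
                     out_p: QuantParams) -> List[int]:
    """Execution stage: integer matrix-vector product, then requantization.

    Only integer tensors and their quantization parameters are needed here.
    """
    requant = x_p.scale * w_p.scale / out_p.scale
    out = []
    for row in w_q:
        acc = sum((xi - x_p.zero_point) * (wi - w_p.zero_point)
                  for xi, wi in zip(x_q, row))
        q = round(acc * requant) + out_p.zero_point
        out.append(max(out_p.qmin, min(out_p.qmax, q)))
    return out


# Stage 1 (quantization, done once and independently of the hardware).
params = QuantParams(scale=1 / 64)
x_q = quantize([0.5, -1.0], params)
w_q = [quantize(row, params) for row in [[1.0, 0.5], [-0.25, 0.75]]]

# Stage 2 (execution on the pre-quantized artifacts).
y_q = quantized_linear(x_q, w_q, params, params, params)
y = dequantize(y_q, params)
```

With these power-of-two scales, the integer path reproduces the floating-point result `[0.0, -0.875]` exactly; with arbitrary scales, a small rounding error would be expected.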

