Pre-Quantized Deep Learning Models Codified in ONNX to Enable Hardware/Software Co-Design

10/04/2021
by Ulf Hanebutte, et al.

This paper presents a methodology that separates the quantization process from the hardware-specific model compilation stage via a pre-quantized deep learning model description in the standard ONNX format. This separation enables the two stages to be developed independently. The methodology is expressive enough to convey hardware-specific operations and to embed key quantization parameters into an ONNX model, which enables hardware/software co-design. Detailed examples are given for both MLP- and CNN-based networks, and the approach extends to other networks in a straightforward fashion.
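To illustrate what "key quantization parameters" means in practice, the sketch below shows the standard affine (uniform) quantization scheme that ONNX expresses through its QuantizeLinear/DequantizeLinear operators: a per-tensor scale and zero-point that a pre-quantized model would carry as initializers alongside its int8 weights. This is a minimal illustration of the standard ONNX semantics, not the paper's specific encoding; the parameter values are made up for the example.

```python
import numpy as np

def quantize_linear(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization, following ONNX QuantizeLinear semantics:
    q = saturate(round(x / scale) + zero_point), with round-half-to-even."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize_linear(q, scale, zero_point):
    """ONNX DequantizeLinear semantics: x ≈ (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical per-tensor parameters, as a pre-quantized ONNX model
# would store them in initializers next to the int8 weight tensor.
scale, zero_point = np.float32(0.05), np.int8(3)
x = np.array([0.0, 0.25, -1.0, 6.35], dtype=np.float32)

q = quantize_linear(x, scale, zero_point)       # int8 values, saturated
x_hat = dequantize_linear(q, scale, zero_point)  # reconstructed floats
```

Shipping these parameters inside the model file is what lets a hardware-specific compiler map the operators to integer arithmetic without re-deriving the quantization itself.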


Related research

- 08/01/2023: MRQ: Support Multiple Quantization Schemes through Model Re-Quantization
  Despite the proliferation of diverse hardware accelerators (e.g., NPU, T...

- 04/17/2019: Defensive Quantization: When Efficiency Meets Robustness
  Neural network quantization is becoming an industry standard to efficien...

- 06/18/2020: Efficient Execution of Quantized Deep Learning Models: A Compiler Approach
  A growing number of applications implement predictive functions using de...

- 12/01/2021: Hardware-friendly Deep Learning by Network Quantization and Binarization
  Quantization is emerging as an efficient approach to promote hardware-fr...

- 08/26/2022: GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks
  Deep convolutional neural network (CNN) training via iterative optimizat...

- 02/12/2021: Confounding Tradeoffs for Neural Network Quantization
  Many neural network quantization techniques have been developed to decre...

- 09/21/2023: Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam
  Although Large Language Models (LLMs) represent a revolution in the way ...
