Confounding Tradeoffs for Neural Network Quantization

02/12/2021
by Sahaj Garg, et al.

Many neural network quantization techniques have been developed to reduce the computational and memory footprint of deep learning. However, these methods are evaluated subject to confounding tradeoffs that may trade inference acceleration or resource complexity for higher accuracy. In this work, we articulate a variety of tradeoffs whose effects are often overlooked and empirically analyze their impact on uniform and mixed-precision post-training quantization, finding that these confounding tradeoffs can have a larger impact on quantized network accuracy than the quantization methods themselves. Because these tradeoffs constrain the attainable hardware acceleration for different use-cases, we encourage researchers to report these design choices explicitly through the structure of "quantization cards." We expect quantization cards to help researchers compare methods more effectively and help engineers determine the applicability of quantization techniques for their hardware.
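To make the setting concrete, the following is a minimal sketch of uniform affine post-training quantization, the baseline scheme the abstract refers to. The function names, the 8-bit default, and the min/max calibration strategy are illustrative assumptions, not the paper's specific method.

```python
import numpy as np

def uniform_affine_quantize(x, num_bits=8):
    """Quantize a float array to unsigned integers using a per-tensor
    affine mapping: q = round(x / scale) + zero_point.

    Calibration here simply uses the observed min/max of x; real
    post-training quantization pipelines may use percentile clipping
    or MSE-optimal ranges instead (one of the tradeoffs discussed).
    """
    qmin, qmax = 0, (1 << num_bits) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized values back to floats: x_hat = scale * (q - zp)."""
    return scale * (q.astype(np.float32) - zero_point)
```

Design choices such as the bit width, per-tensor versus per-channel scales, and the calibration range all shift accuracy and hardware cost, which is exactly why reporting them explicitly (as in the proposed quantization cards) matters.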


Related research:

- Ternary Quantization: A Survey (03/02/2023). Inference time, model size, and accuracy are critical for deploying deep...
- Smart Ternary Quantization (09/26/2019). Neural network models are resource hungry. Low bit quantization such as ...
- Neural Network-based Quantization for Network Automation (03/04/2021). Deep Learning methods have been adopted in mobile networks, especially f...
- Neural Network Quantization for Efficient Inference: A Survey (12/08/2021). As neural networks have become more powerful, there has been a rising de...
- A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers (11/06/2022). Nanopore sequencing generates noisy electrical signals that need to be c...
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark (11/05/2021). Model quantization has emerged as an indispensable technique to accelera...
- Pre-Quantized Deep Learning Models Codified in ONNX to Enable Hardware/Software Co-Design (10/04/2021). This paper presents a methodology to separate the quantization process f...
