
Overwrite Quantization: Opportunistic Outlier Handling for Neural Network Accelerators

10/13/2019
by Ritchie Zhao, et al.

Outliers in weights and activations pose a key challenge for fixed-point quantization of neural networks. While outliers can be addressed by fine-tuning, this is not practical for machine learning (ML) service providers (e.g., Google, Microsoft), who often receive customers' models without the training data. Specialized hardware for handling outliers can enable low-precision DNNs, but it incurs nontrivial area overhead. In this paper, we propose overwrite quantization (OverQ), a novel hardware technique that opportunistically increases the bitwidth of outliers by letting them overwrite adjacent values. An FPGA prototype shows OverQ can significantly improve ResNet-18 accuracy at 4 bits while incurring only a small increase in resource utilization.
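The abstract describes OverQ as a hardware mechanism; the full paper details the accelerator design. Purely as an illustration of the core idea, the sketch below shows one way a pairwise overwrite scheme could work in software: when a value overflows the n-bit range and its neighbour quantizes to zero, the outlier takes over both lanes and a per-pair flag records the reuse. The function names (overq_encode, overq_decode), the adjacent-pair grouping, and the split-across-two-lanes decoding are assumptions made for this example, not the authors' exact design.

```python
import numpy as np

def _clip_round(x, n_bits):
    """Round-to-nearest uniform quantization to a signed n-bit integer grid."""
    qmax = 2 ** (n_bits - 1) - 1
    return float(np.clip(np.round(x), -(qmax + 1), qmax))

def overq_encode(w, scale, n_bits=4):
    """Encode consecutive pairs; an outlier may overwrite a near-zero neighbour.

    Returns (codes, flags): codes holds one n-bit integer per value, flags holds
    one bit per pair saying whether the second lane was overwritten by the first.
    Assumes an even-length input; names and pairing rule are illustrative only.
    """
    q = np.asarray(w, dtype=float) / scale
    qmax = 2 ** (n_bits - 1) - 1
    codes = np.zeros_like(q)
    flags = np.zeros(len(q) // 2, dtype=bool)
    for p in range(0, len(q) - 1, 2):
        a, b = q[p], q[p + 1]
        a_overflows = abs(a) > qmax
        b_negligible = _clip_round(b, n_bits) == 0
        if a_overflows and b_negligible:
            # Outlier spreads across both lanes; the decoder sums them back.
            codes[p] = _clip_round(a / 2, n_bits)
            codes[p + 1] = _clip_round(a - codes[p], n_bits)
            flags[p // 2] = True
        else:
            codes[p] = _clip_round(a, n_bits)
            codes[p + 1] = _clip_round(b, n_bits)
    return codes, flags

def overq_decode(codes, flags, scale):
    """Reconstruct real values from codes and per-pair overwrite flags."""
    out = codes.astype(float) * scale
    for p, used in enumerate(flags):
        if used:
            out[2 * p] = (codes[2 * p] + codes[2 * p + 1]) * scale
            out[2 * p + 1] = 0.0  # this lane's bits were donated to the outlier
    return out
```

In this toy version, a 4-bit pair can represent an outlier roughly twice as large as the normal clipping range at the cost of zeroing a neighbour that was already negligible; the per-pair flag is the kind of small bookkeeping an accelerator would presumably track in hardware.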

Related research

Improving Neural Network Quantization using Outlier Channel Splitting (01/28/2019)
Quantization can improve the execution latency and energy efficiency of ...

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting (01/28/2019)
Quantization can improve the execution latency and energy efficiency of ...

Quadapter: Adapter for GPT-2 Quantization (11/30/2022)
Transformer language models such as GPT-2 are difficult to quantize beca...

ACIQ: Analytical Clipping for Integer Quantization of neural networks (10/02/2018)
Unlike traditional approaches that focus on the quantization at the netw...

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization (08/30/2022)
Quantization is a technique to reduce the computation and memory cost of...

Up or Down? Adaptive Rounding for Post-Training Quantization (04/22/2020)
When quantizing neural networks, assigning each floating-point weight to...

Generative Design of Hardware-aware DNNs (06/06/2020)
To efficiently run DNNs on the edge/cloud, many new DNN inference accele...