Overwrite Quantization: Opportunistic Outlier Handling for Neural Network Accelerators

10/13/2019
by Ritchie Zhao, et al.

Outliers in weights and activations pose a key challenge for fixed-point quantization of neural networks. While outliers can be addressed by fine-tuning, this is not practical for machine learning (ML) service providers (e.g., Google, Microsoft) who often receive customers' models without the training data. Specialized hardware for handling outliers can enable low-precision DNNs, but incurs nontrivial area overhead. In this paper, we propose overwrite quantization (OverQ), a novel hardware technique which opportunistically increases bitwidth for outliers by letting them overwrite adjacent values. An FPGA prototype shows OverQ can significantly improve ResNet-18 accuracy at 4 bits while incurring relatively little increase in resource utilization.
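The core idea can be illustrated with a toy software sketch. In overwrite quantization, a value that overflows the low-bitwidth range (an outlier) is allowed to overwrite its adjacent value: the pair is flagged, the neighbor's slot is sacrificed, and the outlier is stored across both slots at double the bitwidth. The flag layout and pairing scheme below are illustrative assumptions for clarity, not the paper's exact hardware encoding:

```python
def overq_encode(values, bits=4):
    """Toy sketch of overwrite quantization (OverQ).

    Each value is quantized to a signed `bits`-bit integer. When a value
    overflows that range (an outlier), it overwrites its neighbor: the
    pair is flagged and the outlier is stored with 2*bits of precision
    across both slots. Flag representation is an assumption, not the
    paper's hardware encoding.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    wide_lo, wide_hi = -(1 << (2 * bits - 1)), (1 << (2 * bits - 1)) - 1
    out, flags = [], []
    i = 0
    while i < len(values):
        v = values[i]
        if lo <= v <= hi or i + 1 == len(values):
            # In range (or no neighbor left to overwrite): clip as usual.
            out.append(max(lo, min(hi, v)))
            flags.append(False)
            i += 1
        else:
            # Outlier: clip to the wide range and split across two slots.
            w = max(wide_lo, min(wide_hi, v))
            out.extend([w >> bits, w & ((1 << bits) - 1)])
            flags.extend([True, True])
            i += 2  # the neighbor's value is overwritten (lost)
    return out, flags


def overq_decode(out, flags, bits=4):
    """Reconstruct values; an overwritten neighbor decodes to 0."""
    vals, i = [], 0
    while i < len(out):
        if flags[i]:
            # Reassemble the wide outlier from the flagged pair.
            vals.extend([(out[i] << bits) | out[i + 1], 0])
            i += 2
        else:
            vals.append(out[i])
            i += 1
    return vals
```

In this sketch the error from dropping the small neighbor replaces the much larger clipping error the outlier would otherwise incur, which is the opportunistic trade-off the abstract describes.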

