Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference

01/30/2023
by Deepika Bablani, et al.

For effective and efficient deep neural network inference, it is desirable to achieve state-of-the-art accuracy with the simplest networks requiring the least computation, memory, and power. Quantizing networks to lower precision is a powerful technique for simplifying networks. It is generally desirable to quantize as aggressively as possible without incurring significant accuracy degradation. As each layer of a network may have a different sensitivity to quantization, mixed precision quantization methods selectively tune the precision of individual layers to achieve a minimum drop in task performance (e.g., accuracy). To estimate the impact of layer precision choice on task performance, two methods are introduced: i) Entropy Approximation Guided Layer selection (EAGL), which is fast and uses the entropy of the weight distribution, and ii) Accuracy-aware Layer Precision Selection (ALPS), which is straightforward and relies on a single epoch of fine-tuning after layer precision reduction. Using EAGL and ALPS for layer precision selection, full-precision accuracy is recovered with a mix of 4-bit and 2-bit layers for the ResNet-50 and ResNet-101 classification networks, demonstrating improved performance across the entire accuracy-throughput frontier, and equivalent performance for the PSPNet segmentation network. In a commensurate comparison with leading mixed precision layer selection techniques, both methods require orders of magnitude less compute time to reach a solution.
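The abstract gives no implementation details for EAGL, but a minimal sketch of a per-layer entropy score of the kind it describes could look like the following (Python/NumPy). The 256-bin histogram, the toy layer names, and the assumption that lower-entropy layers are the first candidates for the more aggressive 2-bit setting are all illustrative assumptions, not details from the paper.

import numpy as np

def layer_weight_entropy(weights, num_bins=256):
    """Shannon entropy (bits) of a layer's empirical weight histogram."""
    counts, _ = np.histogram(weights.ravel(), bins=num_bins)
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # drop empty bins to avoid log(0)
    return float(-(probs * np.log2(probs)).sum())

# Toy stand-in for a network's per-layer weight tensors (hypothetical names).
rng = np.random.default_rng(0)
layers = {
    "conv1":  rng.normal(0.0, 0.05, size=(64, 3, 7, 7)),
    "layer3": rng.normal(0.0, 0.02, size=(256, 256, 3, 3)),
    "fc":     rng.normal(0.0, 0.01, size=(1000, 2048)),
}

# Rank layers by entropy; in this sketch, lower-entropy layers are treated
# as the first candidates for 2-bit precision, the rest stay at 4-bit.
# (The ranking direction is an assumption for illustration.)
for name, score in sorted(((n, layer_weight_entropy(w)) for n, w in layers.items()),
                          key=lambda kv: kv[1]):
    print(f"{name:8s} entropy = {score:.2f} bits")

Because this score needs only the trained weights, it can be computed in a single pass over the network with no forward or backward passes, which is consistent with the orders-of-magnitude speedup over search-based layer selection claimed above.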


