FBM: Fast-Bit Allocation for Mixed-Precision Quantization

05/30/2022
by Moshe Kimhi, et al.

Quantized neural networks are well known for reducing latency, power consumption, and model size without significant degradation in accuracy, which makes them highly applicable to systems with limited resources and low power budgets. Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths. Existing mixed-precision schemes rely on searching a large exploration space, which results in a large carbon footprint. Moreover, their bit allocation strategies mostly constrain the model size rather than the performance of the network when deployed on specific hardware. Our work proposes Fast-Bit Allocation for Mixed-Precision Quantization (FBM), which finds an optimal bitwidth allocation by measuring the desired behavior on a simulation of the target device, or even on the physical device itself. While dynamic transitions of bit allocation in mixed-precision quantization with ultra-low bitwidths are known to suffer from performance degradation, we present a fast recovery solution from such transitions. A comprehensive evaluation of the proposed method on CIFAR-10 and ImageNet demonstrates our method's superiority over current state-of-the-art schemes in terms of the trade-off between neural network accuracy and hardware efficiency. Our source code, experimental settings, and quantized models are available at https://github.com/RamorayDrake/FBM/
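
To make the idea of hardware-aware bit allocation concrete, the sketch below shows one simple way a per-layer bitwidth assignment could be driven by measured (or simulated) device latencies: a greedy search that lowers bitwidths wherever the latency saving per unit of added quantization sensitivity is largest. This is only an illustrative sketch, not the FBM algorithm from the paper; the layer names, latency numbers, sensitivity proxy, and the greedy heuristic are all assumptions made for the example.

```python
# Illustrative sketch of hardware-aware mixed-precision bit allocation.
# NOT the FBM algorithm from the paper: the latency table, sensitivity proxy,
# layer names, and greedy heuristic below are assumptions for illustration only.

def greedy_bit_allocation(latency_table, sensitivity, latency_budget, bit_choices=(8, 4, 2)):
    """Start every layer at the highest bitwidth, then greedily lower the bitwidth
    of the layer whose demotion saves the most (simulated) latency per unit of
    added sensitivity, until the latency budget is met or no moves remain."""
    alloc = {layer: bit_choices[0] for layer in latency_table}

    def total_latency(a):
        return sum(latency_table[layer][bits] for layer, bits in a.items())

    while total_latency(alloc) > latency_budget:
        best_layer, best_score = None, 0.0
        for layer, bits in alloc.items():
            idx = bit_choices.index(bits)
            if idx + 1 >= len(bit_choices):
                continue  # layer is already at the lowest allowed bitwidth
            lower = bit_choices[idx + 1]
            saved = latency_table[layer][bits] - latency_table[layer][lower]
            cost = sensitivity[layer][lower] - sensitivity[layer][bits]
            score = saved / max(cost, 1e-9)  # latency saved per unit of added error
            if score > best_score:
                best_layer, best_score = layer, score
        if best_layer is None:
            break  # budget unreachable with the given bit choices
        alloc[best_layer] = bit_choices[bit_choices.index(alloc[best_layer]) + 1]
    return alloc


if __name__ == "__main__":
    # Hypothetical per-layer latency (ms) from a device simulator, and a
    # hypothetical sensitivity proxy (e.g. layer-wise quantization error).
    latency = {
        "conv1": {8: 4.0, 4: 2.2, 2: 1.3},
        "conv2": {8: 6.0, 4: 3.1, 2: 1.8},
        "fc":    {8: 2.0, 4: 1.1, 2: 0.7},
    }
    sensitivity = {
        "conv1": {8: 0.01, 4: 0.05, 2: 0.30},
        "conv2": {8: 0.01, 4: 0.04, 2: 0.20},
        "fc":    {8: 0.00, 4: 0.02, 2: 0.10},
    }
    print(greedy_bit_allocation(latency, sensitivity, latency_budget=8.0))
```

In this toy run the search demotes the layers with the best latency-to-sensitivity ratio until the 8 ms budget is met; a real system would replace the hand-written tables with measurements taken on the simulated or physical target device.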



Related research

02/10/2023
A Practical Mixed Precision Algorithm for Post-Training Quantization
Neural network quantization is frequently used to optimize model size, l...

02/06/2022
Energy awareness in low precision neural networks
Power consumption is a major obstacle in the deployment of deep neural n...

07/06/2023
Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Mixed-precision quantization, where a deep neural network's layers are q...

10/09/2019
QPyTorch: A Low-Precision Arithmetic Simulation Framework
Low-precision training reduces computational cost and produces efficient...

02/10/2021
Impact of Bit Allocation Strategies on Machine Learning Performance in Rate Limited Systems
Intelligent entities such as self-driving vehicles, with their data bein...

02/02/2021
Benchmarking Quantized Neural Networks on FPGAs with FINN
The ever-growing cost of both training and inference for state-of-the-ar...

02/18/2021
GradFreeBits: Gradient Free Bit Allocation for Dynamic Low Precision Neural Networks
Quantized neural networks (QNNs) are among the main approaches for deplo...
