Instant Quantization of Neural Networks using Monte Carlo Methods

05/29/2019
by Gonçalo Mordido, et al.

Low bit-width integer weights and activations are very important for efficient inference, especially with respect to lower power consumption. We propose Monte Carlo methods to quantize the weights and activations of pre-trained neural networks without any re-training. By performing importance sampling we obtain quantized low bit-width integer values from full-precision weights and activations. The precision, sparsity, and complexity are easily configurable by the amount of sampling performed. Our approach, called Monte Carlo Quantization (MCQ), is linear in both time and space, with the resulting quantized, sparse networks showing minimal accuracy loss when compared to the original full-precision networks. Our method either outperforms or achieves competitive results on multiple benchmarks compared to previous quantization methods that do require additional training.
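The abstract describes quantization via importance sampling: normalized absolute weights are treated as a sampling distribution, and the integer hit counts recovered from sampling serve as the quantized values. A minimal sketch of this idea in NumPy is given below; the helper name `mcq_quantize`, the seeding, and the rescaling convention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mcq_quantize(weights, num_samples, seed=0):
    """Sketch of Monte Carlo quantization via importance sampling.

    Treats the normalized absolute weights as a categorical
    distribution, draws num_samples indices from it, and uses the
    per-index hit counts (with the original signs restored) as
    low bit-width integer weights. NOT the paper's code; an
    illustrative approximation of the idea in the abstract.
    """
    flat = weights.ravel()
    probs = np.abs(flat) / np.abs(flat).sum()   # importance distribution
    rng = np.random.default_rng(seed)
    idx = rng.choice(flat.size, size=num_samples, p=probs)
    hits = np.bincount(idx, minlength=flat.size)  # integer hit counts
    quantized = np.sign(flat) * hits              # restore signs
    scale = np.abs(flat).sum() / num_samples      # maps counts back to weight range
    return quantized.reshape(weights.shape).astype(np.int64), scale

w = np.array([0.5, -0.25, 0.1, 0.0])
q, scale = mcq_quantize(w, num_samples=16)
# q * scale approximates w; more samples -> higher precision, less sparsity
```

Note how the sampling budget alone controls the trade-offs the abstract mentions: weights that are never sampled quantize to exactly zero (sparsity), while increasing `num_samples` raises both the integer range needed (bit-width) and the fidelity to the original weights.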


Related research:

- 01/31/2023: Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance. "We introduce a quantization-aware training algorithm that guarantees avo..."
- 04/20/2018: Value-aware Quantization for Training and Inference of Neural Networks. "We propose a novel value-aware quantization which applies aggressively r..."
- 11/30/2016: Effective Quantization Methods for Recurrent Neural Networks. "Reducing bit-widths of weights, activations, and gradients of a Neural N..."
- 02/02/2021: Benchmarking Quantized Neural Networks on FPGAs with FINN. "The ever-growing cost of both training and inference for state-of-the-ar..."
- 12/20/2019: AdaBits: Neural Network Quantization with Adaptive Bit-Widths. "Deep neural networks with adaptive configurations have gained increasing..."
- 05/25/2022: A Low Memory Footprint Quantized Neural Network for Depth Completion of Very Sparse Time-of-Flight Depth Maps. "Sparse active illumination enables precise time-of-flight depth sensing ..."
- 01/14/2021: On the quantization of recurrent neural networks. "Integer quantization of neural networks can be defined as the approximat..."
