Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search

11/30/2018
by   Bichen Wu, et al.

Recent work in network quantization has substantially reduced the time and space complexity of neural network inference, enabling deployment of these models on embedded and mobile devices with limited computational and memory resources. However, existing quantization methods often represent all weights and activations with the same precision (bit-width). In this paper, we explore a new dimension of the design space: quantizing different layers with different bit-widths. We formulate this problem as a neural architecture search problem and propose a novel differentiable neural architecture search (DNAS) framework to efficiently explore its exponential search space with gradient-based optimization. Experiments show that our method surpasses state-of-the-art compression of ResNet on CIFAR-10 and ImageNet. Our quantized models, with 21.1x smaller model size or 103.9x lower computational cost, can still outperform baseline quantized or even full-precision models.
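The key idea behind the DNAS framework is to make the discrete choice of per-layer bit-width differentiable by relaxing it into a weighted mixture over candidate precisions. The sketch below illustrates that relaxation in miniature: a scalar weight is quantized at several candidate bit-widths, and the effective weight is a softmax-weighted sum governed by learnable architecture parameters. The function names (`quantize`, `mixed_precision_weight`), the uniform quantizer, and the candidate set `(2, 4, 8)` are illustrative assumptions, not the paper's exact formulation (which uses a stochastic super net with Gumbel-softmax sampling).

```python
import math

def quantize(w, bits):
    # Illustrative uniform quantizer: clamp w to [-1, 1] and snap it to
    # one of 2**bits - 1 evenly spaced levels (not the paper's exact scheme).
    levels = 2 ** bits - 1
    w = max(-1.0, min(1.0, w))
    return round((w + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

def softmax(logits):
    # Numerically stable softmax over the architecture parameters.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_precision_weight(w, alphas, bit_choices=(2, 4, 8)):
    # Differentiable relaxation: instead of hard-selecting one bit-width,
    # blend the weight quantized at every candidate precision, weighted by
    # softmax(alphas). Training then optimizes alphas jointly with weights;
    # at the end, the argmax of alphas gives each layer's bit-width.
    probs = softmax(alphas)
    return sum(p * quantize(w, b) for p, b in zip(probs, bit_choices))
```

For example, with architecture logits strongly favoring the 8-bit branch, `mixed_precision_weight(0.3, [-10.0, -10.0, 10.0])` is dominated by the 8-bit quantization of 0.3 and so stays close to 0.3, while logits favoring the 2-bit branch would pull the result toward a coarse 2-bit grid point.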


Related research

02/20/2021
BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization
Mixed-precision quantization can potentially achieve the optimal tradeof...

05/19/2021
BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
As the applications of deep learning models on edge devices increase at ...

10/09/2020
Once Quantized for All: Progressively Searching for Quantized Efficient Models
Automatic search of Quantized Neural Networks has attracted a lot of att...

04/13/2020
Rethinking Differentiable Search for Mixed-Precision Neural Networks
Low-precision networks, with weights and activations quantized to low bi...

07/15/2020
Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes
We propose a novel method for neural network quantization that casts the...

04/07/2023
AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks
Exploring the expected quantizing scheme with suitable mixed-precision p...

02/25/2020
Searching for Winograd-aware Quantized Networks
Lightweight architectural designs of Convolutional Neural Networks (CNNs...
