Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

06/17/2022
by   Matteo Risso, et al.

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs. model size and accuracy vs. energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% at the same accuracy.
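To make the channel-wise idea concrete, the sketch below fake-quantizes each output channel of a convolutional weight tensor with its own bit-width, as a NAS might assign. This is a minimal illustration, not the paper's actual method: the symmetric uniform quantizer, the `quantize_channelwise` helper, and the example bit assignment are all assumptions for demonstration.

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric uniform fake-quantization of an array to `bits` bits
    (quantize to an integer grid, then map back to float)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def quantize_channelwise(weights, bit_assignment):
    """Quantize each output channel of a conv weight tensor of shape
    [C_out, C_in, K, K] with an independently chosen bit-width."""
    out = np.empty_like(weights)
    for c, bits in enumerate(bit_assignment):
        out[c] = fake_quantize(weights[c], bits)
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8, 3, 3)).astype(np.float32)
bits = [2, 4, 8, 4]  # hypothetical per-channel assignment from the NAS
wq = quantize_channelwise(w, bits)
```

A layer-wise scheme would force a single `bits` value on all four channels; letting the assignment vary per channel is what allows high precision to be spent only on the most informative features while the rest shrink to 2 or 4 bits.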

Related research

06/01/2022
Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks
Neural Architecture Search (NAS) is increasingly popular to automaticall...

04/14/2021
End-to-end Keyword Spotting using Neural Architecture Search and Quantization
This paper introduces neural architecture search (NAS) for the automatic...

10/08/2020
A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference
Low bit-width Quantized Neural Networks (QNNs) enable deployment of comp...

04/13/2020
Rethinking Differentiable Search for Mixed-Precision Neural Networks
Low-precision networks, with weights and activations quantized to low bi...

07/20/2020
Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization
Emergent hardwares can support mixed precision CNN models inference that...

03/31/2021
Bit-Mixer: Mixed-precision networks with runtime bit-width selection
Mixed-precision networks allow for a variable bit-width quantization for...

06/14/2021
Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision Quantization
Mixed-precision quantization is a powerful tool to enable memory and com...
