Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers

05/30/2019
by   Manuele Rusci, et al.
0

This paper presents a novel end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers. To fit the memory and computational limitations of resource-constrained edge-devices, we exploit mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization, and we model the inference graph with integer-only operations. Our approach aims at determining the minimum bit precision of every activation and weight tensor given the memory constraints of a device. This is achieved through a rule-based iterative procedure, which cuts the number of bits of the most memory-demanding layers, aiming at meeting the memory constraints. After a quantization-aware retraining step, the fake-quantized graph is converted into an inference integer-only model by inserting the Integer Channel-Normalization (ICN) layers, which introduce a negligible loss as demonstrated on INT4 MobilenetV1 models. We report the latency-accuracy evaluation of mixed-precision MobilenetV1 family networks on a STM32H7 microcontroller. Our experimental results demonstrate an end-to-end deployment of an integer-only Mobilenet network with Top1 accuracy of 68 memory and 512kB of RAM, improving by 8 previously published 8 bit implementations for microcontrollers.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/12/2020

Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers

The severe on-chip memory limitations are currently preventing the deplo...
research
11/20/2020

HAWQV3: Dyadic Neural Network Quantization

Quantization is one of the key techniques used to make Neural Networks (...
research
02/09/2022

Lightweight Jet Reconstruction and Identification as an Object Detection Task

We apply object detection techniques based on deep convolutional blocks ...
research
07/04/2020

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Model quantization helps to reduce model size and latency of deep neural...
research
07/16/2022

Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation

Recently, deep convolutional neural networks (CNNs) have achieved many e...
research
10/08/2020

A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference

Low bit-width Quantized Neural Networks (QNNs) enable deployment of comp...
research
07/18/2021

A High-Performance Adaptive Quantization Approach for Edge CNN Applications

Recent convolutional neural network (CNN) development continues to advan...

Please sign up or login with your details

Forgot password? Click here to reset