Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers

08/12/2020
by Manuele Rusci, et al.

Severe on-chip memory limitations currently prevent the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when an effective 8-bit quantization scheme is used. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored to the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, and 8 bits, of individual weight and activation tensors, under tight constraints on the embedded RAM and FLASH memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MNasNet models for ImageNet classification. Concerning the quantization policy search, the RL agent selects quantization policies that maximize memory utilization. Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as the state-of-the-art solutions quantized with a non-uniform function, which is not tailored to CPUs featuring integer-only arithmetic. This demonstrates the viability of uniform quantization, required for MCU deployment, for deep weight compression. When the activation memory budget is also limited to 512 kB, the best MobileNetV1 model scores up to 68.4% accuracy, more accurate than other 8-bit networks fitting the same memory constraints.
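The memory constraints described in the abstract can be made concrete with a short sketch. The snippet below is a minimal, hypothetical illustration (not the authors' implementation) of how a candidate mixed-precision policy, i.e. a per-tensor choice among 2, 4, and 8 bits, can be checked against MCU-style FLASH and RAM budgets. Only the 2 MB / 512 kB limits and the 2/4/8-bit choices come from the abstract; the `Layer` structure, helper names, and the simple layer-by-layer execution model are assumptions for illustration.

```python
# Hypothetical sketch: checking a mixed-precision policy against MCU memory budgets.
# Assumes per-tensor uniform quantization (2, 4, or 8 bits), as in the abstract;
# the layer model and helper names are illustrative, not from the paper's code.
from dataclasses import dataclass

FLASH_BUDGET = 2 * 1024 * 1024   # 2 MB for quantized weights (read-only memory)
RAM_BUDGET = 512 * 1024          # 512 kB for activation buffers (read-write memory)

@dataclass
class Layer:
    n_weights: int        # number of weight parameters in the layer
    n_activations: int    # number of output activation values of the layer

def flash_bytes(layers, weight_bits):
    """Total FLASH footprint of the quantized weight tensors."""
    return sum(l.n_weights * b // 8 for l, b in zip(layers, weight_bits))

def peak_ram_bytes(layers, act_bits):
    """Peak RAM usage under a layer-by-layer execution model:
    input plus output activation buffers of the most demanding layer."""
    peak = 0
    prev_bytes = 0
    for layer, bits in zip(layers, act_bits):
        out_bytes = layer.n_activations * bits // 8
        peak = max(peak, prev_bytes + out_bytes)
        prev_bytes = out_bytes
    return peak

def policy_fits(layers, weight_bits, act_bits):
    """True if the per-tensor bit-width policy respects both memory budgets."""
    return (flash_bytes(layers, weight_bits) <= FLASH_BUDGET
            and peak_ram_bytes(layers, act_bits) <= RAM_BUDGET)

# Toy 3-layer network and one candidate policy drawn from {2, 4, 8} bits.
layers = [Layer(3 * 3 * 3 * 32, 112 * 112 * 32),
          Layer(3 * 3 * 32 * 64, 56 * 56 * 64),
          Layer(64 * 1000, 1000)]
print(policy_fits(layers, weight_bits=[8, 4, 4], act_bits=[4, 4, 8]))
```

In the flow described by the paper, a feasibility check of this kind constrains the Reinforcement Learning agent so that only bit-width policies fitting the RAM and FLASH budgets are considered during the search.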

