A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference

10/08/2020
by Gianmarco Ottavi, et al.

Low bit-width Quantized Neural Networks (QNNs) enable the deployment of complex machine learning models on constrained devices such as microcontrollers (MCUs) by reducing their memory footprint. Fine-grained asymmetric quantization (i.e., different bit-widths assigned to weights and activations on a tensor-by-tensor basis) is a particularly interesting scheme for maximizing accuracy under a tight memory constraint. However, the lack of sub-byte instruction set architecture (ISA) support in state-of-the-art (SoA) microprocessors makes it hard to fully exploit this extreme quantization paradigm in embedded MCUs. Supporting sub-byte and asymmetric QNNs directly in the ISA would require encoding many combinations of precision formats, consuming an exorbitant amount of opcode space. In this work, we attack this problem with status-based SIMD instructions: rather than encoding precision explicitly in the opcode, each operand's precision is set dynamically in a core status register. We propose MPIC (Mixed Precision Inference Core), a novel RISC-V ISA core based on the open-source RI5CY core. Our approach enables full support for mixed-precision QNN inference with different combinations of operands at 16-, 8-, 4- and 2-bit precision, without adding any extra opcode or increasing the complexity of the decode stage. Our results show that MPIC improves both performance and energy efficiency by a factor of 1.1-4.9x when compared to software-based mixed-precision on RI5CY; with respect to commercially available Cortex-M4 and M7 microcontrollers, it delivers 3.6-11.7x better performance and 41-155x higher efficiency.
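The status-based idea can be sketched in plain C as a software emulation: a status record (standing in for the core's status register) selects how each 32-bit operand word is unpacked into 16-, 8-, 4- or 2-bit lanes, so a single "dot product" routine covers every precision combination without a per-format opcode. The names, packing convention, and lane semantics below are illustrative assumptions, not MPIC's actual ISA encoding.

```c
#include <stdint.h>

/* Hypothetical stand-in for MPIC's core status register: operand
   precisions are state, not part of the instruction encoding. */
typedef struct {
    unsigned w_bits;  /* weight precision: 2, 4, 8 or 16 */
    unsigned a_bits;  /* activation precision: 2, 4, 8 or 16 */
} mpic_status_t;

/* Sign-extend the low `bits` bits of v. */
static int32_t sext(uint32_t v, unsigned bits) {
    uint32_t m = 1u << (bits - 1);
    return (int32_t)((v ^ m) - m);
}

/* Emulated sum-of-dot-products over one 32-bit word per operand.
   The lane count is fixed by the wider of the two formats so both
   words contribute the same number of elements; narrower operands
   occupy the low-order lanes of their word. */
int32_t mpic_sdotp(uint32_t w, uint32_t a, mpic_status_t st, int32_t acc) {
    unsigned wide  = st.w_bits > st.a_bits ? st.w_bits : st.a_bits;
    unsigned lanes = 32 / wide;
    for (unsigned i = 0; i < lanes; i++) {
        int32_t wi = sext((w >> (i * st.w_bits)) & ((1u << st.w_bits) - 1), st.w_bits);
        int32_t ai = sext((a >> (i * st.a_bits)) & ((1u << st.a_bits) - 1), st.a_bits);
        acc += wi * ai;
    }
    return acc;
}
```

In hardware, changing `w_bits`/`a_bits` would be a single status-register write before a loop of identical SIMD dot-product instructions, which is what lets one opcode serve all 16-/8-/4-/2-bit operand combinations.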


Related research

07/03/2023: A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks
The emerging trend of deploying complex algorithms, such as Deep Neural ...

06/17/2022: Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
Quantization is widely employed in both cloud and edge systems to reduce...

06/16/2023: Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference
Convolutional Neural Networks (CNNs) are used in a wide range of applica...

07/15/2020: Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices
The deployment of Quantized Neural Networks (QNN) on advanced microcontr...

09/29/2022: Tuning of Mixture-of-Experts Mixed-Precision Neural Networks
Deep learning has become a useful data analysis method, however mainstre...

05/30/2019: Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers
This paper presents a novel end-to-end methodology for enabling the depl...

02/12/2023: Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference
In this paper, we present Quark, an integer RISC-V vector processor spec...
