A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks

07/03/2023
by   Alessandro Nadalini, et al.
0

The emerging trend of deploying complex algorithms, such as Deep Neural Networks (DNNs), increasingly poses strict memory and energy efficiency requirements on Internet-of-Things (IoT) end-nodes. Mixed-precision quantization has been proposed as a technique to minimize a DNN's memory footprint and maximize its execution efficiency, with negligible end-to-end precision degradation. In this work, we present a novel hardware and software stack for energy-efficient inference of mixed-precision Quantized Neural Networks (QNNs). We introduce Flex-V, a processor based on the RISC-V Instruction Set Architecture (ISA) that features fused Mac Load mixed-precision dot product instructions; to avoid the exponential growth of the encoding space due to mixed-precision variants, we encode formats into the Control-Status Registers (CSRs). Flex-V core is integrated into a tightly-coupled cluster of eight processors; in addition, we provide a full framework for the end-to-end deployment of DNNs including a compiler, optimized libraries, and a memory-aware deployment flow. Our results show up to 91.5 MAC/cycle and 3.26 TOPS/W on the cluster, implemented in a commercial 22nm FDX technology, with up to 8.5x speed-up, and an area overhead of only 5.6 baseline. To demonstrate the capabilities of the architecture, we benchmark it with end-to-end real-life QNNs, improving performance by 2x - 2.5x with respect to existing solutions using fully flexible programmable processors.

READ FULL TEXT

page 1

page 2

page 3

research
10/08/2020

A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference

Low bit-width Quantized Neural Networks (QNNs) enable deployment of comp...
research
07/15/2020

Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices

The deployment of Quantized Neural Networks (QNN) on advanced microcontr...
research
01/21/2022

Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Computationally intensive algorithms such as Deep Neural Networks (DNNs)...
research
08/17/2020

DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs

The deployment of Deep Neural Networks (DNNs) on end-nodes at the extrem...
research
11/29/2020

XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Network on RISC-V based IoT End Nodes

This work introduces lightweight extensions to the RISC-V ISA to boost t...
research
04/07/2023

AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks

Exploring the expected quantizing scheme with suitable mixed-precision p...
research
11/21/2022

BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC

Recently, accelerators for extremely quantized deep neural network (DNN)...

Please sign up or login with your details

Forgot password? Click here to reset