XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Network on RISC-V based IoT End Nodes

11/29/2020
by   Angelo Garofalo, et al.
0

This work introduces lightweight extensions to the RISC-V ISA to boost the efficiency of heavily Quantized Neural Network (QNN) inference on microcontroller-class cores. By extending the ISA with nibble (4-bit) and crumb (2-bit) SIMD instructions, we are able to show near-linear speedup with respect to higher precision integer computation on the key kernels for QNN computation. Also, we propose a custom execution paradigm for SIMD sum-of-dot-product operations, which consists of fusing a dot product with a load operation, with an up to 1.64x peak MAC/cycle improvement compared to a standard execution scenario. To further push the efficiency, we integrate the RISC-V extended core in a parallel cluster of 8 processors, with near-linear improvement with respect to a single core architecture. To evaluate the proposed extensions, we fully implement the cluster of processors in GF22FDX technology. QNN convolution kernels on a parallel cluster implementing the proposed extension run 6 x and 8 x faster when considering 4- and 2-bit data operands, respectively, compared to a baseline processing cluster only supporting 8-bit SIMD instructions. With a peak of 2.22 TOPs/s/W, the proposed solution achieves efficiency levels comparable with dedicated DNN inference accelerators, and up to three orders of magnitude better than state-of-the-art ARM Cortex-M based microcontroller systems such as the low-end STM32L4 MCU and the high-end STM32H7 MCU.

READ FULL TEXT

page 1

page 5

page 11

page 13

page 16

research
08/29/2019

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

We present PULP-NN, an optimized computing library for a parallel ultra-...
research
07/15/2020

Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices

The deployment of Quantized Neural Networks (QNN) on advanced microcontr...
research
04/12/2017

Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

Pairwise association measure is an important operation in data analytics...
research
07/03/2023

A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks

The emerging trend of deploying complex algorithms, such as Deep Neural ...
research
01/21/2022

Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Computationally intensive algorithms such as Deep Neural Networks (DNNs)...
research
01/07/2019

Efficient Winograd Convolution via Integer Arithmetic

Convolution is the core operation for many deep neural networks. The Win...

Please sign up or login with your details

Forgot password? Click here to reset