Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference

In this paper, we present Quark, an integer RISC-V vector processor tailored for sub-byte DNN inference. Quark is implemented in GlobalFoundries' 22FDX FD-SOI technology and is built on top of Ara, an open-source 64-bit RISC-V vector processor. To support sub-byte DNN inference, Quark extends Ara with specialized vector instructions for sub-byte quantized operations. We also remove the floating-point unit from Quark's lanes and delegate the re-scaling operations required in quantized neural network inference to the CVA6 RISC-V scalar core. As a result, each Quark lane is 2 times smaller and 1.9 times more power-efficient than an Ara lane. We show that Quark runs quantized models at sub-byte precision; notably, for 1-bit and 2-bit quantized models, Quark accelerates Conv2d computation across a range of input and kernel sizes.
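To make the dataflow concrete, the sketch below illustrates in C the two integer-only steps the abstract describes: a packed sub-byte (here 2-bit) multiply-accumulate of the kind the vector lanes would execute, followed by the fixed-point re-scaling step that Quark offloads to the CVA6 scalar core. The packing layout, the zero-point of 2, and the multiplier/shift constants are illustrative assumptions, not taken from Quark's ISA or the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoding (not Quark's actual format): sixteen 2-bit
 * codes packed into a 32-bit word, each code c in 0..3 representing
 * the value c - 2, i.e. one of {-2, -1, 0, 1}. */
static inline int32_t dot_2bit(uint32_t a_packed, uint32_t w_packed)
{
    int32_t acc = 0;
    for (int i = 0; i < 16; i++) {
        int32_t a = (int32_t)((a_packed >> (2 * i)) & 0x3) - 2;
        int32_t w = (int32_t)((w_packed >> (2 * i)) & 0x3) - 2;
        acc += a * w;  /* a vector unit would do this as one packed MAC */
    }
    return acc;
}

/* Integer-only re-scaling (requantization): map the wide accumulator
 * back to the narrow output range with a fixed-point multiplier and a
 * right shift, as is standard in quantized inference. In Quark this
 * kind of step runs on the CVA6 scalar core; mult/shift below are
 * placeholder values. */
static inline int8_t requantize(int32_t acc, int32_t mult, int shift)
{
    int64_t scaled = ((int64_t)acc * mult) >> shift;
    if (scaled > 127)  scaled = 127;
    if (scaled < -128) scaled = -128;
    return (int8_t)scaled;
}

int main(void)
{
    uint32_t act = 0x1B2D3C4Au;  /* 16 packed 2-bit activation codes */
    uint32_t wgt = 0xA5A5A5A5u;  /* 16 packed 2-bit weight codes */
    int32_t acc = dot_2bit(act, wgt);
    printf("acc = %d, requantized = %d\n",
           acc, requantize(acc, 1 << 14, 16));
    return 0;
}
```

Because the packed operands carry sixteen 2-bit elements per 32-bit word, a single vector register holds far more elements than at 8-bit precision, which is where the Conv2d speedups for 1-bit and 2-bit models come from.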
