Development of Quantized DNN Library for Exact Hardware Emulation

06/15/2021
by Masato Kiyama, et al.

Quantization is used to reduce execution time and save power when running deep neural networks (DNNs) on edge devices such as AI chips. To investigate the effect of quantization, we need to perform inference after quantizing the 32-bit floating-point weights of a DNN to some bit width and then converting them back to 32-bit floating-point precision. This is because the DNN library can only handle floating-point numbers. However, this emulation does not reproduce the hardware's arithmetic exactly. Exact precision is needed to detect overflow in MAC operations and to verify operation on edge devices. We have developed PyParch, a DNN library that executes quantized DNNs (QNNs) with exactly the same behavior as hardware. In this paper, we describe a new proposal and implementation of PyParch. Our evaluation shows that the accuracy of QNNs with arbitrary bit widths can be estimated for large and complex DNNs such as YOLOv5, and that overflow can be detected. We also evaluated the overhead of emulation and found that, compared to normal DNN execution time, it was 5.6 times slower for QNNs and 42 times slower for QNNs with overflow detection.
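To make the two ideas in the abstract concrete, the following is a minimal sketch in plain Python, not PyParch's actual API. It contrasts the float round trip (quantize, then convert back to floating point) with an integer MAC that checks the accumulator range. All names (`quantize`, `fake_quantize`, `mac_with_overflow_check`), the symmetric saturating scheme, and the two's-complement wraparound on overflow are assumptions made for illustration only.

```python
def quantize(x, scale, bits):
    """Map a float to a signed integer code of `bits` width.

    Assumption: symmetric quantization with saturation. Note that Python's
    round() rounds halves to the nearest even integer.
    """
    qmin = -(1 << (bits - 1))
    qmax = (1 << (bits - 1)) - 1
    q = round(x / scale)
    return max(qmin, min(qmax, q))


def fake_quantize(x, scale, bits):
    """Quantize, then convert back to float (the float-only emulation)."""
    return quantize(x, scale, bits) * scale


def mac_with_overflow_check(weights_q, inputs_q, acc_bits):
    """Integer multiply-accumulate with a fixed-width signed accumulator.

    Returns (accumulator, overflowed). On overflow the accumulator wraps
    in two's complement, which is one possible hardware behavior.
    """
    lo = -(1 << (acc_bits - 1))
    hi = (1 << (acc_bits - 1)) - 1
    acc = 0
    overflowed = False
    for w, a in zip(weights_q, inputs_q):
        acc += w * a
        if acc < lo or acc > hi:
            overflowed = True
            # Wrap the value back into the representable range.
            acc = (acc - lo) % (1 << acc_bits) + lo
    return acc, overflowed


# Float round trip: 0.3 becomes the nearest multiple of the scale.
print(fake_quantize(0.3, 0.25, 8))

# Three products of 127 * 127 exceed a 16-bit signed accumulator.
print(mac_with_overflow_check([127, 127, 127], [127, 127, 127], 16))
```

The float round trip only shows the rounding error of the weights. The integer path additionally exposes accumulator overflow, which the floating-point emulation cannot reveal.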


research
12/05/2018

Deep Positron: A Deep Neural Network Using the Posit Number System

The recent surge of interest in Deep Neural Networks (DNNs) has led to i...
research
10/14/2022

Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks

The biggest challenge for the deployment of Deep Neural Networks (DNNs) ...
research
09/15/2023

A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

Extreme edge platforms, such as in-vehicle smart devices, require effici...
research
02/10/2020

A Framework for Semi-Automatic Precision and Accuracy Analysis for Fast and Rigorous Deep Learning

Deep Neural Networks (DNN) represent a performance-hungry application. F...
research
01/13/2021

FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference

Deep learning models typically use single-precision (FP32) floating poin...
research
11/04/2022

LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training

When training early-stage deep neural networks (DNNs), generating interm...
research
06/10/2022

Real-time Hyper-Dimensional Reconfiguration at the Edge using Hardware Accelerators

In this paper we present Hyper-Dimensional Reconfigurable Analytics at t...
