Log In Sign Up

Development of Quantized DNN Library for Exact Hardware Emulation

by   Masato Kiyama, et al.

Quantization is used to speed up execution time and save power when runnning Deep neural networks (DNNs) on edge devices like AI chips. To investigate the effect of quantization, we need performing inference after quantizing the weights of DNN with 32-bit floating-point precision by a some bit width, and then quantizing them back to 32-bit floating-point precision. This is because the DNN library can only handle floating-point numbers. However, the accuracy of the emulation does not provide accurate precision. We need accurate precision to detect overflow in MAC operations or to verify the operation on edge de vices. We have developed PyParch, a DNN library that executes quantized DNNs (QNNs) with exactly the same be havior as hardware. In this paper, we describe a new proposal and implementation of PyParch. As a result of the evaluation, the accuracy of QNNs with arbitrary bit widths can be estimated for la rge and complex DNNs such as YOLOv5, and the overflow can be detected. We evaluated the overhead of the emulation time and found that it was 5.6 times slower for QNN and 42 times slower for QNN with overflow detection compared to the normal DNN execution time.


page 1

page 2

page 3

page 4


Deep Positron: A Deep Neural Network Using the Posit Number System

The recent surge of interest in Deep Neural Networks (DNNs) has led to i...

Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks

The biggest challenge for the deployment of Deep Neural Networks (DNNs) ...

A Framework for Semi-Automatic Precision and Accuracy Analysis for Fast and Rigorous Deep Learning

Deep Neural Networks (DNN) represent a performance-hungry application. F...

FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference

Deep learning models typically use single-precision (FP32) floating poin...

LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training

When training early-stage deep neural networks (DNNs), generating interm...

Real-time Hyper-Dimensional Reconfiguration at the Edge using Hardware Accelerators

In this paper we present Hyper-Dimensional Reconfigurable Analytics at t...

Adaptive Block Floating-Point for Analog Deep Learning Hardware

Analog mixed-signal (AMS) devices promise faster, more energy-efficient ...