Technical Report: NEMO DNN Quantization for Deployment Model

04/13/2020
by Francesco Conti, et al.

This technical report aims to define a formal framework for Deep Neural Network (DNN) layer-wise quantization, focusing in particular on problems related to final deployment. It also serves as documentation for the NEMO (NEural Minimization for pytOrch) framework. It describes the four DNN representations used in NEMO (FullPrecision, FakeQuantized, QuantizedDeployable, and IntegerDeployable), with particular focus on a formal definition of the latter two. An important feature of this model, and of the IntegerDeployable representation in particular, is that it enables DNN inference using integers only: no part of the computation resorts to real-valued numbers, and no explicit fixed-point numerical representation is required.
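The integer-only inference idea behind an IntegerDeployable-style representation can be illustrated with a minimal sketch. The example below is an assumption-laden illustration, not NEMO's actual API: it shows one quantized linear layer where activations, weights, and accumulators are all integers, and the real-valued rescaling factor is folded into an integer multiplier and a right-shift (names such as `requantize`, `mult`, and `shift` are hypothetical).

```python
import numpy as np

def requantize(acc, mult, shift):
    """Rescale int32 accumulators to the output integer range using only
    an integer multiply and an arithmetic right-shift (rounding half up).
    Approximates multiplication by the real scale s ~= mult / 2**shift."""
    rounding = 1 << (shift - 1)
    return (acc * mult + rounding) >> shift

# Example: 4-bit activations and weights, int32 accumulation.
rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=8, dtype=np.int32)       # quantized activations
w = rng.integers(-8, 8, size=(4, 8), dtype=np.int32)  # quantized weights
acc = w @ x                                           # pure-integer accumulation

# Hypothetical folded scale: s ~= 77 / 2**12 ~= 0.0188.
mult, shift = 77, 12
y = np.clip(requantize(acc, mult, shift), 0, 15)      # back to the 4-bit range
print(y)
```

Every operation above (multiply-accumulate, multiply, add, shift, clip) is an integer operation, which is the property that makes such a representation deployable on hardware without floating-point support.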


