Are We There Yet? Product Quantization and its Hardware Acceleration

05/25/2023
by Javier Fernandez-Marques et al.

Conventional multiply-accumulate (MAC) operations have long dominated computation time for deep neural networks (DNNs). Recently, product quantization (PQ) has been successfully applied to these workloads, replacing MACs with memory lookups to pre-computed dot products. While this property makes PQ an attractive solution for model acceleration, little is understood about the associated trade-offs in compute and memory footprint, and the impact on accuracy. Our empirical study investigates the impact of different PQ settings and training methods on layerwise reconstruction error and end-to-end model accuracy. When studying the efficiency of deploying PQ DNNs, we find that metrics such as FLOPs, parameter counts, and even CPU/GPU performance can be misleading. To address this issue, and to more fairly assess PQ in terms of hardware efficiency, we design the first custom hardware accelerator to evaluate the speed and efficiency of running PQ models. We identify PQ configurations that improve performance-per-area for ResNet20 by 40%, even when compared to a highly optimized conventional DNN accelerator. Our hardware performance outperforms recent PQ solutions by 4x, with only a 0.6% drop in accuracy. These results highlight the importance of the hardware-aware design of PQ models, paving the way for wider adoption of this emerging DNN approximation methodology.
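To make the lookup-based computation described above concrete, here is a minimal NumPy sketch of PQ applied to a single fully connected layer. All shapes, names, and the randomly drawn centroids are illustrative assumptions for this sketch, not the authors' accelerator design or training method; in practice the centroids would be learned (e.g., by k-means over activations).

```python
import numpy as np

# Minimal PQ sketch for one fully connected layer y = W @ x.
# Shapes and values are illustrative assumptions only.
rng = np.random.default_rng(0)
d_in, d_out = 64, 10
n_subspaces, n_centroids = 8, 16      # split x into 8 subvectors, 16 prototypes each
sub_dim = d_in // n_subspaces

W = rng.normal(size=(d_out, d_in))
# Placeholder centroids; real ones would be learned, not random.
centroids = rng.normal(size=(n_subspaces, n_centroids, sub_dim))

# Offline: pre-compute dot products of every weight slice with every
# centroid -> lookup tables of shape (n_subspaces, n_centroids, d_out).
W_sub = W.reshape(d_out, n_subspaces, sub_dim)
tables = np.einsum('omd,mkd->mko', W_sub, centroids)

def pq_forward(x):
    """Replace MACs with an argmin encode plus table lookups."""
    x_sub = x.reshape(n_subspaces, sub_dim)
    # Encode: nearest centroid per subspace (squared Euclidean distance).
    dists = ((x_sub[:, None, :] - centroids) ** 2).sum(axis=-1)  # (M, K)
    codes = dists.argmin(axis=1)
    # Aggregate: sum one table row per subspace instead of d_in multiplies.
    return tables[np.arange(n_subspaces), codes].sum(axis=0)

x = rng.normal(size=d_in)
print(np.abs(pq_forward(x) - W @ x).max())  # layerwise reconstruction error
```

With random centroids the printed reconstruction error is large; the paper's empirical study concerns exactly how centroid training and PQ settings (the number of subspaces and prototypes) trade this error against compute and memory footprint.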


