FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons

12/17/2020
by Simon Wiedemann, et al.

With the growing demand for deploying deep learning models to the "edge", it is paramount to develop techniques that allow executing state-of-the-art models within very tight and limited resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine for deep neural networks (DNNs) that are based on fully-connected layers. Our approach is centred around compression as a means for reducing the area as well as the power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performance. Firstly, we design a novel hardware architecture named FantastIC4, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the required number of multipliers for inference down to only 4 (thus the name). Moreover, in order to make the models amenable for efficient execution on FantastIC4, we introduce a novel entropy-constrained training method that renders them robust to 4bit quantization and highly compressible in size simultaneously. The experimental results show that we can achieve throughputs of 2.45 TOPS with a total power consumption of 3.6W on a Virtex UltraScale FPGA XCVU440 device implementation, and achieve a total power efficiency of 20.17 TOPS/W on a 22nm process ASIC version. When compared to the other state-of-the-art accelerators designed for the Google Speech Command (GSC) dataset, FantastIC4 is better by 51× in terms of throughput and 145× in terms of area efficiency (GOPS/mm²).
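To illustrate the kind of 4bit weight representation the abstract refers to, below is a minimal sketch of generic symmetric uniform 4bit quantization of an MLP weight matrix. This is an assumption-laden illustration, not the paper's entropy-constrained training method; the function names and the NumPy-based setup are hypothetical.

```python
import numpy as np

def quantize_4bit(w):
    """Generic symmetric uniform 4bit quantizer (illustrative, not the
    paper's entropy-constrained scheme). Maps weights to 16 integer
    levels in [-8, 7] with a single per-tensor scale."""
    scale = np.max(np.abs(w)) / 7.0  # step size so that max|w| maps to level 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from 4bit integer levels."""
    return q.astype(np.float32) * scale

# Example: quantize a random fully-connected layer's weights
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)

# At most 16 distinct levels, and per-element error bounded by half a step
assert np.unique(q).size <= 16
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

Keeping weights in so few levels is what enables both the small multiplier count and the high compressibility (a 4bit, low-entropy weight distribution can be entropy-coded well below 4 bits per weight on average).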

