Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

02/18/2018
by   Yanzhi Wang, et al.

Hardware acceleration of deep learning systems has been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed that is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The method applies to both fully-connected and convolutional layers, and its effectiveness is supported by a mathematically rigorous proof. The proposed algorithm reduces the computational complexity per layer from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), for both training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-use, and hierarchical control. Experimental results demonstrate that the proposed framework achieves at least 152X speedup and 71X energy efficiency gain compared with the IBM TrueNorth processor under the same test accuracy, and at least 31X energy efficiency gain compared with the reference FPGA-based work.
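For intuition on where the O(n log n) complexity comes from: a block-circulant weight matrix stores only the first column (the index vector) of each b x b circulant block, and each block-vector product reduces to a circular convolution that can be computed with FFTs. The following is a minimal NumPy sketch of this idea, not the paper's FPGA implementation; the function names and the block layout (a p x q grid of b x b circulant blocks, each stored as one length-b vector) are illustrative assumptions.

```python
import numpy as np

def circulant_matvec(c, x):
    # Multiply the circulant matrix whose first column is c by x.
    # The FFT diagonalizes circulant matrices, so the product is a
    # circular convolution: O(b log b) time instead of O(b^2).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(index_vectors, x, b):
    # y = W x for a block-circulant W partitioned into a p x q grid of
    # b x b circulant blocks; index_vectors[i][j] is the length-b first
    # column of block (i, j), so storage is O(p*q*b) rather than O(p*q*b^2).
    p, q = len(index_vectors), len(index_vectors[0])
    y = np.zeros(p * b)
    for i in range(p):
        for j in range(q):
            xj = x[j * b:(j + 1) * b]
            y[i * b:(i + 1) * b] += circulant_matvec(index_vectors[i][j], xj)
    return y

# Sanity check against the explicit dense block-circulant matrix.
rng = np.random.default_rng(0)
b, p, q = 4, 2, 3
blocks = [[rng.standard_normal(b) for _ in range(q)] for _ in range(p)]
dense = np.block([[np.array([[blocks[i][j][(r - s) % b] for s in range(b)]
                             for r in range(b)]) for j in range(q)]
                  for i in range(p)])
x = rng.standard_normal(q * b)
assert np.allclose(block_circulant_matvec(blocks, x, b), dense @ x)
```

In this layout a layer that would need p*q*b^2 stored weights keeps only p*q*b values, matching the O(n^2) to O(n) storage reduction claimed in the abstract (with n proportional to the layer size and b the block size).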


