Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

by   Yanzhi Wang, et al.

Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed, which is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts the general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. It applies to both fully-connected and convolutional layers and contains a mathematically rigorous proof of the effectiveness of the method. The proposed algorithm reduces computational complexity per layer from O(n^2) to O(n n) and storage complexity from O(n^2) to O(n), both for training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-using, and hierarchical control. Experimental results demonstrate that the proposed framework achieves at least 152X speedup and 71X energy efficiency gain compared with IBM TrueNorth processor under the same test accuracy. It achieves at least 31X energy efficiency gain compared with the reference FPGA-based work.


page 1

page 2

page 3

page 4


Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

Both industry and academia have extensively investigated hardware accele...

FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference

To improve the throughput and energy efficiency of Deep Neural Networks ...

A Study of Deep Learning Robustness Against Computation Failures

For many types of integrated circuits, accepting larger failure rates in...

A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron SuperconductingTechnology

The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology ...

Towards Budget-Driven Hardware Optimization for Deep Convolutional Neural Networks using Stochastic Computing

Recently, Deep Convolutional Neural Network (DCNN) has achieved tremendo...

PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices

Deep neural network (DNN) has emerged as the most important and popular ...

A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics

This paper proposes a special-purpose system to achieve high-accuracy an...

Please sign up or login with your details

Forgot password? Click here to reset