ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

12/31/2018
by Ao Ren, et al.

To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), we investigate two important categories of DNN model compression techniques: weight pruning and weight quantization. The former leverages redundancy in the number of weights, whereas the latter leverages redundancy in the bit representation of weights. However, a systematic framework for joint weight pruning and quantization of DNNs has been lacking, which limits the achievable model compression ratio. Moreover, computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for beyond model size reduction alone. To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework for DNNs using the Alternating Direction Method of Multipliers (ADMM), a powerful technique for non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique whose regularization target is dynamically updated in each ADMM iteration, resulting in higher model compression performance than prior work. The second part is a set of hardware-aware DNN optimizations to facilitate hardware-level implementations. Without accuracy loss, we achieve 85× and 24× pruning on the LeNet-5 and AlexNet models, respectively, significantly higher than prior work. The improvement becomes more significant when focusing on computation reduction. Combining weight pruning and quantization, we achieve 1,910× and 231× reductions in overall model size on these two benchmarks in terms of data storage. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50.
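To make the ADMM formulation in the abstract concrete, below is a minimal sketch of the pruning iteration: the training loss is minimized with a quadratic regularizer whose target (Z - U) is re-derived each ADMM iteration, and the Z-step projects onto the sparsity constraint set. All names (project_sparse, admm_prune, rho, prune level k) and the toy least-squares loss are illustrative assumptions, not the paper's code; the paper solves the W-step with SGD on the full DNN loss.

```python
# Minimal sketch of ADMM-based weight pruning (illustrative, not the
# authors' implementation). Constraint set: {Z : ||Z||_0 <= k}.
import numpy as np

def project_sparse(z, k):
    """Euclidean projection onto the sparsity set: keep the k
    largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z).ravel())[-k:]
    out.ravel()[idx] = z.ravel()[idx]
    return out

def admm_prune(w0, loss_grad, k, rho=1e-2, lr=1e-2, iters=50, sgd_steps=100):
    W = w0.copy()
    Z = project_sparse(W, k)           # auxiliary variable (sparse copy)
    U = np.zeros_like(W)               # scaled dual variable
    for _ in range(iters):
        # W-step: minimize loss + (rho/2)||W - Z + U||^2 by gradient
        # descent. The quadratic term is the "regularization" whose
        # target Z - U is dynamically updated each ADMM iteration.
        for _ in range(sgd_steps):
            W -= lr * (loss_grad(W) + rho * (W - Z + U))
        Z = project_sparse(W + U, k)   # Z-step: projection onto constraint
        U += W - Z                     # dual update
    return project_sparse(W, k)        # final hard prune (retrain in practice)

# Toy usage: prune a least-squares model down to k = 5 nonzero weights.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(40, 20)), rng.normal(size=40)
grad = lambda w: A.T @ (A @ w - b) / len(b)
w_sparse = admm_prune(rng.normal(size=20), grad, k=5)
print("nonzeros:", np.count_nonzero(w_sparse))
```

The same structure covers the joint framework in principle: for quantization, the Z-step would instead project each weight onto its nearest allowed quantization level, with the W-step unchanged.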


research
11/05/2018

A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM

Many model compression techniques of Deep Neural Networks (DNNs) have be...
research
07/03/2019

Non-structured DNN Weight Pruning Considered Harmful

Large deep neural network (DNN) models pose the key challenge to energy ...
research
09/29/2019

REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs

Deep neural networks (DNNs), as the basis of object detection, will play...
research
04/10/2018

A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers

Weight pruning methods for deep neural networks (DNNs) have been investi...
research
03/23/2019

Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM

Weight pruning and weight quantization are two important categories of D...
research
05/02/2019

Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM

Weight quantization is one of the most important techniques of Deep Neur...
research
10/17/2018

Progressive Weight Pruning of Deep Neural Networks using ADMM

Deep neural networks (DNNs) although achieving human-level performance i...
