IKW: Inter-Kernel Weights for Power Efficient Edge Computing

08/25/2020
by Pramod et al.

Deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art recognition accuracy in a wide range of computer vision applications such as image classification, object detection, and semantic segmentation. Applications based on CNNs require millions of multiply-accumulate (MAC) operations between input pixels and kernel weights during inference. This work investigates a technique that eliminates redundant multiplications for a subset of kernel weights in a CNN layer by exploiting identical and/or similar inter-kernel weights (IKW) across kernels. The IKW technique identifies identical and/or similar inter-kernel weights in trained, unpruned/pruned, quantized CNN kernels before the inference phase. After identification, one subset of kernel weights, termed non-pivot kernel weights, is set to zero, while the other subset, called pivot kernel weights, is left unchanged. The multiplications corresponding to non-pivot kernel weights are eliminated, thus reducing computation. The products corresponding to non-pivot kernel weights are supplied by the multiplication operations of the pivot kernel weights, so inference accuracy is not degraded. Through experiments on state-of-the-art CNNs, we demonstrate that applying the IKW technique enhances kernel sparsity by 9-37% for 8-bit precision kernel weights and 18-43% for 4-bit precision kernel weights without degrading the recognition accuracy of the CNN model. The enhanced kernel sparsity can be used to save power by clock gating the compute unit, or to increase execution performance by skipping computations pertaining to zero-valued non-pivot kernel weights. In addition, power savings are achieved by eliminating redundant, power-expensive fixed-point multiplication operations. The practical utility of the IKW technique is demonstrated by mapping it onto well-known state-of-the-art CNN accelerator architectures, which shows a power reduction of at least 12% for 8-bit and 19% for 4-bit precision kernel weights, and an execution-performance improvement of at least 2% for 8-bit and 13% for 4-bit precision kernel weights.
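To make the pivot/non-pivot idea concrete, here is a minimal NumPy sketch of how an IKW-style pre-processing pass might look. It is not the paper's implementation: the function name `ikw_transform`, the `similarity_tol` parameter, the pivot-selection rule (first occurrence wins), and the `(K, C, R, S)` weight layout are all assumptions made for illustration. With `similarity_tol = 0` only exactly identical inter-kernel weights are grouped, so product reuse is exact; how "similar" weights are matched without accuracy loss is specific to the paper.

```python
import numpy as np

def ikw_transform(weights, similarity_tol=0):
    """Hypothetical IKW sketch (not the paper's exact algorithm).

    weights: integer (quantized) kernel tensor of shape (K, C, R, S),
             where K is the number of output kernels.
    For each position (c, r, s), a weight that matches an earlier kernel's
    weight at the same position (within similarity_tol) becomes non-pivot:
    it is zeroed, and its product is mapped to the pivot kernel whose
    multiplier output can be reused at inference.
    Returns the sparsified weights and a reuse map
    {(k, c, r, s): pivot_kernel_index} for the non-pivot entries.
    """
    K, C, R, S = weights.shape
    sparse_w = weights.copy()
    reuse_map = {}
    for c in range(C):
        for r in range(R):
            for s in range(S):
                pivots = []  # (kernel index, weight value) of pivot weights
                for k in range(K):
                    w = weights[k, c, r, s]
                    if w == 0:
                        continue  # already sparse, nothing to eliminate
                    match = next((pk for pk, pv in pivots
                                  if abs(int(pv) - int(w)) <= similarity_tol),
                                 None)
                    if match is None:
                        pivots.append((k, w))            # first occurrence -> pivot
                    else:
                        sparse_w[k, c, r, s] = 0         # non-pivot: zero it out
                        reuse_map[(k, c, r, s)] = match  # reuse pivot's product
    return sparse_w, reuse_map
```

At inference time, an accelerator using such a map would skip (or clock gate) the MACs for zero-valued non-pivot weights and accumulate the already-computed pivot product in their place, which is the source of the power and performance gains reported above.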


