A MAC-less Neural Inference Processor Supporting Compressed, Variable Precision Weights

12/10/2020
by Vincenzo Liguori, et al.

This paper introduces two architectures for the inference of convolutional neural networks (CNNs). Both exploit weight sparsity and compression to reduce computational complexity and bandwidth. The first architecture uses multiply-accumulators (MACs) but avoids unnecessary multiplications by skipping zero weights. The second exploits weight sparsity at the level of the bit representation of the weights, substituting resource-intensive MACs with much smaller Bit Layer Multiply Accumulators (BLMACs). BLMACs also allow variable precision weights, represented as variable size integers and even floating point values. Some details of an implementation of the second architecture are given. Weight compression with arithmetic coding is also discussed, along with its bandwidth implications. Finally, implementation results for a pathfinder design in various technologies are presented.
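To make the two mechanisms concrete, here is a minimal software sketch written for this summary, not taken from the paper: a dot product computed first with a zero-weight-skipping MAC and then with a bit-layer accumulation in the spirit of a BLMAC, where each weight bit plane contributes only additions and a shift. The function names and the unsigned 8-bit weight format are illustrative assumptions; the paper describes hardware datapaths, not this code.

```c
/* Illustrative sketch only: software model of (1) a zero-skipping MAC
   and (2) BLMAC-style bit-layer accumulation. Names and data formats
   are assumptions made for this example. */
#include <stdint.h>
#include <stdio.h>

#define N 8
#define WEIGHT_BITS 8   /* assume unsigned 8-bit weights for simplicity */

/* (1) MAC with zero-weight skipping: multiplications for zero
   weights are never issued. */
int64_t sparse_mac(const uint8_t w[N], const int32_t x[N]) {
    int64_t acc = 0;
    for (int i = 0; i < N; i++)
        if (w[i] != 0)                 /* skip zero weights */
            acc += (int64_t)w[i] * x[i];
    return acc;
}

/* (2) Bit-layer accumulation: for each weight bit plane b, sum the
   activations whose weight has bit b set, then weight the plane sum
   by 2^b via a shift. Only set bits cost an addition, so bit-level
   sparsity is exploited and no multiplier is needed. */
int64_t bit_layer_mac(const uint8_t w[N], const int32_t x[N]) {
    int64_t acc = 0;
    for (int b = 0; b < WEIGHT_BITS; b++) {
        int64_t plane = 0;
        for (int i = 0; i < N; i++)
            if ((w[i] >> b) & 1)       /* set bits only: add, no multiply */
                plane += x[i];
        acc += plane << b;             /* shift replaces multiplication */
    }
    return acc;
}

int main(void) {
    const uint8_t w[N] = {0, 3, 0, 0, 129, 0, 7, 0};  /* sparse weights */
    const int32_t x[N] = {5, -2, 9, 1, 4, -7, 2, 8};
    printf("sparse MAC: %lld\n", (long long)sparse_mac(w, x));
    printf("bit-layer:  %lld\n", (long long)bit_layer_mac(w, x));
    return 0;
}
```

Both routines return the same dot product; the bit-layer variant performs one addition per set weight bit, which is why bit-level sparsity and shorter variable precision weights directly reduce the work performed.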


Related research

01/25/2022
Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors
Applications of neural networks on edge systems have proliferated in rec...

07/11/2020
HOBFLOPS CNNs: Hardware Optimized Bitsliced Floating-Point Operations Convolutional Neural Networks
Convolutional neural network (CNN) inference is commonly performed with ...

02/01/2023
Bit-balance: Model-Hardware Co-design for Accelerating NNs by Exploiting Bit-level Sparsity
Bit-serial architectures can handle Neural Networks (NNs) with different...

06/15/2020
Neural gradients are lognormally distributed: understanding sparse and quantized training
Neural gradient compression remains a main bottleneck in improving train...

11/24/2019
Pyramid Vector Quantization and Bit Level Sparsity in Weights for Efficient Neural Networks Inference
This paper discusses three basic blocks for the inference of convolution...

07/16/2022
Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation
Recently, deep convolutional neural networks (CNNs) have achieved many e...

02/28/2023
At-Scale Evaluation of Weight Clustering to Enable Energy-Efficient Object Detection
Accelerators implementing Deep Neural Networks for image-based object de...
