A Targeted Acceleration and Compression Framework for Low-bit Neural Networks

07/09/2019
by Biao Qian, et al.

1-bit deep neural networks (DNNs), in which both the activations and weights are binarized, are attracting growing attention due to their high computational efficiency and low memory requirements. However, the accompanying large accuracy drop restricts their application. In this paper, we propose a novel Targeted Acceleration and Compression (TAC) framework to improve the performance of 1-bit deep neural networks. We observe that the acceleration and compression gained by binarizing the fully connected layers are not sufficient to compensate for the accuracy loss this causes. In the proposed framework, the convolutional and fully connected layers are therefore separated and optimized individually. For the convolutional layers, both the activations and weights are binarized. For the fully connected layers, binarization is replaced by network pruning and low-bit quantization. The proposed framework is evaluated on the CIFAR-10, CIFAR-100, and ImageNet (ILSVRC-12) datasets, and experimental results show that TAC significantly improves the accuracy of 1-bit deep neural networks and outperforms the state of the art by more than 6 percentage points.
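
A minimal PyTorch-style sketch of the layer-wise treatment the abstract describes: convolutional layers binarize both weights and activations with a sign function (using a straight-through estimator for gradients), while fully connected layers are instead compressed by magnitude pruning followed by uniform low-bit quantization. The names (BinarizeSTE, BinaryConv2d, prune_and_quantize_fc), the hard-tanh gradient clip, and the pruning/quantization details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (hard-tanh clip),
        # a common STE choice for binarized networks.
        return grad_output * (x.abs() <= 1).float()


class BinaryConv2d(nn.Conv2d):
    """Convolution with 1-bit weights and activations."""

    def forward(self, x):
        xb = BinarizeSTE.apply(x)            # binarize activations
        wb = BinarizeSTE.apply(self.weight)  # binarize weights
        return F.conv2d(xb, wb, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


def prune_and_quantize_fc(fc, prune_ratio=0.5, num_bits=4):
    """Magnitude-prune an FC layer, then uniformly quantize the rest.

    A generic sketch: the paper's exact pruning criterion and
    quantizer may differ.
    """
    w = fc.weight.data
    # 1) Magnitude pruning: zero out the smallest prune_ratio fraction.
    k = max(1, int(prune_ratio * w.numel()))
    threshold = w.abs().flatten().kthvalue(k).values
    w = w * (w.abs() > threshold).float()
    # 2) Uniform symmetric quantization of the surviving weights.
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(w.abs().max().item(), 1e-8) / qmax
    fc.weight.data = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return fc


# Example: 1-bit convolutional body, pruned 4-bit classifier head.
# (Pruning/quantization would normally be applied to trained weights;
# this only shows the intended API shape.)
conv = BinaryConv2d(3, 16, kernel_size=3, padding=1)
fc = prune_and_quantize_fc(nn.Linear(16 * 32 * 32, 10),
                           prune_ratio=0.5, num_bits=4)
```

The split mirrors the paper's rationale: FC weights dominate parameter count but contribute little compute, so pruning plus low-bit quantization recovers accuracy while keeping the memory savings.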

Related research

11/01/2019 · Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters
Effective employment of deep neural networks (DNNs) in mobile devices an...

07/29/2020 · Compressing Deep Neural Networks via Layer Fusion
This paper proposes layer fusion - a model compression technique that di...

08/26/2023 · MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree
Binary neural networks (BNNs) have been widely adopted to reduce the com...

03/29/2021 · Deep Compression for PyTorch Model Deployment on Microcontrollers
Neural network deployment on low-cost embedded systems, hence on microco...

11/25/2020 · Low Latency CMOS Hardware Acceleration for Fully Connected Layers in Deep Neural Networks
We present a novel low latency CMOS hardware accelerator for fully conne...

07/26/2018 · A Unified Approximation Framework for Deep Neural Networks
Deep neural networks (DNNs) have achieved significant success in a varie...

05/04/2023 · Input Layer Binarization with Bit-Plane Encoding
Binary Neural Networks (BNNs) use 1-bit weights and activations to effic...
