Flexible Network Binarization with Layer-wise Priority

09/13/2017
by Lixue Zhuang, et al.

How to effectively approximate real-valued parameters with binary codes plays a central role in neural network binarization. In this work, we reveal an important fact: binarizing different layers has a widely varied effect on the network's compression ratio and on the loss of performance. Based on this fact, we propose a novel and flexible neural network binarization method built on the concept of layer-wise priority, which binarizes parameters in inverse order of their layer depth. In each training step, our method selects a specific network layer, minimizes the discrepancy between its original real-valued weights and their binary approximations, and fine-tunes the whole network accordingly. As this process iterates, we can flexibly decide whether to binarize the remaining floating-point layers, exploring a trade-off between the loss of performance and the compression ratio of the model. The resulting binary network is applied to efficient pedestrian detection. Extensive experimental results on several benchmarks show that, at the same compression ratio, our method achieves a much lower miss rate and faster detection speed than the state-of-the-art neural network binarization method.
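The abstract describes the per-layer step (approximate a layer's real-valued weights with binary codes, then fine-tune) but gives no formulas or code, so below is a minimal NumPy sketch of one plausible reading. It assumes the well-known XNOR-Net-style closed-form solution to minimizing ||W - alpha*B||^2 over B in {-1,+1} and alpha > 0, namely B = sign(W) and alpha = mean(|W|), and reads "inverse order of layer depth" as deepest layer first. The names `fine_tune`, `evaluate`, and `max_miss_rate` are hypothetical placeholders for the paper's training pipeline and stopping criterion, not the authors' actual implementation.

```python
import numpy as np

def binarize_layer(W):
    """Closed-form binary approximation of a real-valued weight tensor.

    Minimizing ||W - alpha * B||^2 over B in {-1, +1} and alpha > 0
    yields B = sign(W) and alpha = mean(|W|) (the XNOR-Net solution);
    assumed here, since the paper does not spell out its approximation.
    """
    B = np.sign(W)
    B[B == 0] = 1.0            # map exact zeros to +1 so B stays binary
    alpha = np.abs(W).mean()   # optimal scaling factor
    return alpha, B

def layerwise_binarization(layers, fine_tune, evaluate, max_miss_rate):
    """Binarize layers one at a time by layer-wise priority.

    `layers` is a list of real-valued weight arrays ordered from input to
    output. `fine_tune` and `evaluate` are hypothetical callables standing
    in for retraining the remaining float layers and measuring the miss
    rate; `max_miss_rate` is an illustrative performance budget.
    """
    binarized = {}
    for idx in reversed(range(len(layers))):   # deepest layer first (assumed)
        alpha, B = binarize_layer(layers[idx])
        binarized[idx] = (alpha, B)
        layers[idx] = alpha * B                # freeze the binary approximation
        fine_tune(layers, frozen=set(binarized))
        if evaluate(layers) > max_miss_rate:   # flexible stopping point:
            break                              # leave shallower layers in float
    return layers, binarized
```

Because the budget check runs after each layer, stopping early leaves the remaining layers in floating point, which is one way to realize the trade-off the abstract describes between compression ratio and miss rate.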
