Layer-Wise Data-Free CNN Compression

11/18/2020
by Maxwell Horton, et al.

We present an efficient method for compressing a trained neural network without using any data. Our data-free method requires 14x-450x fewer FLOPs than comparable state-of-the-art methods. We break the problem of data-free network compression into a number of independent layer-wise compressions. We show how to efficiently generate layer-wise training data, and how to precondition the network to maintain accuracy during layer-wise compression. We show state-of-the-art performance on MobileNetV1 for data-free low-bit-width quantization. We also show state-of-the-art performance on data-free pruning of EfficientNet B0 when combining our method with end-to-end generative methods.
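The abstract only outlines the approach, so the following is a minimal, illustrative sketch of the general idea of layer-wise, data-free compression: each layer is calibrated on its own, with a quantized copy trained to match the original float layer's outputs on synthetic inputs, so no real data is needed. The Gaussian synthetic inputs, bit-width, layer shapes, and helper names (quantize_weights, compress_layer) are assumptions made for illustration; they are not the authors' implementation, which per the abstract also includes a more careful layer-wise data generation scheme and network preconditioning.

```python
# Illustrative sketch only (not the paper's exact algorithm): layer-wise,
# data-free weight quantization. A quantized copy of each conv layer is
# calibrated against the original float layer on random synthetic inputs.

import copy
import torch
import torch.nn as nn


def quantize_weights(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a weight tensor."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale


def compress_layer(layer: nn.Conv2d, in_shape, num_bits=4, steps=200, lr=1e-3):
    """Calibrate a quantized copy of `layer` against the float layer,
    using only synthetic (random) inputs -- no real data required."""
    float_layer = layer.eval()
    q_layer = copy.deepcopy(layer)
    opt = torch.optim.Adam(q_layer.parameters(), lr=lr)

    for _ in range(steps):
        # Synthetic layer-wise "training data": plain Gaussian noise shaped
        # like the layer's input (an assumption made for this sketch).
        x = torch.randn(8, *in_shape)
        with torch.no_grad():
            target = float_layer(x)

        # Straight-through estimator: forward with quantized weights,
        # backpropagate into the underlying float weights.
        w = q_layer.weight
        w_q = w + (quantize_weights(w, num_bits) - w).detach()
        out = nn.functional.conv2d(x, w_q, q_layer.bias,
                                   q_layer.stride, q_layer.padding)

        loss = nn.functional.mse_loss(out, target)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Freeze the final quantized weights.
    with torch.no_grad():
        q_layer.weight.copy_(quantize_weights(q_layer.weight, num_bits))
    return q_layer


if __name__ == "__main__":
    layer = nn.Conv2d(16, 32, kernel_size=3, padding=1)
    q_layer = compress_layer(layer, in_shape=(16, 8, 8), num_bits=4)
```

The sketch conveys the structural point of the abstract: instead of retraining the whole network end-to-end on real data, compression reduces to many small, independent per-layer regression problems, which is where the large FLOP savings over end-to-end data-free methods come from.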


Related research

- Automated Model Compression by Jointly Applied Pruning and Quantization (11/12/2020): In the traditional deep compression framework, iteratively performing ne...
- UWC: Unit-wise Calibration Towards Rapid Network Compression (01/17/2022): This paper introduces a post-training quantization (PTQ) method achievin...
- A Low Effort Approach to Structured CNN Design Using PCA (12/15/2018): Deep learning models hold state of the art performance in many fields, y...
- Flexible Network Binarization with Layer-wise Priority (09/13/2017): How to effectively approximate real-valued parameters with binary codes ...
- Practical Network Acceleration with Tiny Sets (02/16/2022): Network compression is effective in accelerating the inference of deep n...
- Generalized Ternary Connect: End-to-End Learning and Compression of Multiplication-Free Deep Neural Networks (11/12/2018): The use of deep neural networks in edge computing devices hinges on the ...
- Input Layer Binarization with Bit-Plane Encoding (05/04/2023): Binary Neural Networks (BNNs) use 1-bit weights and activations to effic...
