Compressing Deep Neural Networks via Layer Fusion

07/29/2020
by   James O'Neill, et al.

This paper proposes layer fusion, a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional, and attention layers. Layer fusion can significantly reduce the number of layers in the original network with little additional computational overhead, while maintaining competitive performance. In experiments on CIFAR-10, we find that various deep convolutional neural networks remain within 2 accuracy points of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. In experiments on the WikiText-2 language modelling dataset with pretrained transformer models, we achieve compression to 20% of the original network size while staying within 5 perplexity points of the original network. We also find that other well-established compression techniques can achieve performance competitive with their original networks given a sufficient number of retraining steps. Generally, we observe a clear inflection point in performance as the amount of compression increases, suggesting a bound on the amount of compression that can be achieved before performance degrades exponentially.
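The core idea of fusing similar layers can be sketched in a few lines. This is a minimal illustration only: the convex-combination fusion rule and the cosine-similarity criterion below are assumptions chosen for clarity, not the paper's exact alignment or fusion procedure.

```python
import numpy as np

def layer_similarity(w_a, w_b):
    """Cosine similarity between flattened weight tensors,
    used here to pick candidate layer pairs for fusion."""
    a, b = w_a.ravel(), w_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fuse_layers(w_a, w_b, alpha=0.5):
    """Fuse two same-shaped layers by convex combination of their weights.
    (One simple fusion rule; the paper's method may differ.)"""
    assert w_a.shape == w_b.shape
    return alpha * w_a + (1.0 - alpha) * w_b

# Toy example: two nearly identical fully-connected layers collapse into one,
# halving the parameter count for this pair.
rng = np.random.default_rng(0)
w1 = rng.standard_normal((4, 4))
w2 = w1 + 0.01 * rng.standard_normal((4, 4))  # a very similar layer

if layer_similarity(w1, w2) > 0.9:   # similarity threshold is illustrative
    fused = fuse_layers(w1, w2)
```

In practice the fused network would then be iteratively retrained, as the abstract notes, to recover accuracy lost in the fusion step.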



Related research

07/09/2019  A Targeted Acceleration and Compression Framework for Low bit Neural Networks
1 bit deep neural networks (DNNs), of which both the activations and wei...

07/20/2018  Principal Filter Analysis for Guided Network Compression
Principal Filter Analysis (PFA) is an elegant, easy to implement, yet e...

01/15/2018  Deep Net Triage: Assessing the Criticality of Network Layers by Structural Compression
Deep network compression seeks to reduce the number of parameters in the...

08/21/2023  Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression
Echo cancellation and noise reduction are essential for full-duplex comm...

03/30/2022  A Fast Transformer-based General-Purpose Lossless Compressor
Deep-learning-based compressor has received interests recently due to mu...

09/30/2022  Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
Language Models pretrained on large textual data have been shown to enco...

04/12/2021  Generalization bounds via distillation
This paper theoretically investigates the following empirical phenomenon...
