Deep Neural Compression Via Concurrent Pruning and Self-Distillation

09/30/2021
by James O'Neill, et al.

Pruning aims to reduce the number of parameters while maintaining performance close to that of the original network. This work proposes a novel self-distillation-based pruning strategy, whereby the representational similarity between the pruned and unpruned versions of the same network is maximized. Unlike previous approaches that treat distillation and pruning separately, we use distillation to inform the pruning criteria, without requiring a separate student network as in knowledge distillation. We show that the proposed cross-correlation objective for self-distilled pruning implicitly encourages sparse solutions, naturally complementing magnitude-based pruning criteria. Experiments on the GLUE and XGLUE benchmarks show that self-distilled pruning improves mono- and cross-lingual language model performance. Self-distilled pruned models also outperform smaller Transformers with an equal number of parameters and are competitive against distilled networks that are 6 times larger. We further observe that self-distillation (1) maximizes class separability, (2) increases the signal-to-noise ratio, and (3) leads to faster convergence after pruning steps, providing further insight into why self-distilled pruning improves generalization.
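To make the idea concrete, below is a minimal sketch of one self-distilled pruning step, not the authors' implementation. It assumes a Barlow-Twins-style cross-correlation loss between the pruned network's representations and a frozen unpruned copy, PyTorch's torch.nn.utils.prune for the magnitude criterion, a model that returns (hidden representation, logits), and an alpha weighting against the task loss; all of these details are assumptions rather than specifics from the paper. In particular, the abstract states that the distillation signal informs the pruning criterion itself, whereas this sketch simply combines plain L1 magnitude pruning with the self-distillation objective.

```python
# Sketch of one self-distilled pruning step (assumptions noted in comments).
import copy
import torch
import torch.nn.utils.prune as prune


def cross_correlation_loss(z_pruned, z_full, off_diag_weight=5e-3):
    """Align pruned and unpruned representations via their cross-correlation.

    z_pruned, z_full: (batch, dim) hidden representations of the same inputs
    from the pruned network and the frozen unpruned copy.
    """
    z_p = (z_pruned - z_pruned.mean(0)) / (z_pruned.std(0) + 1e-6)
    z_f = (z_full - z_full.mean(0)) / (z_full.std(0) + 1e-6)
    c = (z_p.T @ z_f) / z_p.size(0)  # (dim, dim) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                # pull matched dimensions together
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # decorrelate the rest
    return on_diag + off_diag_weight * off_diag


def self_distilled_pruning_step(model, batch, sparsity=0.1, alpha=0.5, task_loss_fn=None):
    # The unpruned "teacher" is a frozen copy of the current network (self-distillation:
    # no separate student architecture is introduced).
    teacher = copy.deepcopy(model).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)

    # Magnitude-based pruning on every Linear layer (layer-wise L1 here; global
    # thresholding would be an equally reasonable choice for a sketch).
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=sparsity)

    # Self-distillation: match the pruned network's representation to the teacher's.
    # Assumption: model(inputs) returns (hidden_representation, logits).
    z_pruned, logits = model(batch["inputs"])
    with torch.no_grad():
        z_full, _ = teacher(batch["inputs"])

    loss = cross_correlation_loss(z_pruned, z_full)
    if task_loss_fn is not None:
        loss = alpha * loss + (1 - alpha) * task_loss_fn(logits, batch["labels"])
    return loss
```

The returned loss would then be backpropagated through the pruned model as usual; decorrelating off-diagonal entries of the cross-correlation matrix is one plausible reading of how such an objective can encourage sparse solutions.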


Related research

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm (10/15/2021)
Prune Your Model Before Distill It (09/30/2021)
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models (05/28/2023)
Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching (09/13/2022)
Play and Prune: Adaptive Filter Pruning for Deep Model Compression (05/11/2019)
Class-Discriminative CNN Compression (10/21/2021)
Distilling with Performance Enhanced Students (10/24/2018)
