Compression-aware Training of Neural Networks using Frank-Wolfe

05/24/2022
by Max Zimmer et al.

Many existing neural network pruning approaches either rely on retraining to compensate for pruning-induced performance degradation or induce strong biases that steer training toward a specific sparse solution. A third paradigm obtains a wide range of compression ratios from a single dense training run while also avoiding retraining. Recent work of Pokutta et al. (2020) and Miao et al. (2022) suggests that the Stochastic Frank-Wolfe (SFW) algorithm is particularly suited for training state-of-the-art models that are robust to compression. We propose leveraging k-support-norm-ball constraints and demonstrate significant improvements over the results of Miao et al. (2022) for unstructured pruning. We also extend these ideas to the structured-pruning domain and propose novel approaches to ensure robustness both to the pruning of convolutional filters and to low-rank tensor decompositions of convolutional layers. In the latter case, our approach performs on par with nuclear-norm regularization baselines while requiring only half of the computational resources. Our findings also indicate that the robustness of SFW-trained models depends largely on the gradient rescaling of the learning rate, and we establish a theoretical foundation for this practice.
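To make the training scheme concrete, below is a minimal sketch of a single SFW update under a k-support-norm-ball constraint with a gradient-rescaled step size. The function names (`k_support_lmo`, `sfw_step`) and hyperparameters (`k`, `radius`, `lr`) are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def k_support_lmo(grad: torch.Tensor, k: int, radius: float) -> torch.Tensor:
    """Linear minimization oracle for the k-support-norm ball of the given radius:
    the minimizing vertex is supported on the k largest-magnitude gradient entries
    and scaled to Euclidean norm `radius`."""
    flat = grad.flatten()
    idx = torch.topk(flat.abs(), k).indices
    v = torch.zeros_like(flat)
    v[idx] = -flat[idx]                      # point against the gradient
    v = radius * v / (v.norm() + 1e-12)      # scale to the ball boundary
    return v.view_as(grad)

@torch.no_grad()
def sfw_step(param: torch.Tensor, grad: torch.Tensor,
             k: int, radius: float, lr: float) -> None:
    """One Stochastic Frank-Wolfe update x <- x + gamma * (v - x), with the step
    size rescaled by the gradient norm and clipped to [0, 1]. Assumes the iterate
    already lies inside the feasible region."""
    v = k_support_lmo(grad, k, radius)
    direction = v - param
    gamma = min(lr * grad.norm().item() / (direction.norm().item() + 1e-12), 1.0)
    param.add_(direction, alpha=gamma)

# Toy usage on a single weight tensor (shapes and values are hypothetical).
w = torch.randn(1000)
g = torch.randn(1000)  # stochastic gradient from a minibatch
sfw_step(w, g, k=50, radius=5.0, lr=0.1)
```

The intuition behind the constraint is that every LMO vertex is k-sparse, so the SFW iterates are convex combinations of sparse atoms, which biases the trained weights toward being compressible without retraining.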

