Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models

05/25/2022
by Clara Na, et al.

Model compression by way of parameter pruning, quantization, or distillation has recently gained popularity as an approach for reducing the computational requirements of modern deep neural network models for NLP. Pruning unnecessary parameters has emerged as a simple and effective method for compressing large models that is compatible with a wide variety of contemporary off-the-shelf hardware (unlike quantization), and that requires little additional training (unlike distillation). Pruning approaches typically take a large, accurate model as input, then attempt to discover a smaller subnetwork of that model capable of achieving end-task accuracy comparable to the full model. Inspired by previous work suggesting a connection between simpler, more generalizable models and those that lie within flat basins in the loss landscape, we propose to directly optimize for flat minima while performing task-specific pruning, which we hypothesize should lead to simpler parameterizations and thus more compressible models. In experiments combining sharpness-aware minimization with both iterative magnitude pruning and structured pruning approaches, we show that optimizing for flat minima consistently leads to greater compressibility of parameters compared to standard Adam optimization when fine-tuning BERT models, leading to higher rates of compression with little to no loss in accuracy on the GLUE classification benchmark.
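To make the approach concrete, below is a minimal sketch (not the authors' implementation) of the "train flat, then compress" recipe described in the abstract: fine-tune with a two-pass sharpness-aware minimization (SAM) update wrapped around a base Adam optimizer, then apply one-shot magnitude pruning. A small feed-forward classifier stands in for BERT, and the neighborhood radius rho=0.05, the 50% sparsity level, and the toy data are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: SAM fine-tuning followed by magnitude pruning.
# The model, rho, sparsity, and training data are placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def sam_step(model, loss_fn, inputs, targets, base_opt, rho=0.05):
    """One SAM update: ascend to an approximate worst-case point within an
    L2 ball of radius rho, take the gradient there, then step from the
    original weights with the base optimizer (two forward/backward passes)."""
    # First pass: gradient at the current weights.
    base_opt.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Perturb each parameter along its (normalized) gradient direction.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)

    # Second pass: gradient at the perturbed weights.
    base_opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Restore the original weights, then apply the base optimizer step
    # using the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
    return loss.item()


# Toy "fine-tuning" run followed by one-shot magnitude pruning.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(256, 32), torch.randint(0, 2, (256,))

for step in range(100):
    sam_step(model, loss_fn, x, y, opt, rho=0.05)

# Compress: prune the 50% smallest-magnitude weights in each linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
```

The intuition the abstract appeals to is that flat minima tolerate the weight perturbations that pruning introduces, so parameters learned this way should survive higher sparsity levels with less accuracy loss; the paper's actual experiments use BERT fine-tuning on GLUE with iterative magnitude pruning and structured pruning rather than the one-shot pruning shown here.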

Related research

04/01/2022 · Structured Pruning Learns Compact and Accurate Models
The growing size of neural language models has led to increased attentio...

08/20/2022 · Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
Quantization, knowledge distillation, and magnitude pruning are among th...

10/13/2022 · SQuAT: Sharpness- and Quantization-Aware Training for BERT
Quantization is an effective technique to reduce memory footprint, infer...

09/10/2021 · Block Pruning For Faster Transformers
Pre-training has improved model accuracy for both classification and gen...

05/28/2023 · Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
Large pre-trained models (LPMs), such as LLaMA and ViT-G, have shown exc...

04/30/2020 · Pruning artificial neural networks: a way to find well-generalizing, high-entropy sharp minima
Recently, a race towards the simplification of deep networks has begun, ...

06/25/2023 · Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
Robustness and compactness are two essential components of deep learning...
