SPDY: Accurate Pruning with Speedup Guarantees

01/31/2022
by Elias Frantar, et al.

The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly growing computational support for efficiently executing the unstructured-sparse models obtained via pruning. Yet most existing pruning methods minimize only the number of remaining weights, i.e., the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method that automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. SPDY is composed of two new techniques: the first is an efficient dynamic programming algorithm for solving the speedup-constrained layer-wise compression problem, assuming a set of given layer-wise sensitivity scores; the second is a local search procedure for determining accurate layer-wise sensitivity scores. Experiments across popular vision and language models show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both in one-shot and gradual pruning scenarios, and is compatible with most existing pruning approaches. We also extend our approach to the recently proposed task of pruning with very little data, where we achieve the best known accuracy recovery when pruning to the GPU-supported 2:4 sparsity pattern.
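To make the first component concrete, below is a minimal sketch (not the authors' implementation) of the kind of knapsack-style dynamic program the abstract describes: given profiled per-layer runtimes and sensitivity scores for a discrete set of candidate sparsity levels, it picks one level per layer so that the total predicted runtime meets a latency budget while the summed sensitivity cost is minimized. All names, the time-discretization granularity, and the toy numbers are illustrative assumptions.

```python
import math

def solve_budget_dp(times, costs, budget, n_buckets=2000):
    """Choose one sparsity level per layer (sketch, illustrative only).

    times[l][k] -- profiled runtime of layer l at sparsity level k
    costs[l][k] -- sensitivity score of layer l at sparsity level k
    budget      -- target total runtime (same units as `times`)

    Returns a list with the chosen level index per layer, or None if
    the budget is infeasible even at maximum sparsity.
    """
    L = len(times)
    unit = budget / n_buckets  # time granularity of the DP table
    INF = float("inf")
    # dp[l][b]: minimal total cost of layers 0..l-1 using at most b*unit time
    dp = [[INF] * (n_buckets + 1) for _ in range(L + 1)]
    dp[0] = [0.0] * (n_buckets + 1)  # no layers placed yet, zero cost
    choice = [[-1] * (n_buckets + 1) for _ in range(L)]
    for l in range(L):
        for b in range(n_buckets + 1):
            for k, (t, c) in enumerate(zip(times[l], costs[l])):
                tb = math.ceil(t / unit)  # time buckets consumed by this choice
                if tb <= b and dp[l][b - tb] + c < dp[l + 1][b]:
                    dp[l + 1][b] = dp[l][b - tb] + c
                    choice[l][b] = k
    if choice[L - 1][n_buckets] == -1:
        return None  # no assignment fits the budget
    # Backtrack to recover the per-layer level choices.
    levels, b = [], n_buckets
    for l in range(L - 1, -1, -1):
        k = choice[l][b]
        levels.append(k)
        b -= math.ceil(times[l][k] / unit)
    return levels[::-1]

# Toy usage: two layers, three sparsity levels each; hypothetical numbers.
times = [[4.0, 2.5, 1.5], [6.0, 3.5, 2.0]]
costs = [[0.0, 0.1, 0.5], [0.0, 0.2, 0.9]]
print(solve_budget_dp(times, costs, budget=5.5))  # -> [2, 1]
```

The second component would then sit on top of this solver: a local search perturbs the layer-wise sensitivity scores, re-runs the dynamic program for each candidate scoring, and keeps the scores whose resulting sparsity profile gives the best accuracy on held-out calibration data.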


