MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models

05/30/2021
by Zhewei Yao et al.

Pruning is an effective method to reduce the memory footprint and computational cost of large natural language processing models. However, current approaches either explore only head pruning, which has a limited pruning ratio, or focus only on unstructured pruning, which has a negligible effect on real inference time and/or power consumption. To address these challenges, we develop a novel MultiLevel structured Pruning (MLPruning) framework, which uses three levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning. We propose a learnable Top-k threshold, which uses adaptive regularization to tune the regularization magnitude, to select appropriate pruning ratios for different weight matrices. We also propose a two-step pipeline that combines block-wise pruning with head/row pruning to achieve high structured pruning ratios with minimal accuracy degradation. Our empirical results show that for , with 20% of remaining weights, can achieve an accuracy comparable to the full model on QQP/MNLI/, with up to 3.69x speedup. Our framework has been open sourced <cit.>.
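
The three pruning levels named in the abstract can be viewed as one Top-k selection applied to blocks of different shapes: row pruning keeps whole rows (1 x full width), head pruning keeps groups of `head_dim` rows of a projection matrix, and block-wise pruning keeps small tiles. The sketch below illustrates only that idea and is not the MLPruning implementation. It assumes L1 magnitude as the block importance score (the paper learns scores instead) and a fixed keep ratio (the paper uses a learnable Top-k threshold with adaptive regularization). The function `block_topk_mask` and the example matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def block_topk_mask(weight: Matrix, block_rows: int, block_cols: int,
                    keep_ratio: float) -> Matrix:
    """Hypothetical helper: return a 0/1 mask keeping the top-k blocks of `weight`.

    Assumption: each block is scored by the L1 norm of its entries, a simple
    stand-in for the learned importance scores described in the paper.
    """
    n_rows, n_cols = len(weight), len(weight[0])
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("matrix shape must be divisible by block shape")

    # Score every (block_rows x block_cols) tile by its L1 norm.
    scores = []
    for bi in range(n_rows // block_rows):
        for bj in range(n_cols // block_cols):
            s = sum(
                abs(weight[r][c])
                for r in range(bi * block_rows, (bi + 1) * block_rows)
                for c in range(bj * block_cols, (bj + 1) * block_cols)
            )
            scores.append((s, bi, bj))

    # Top-k selection with a fixed keep ratio (always keep at least one block).
    k = max(1, round(keep_ratio * len(scores)))
    kept = {(bi, bj) for _, bi, bj in sorted(scores, reverse=True)[:k]}

    # Expand block-level decisions back to an element-wise mask.
    return [
        [1.0 if (r // block_rows, c // block_cols) in kept else 0.0
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 4 x 4 weight matrix for illustration.
W = [
    [0.9, -0.8, 0.01, 0.02],
    [0.7, 0.6, -0.03, 0.01],
    [0.05, 0.02, 0.5, -0.4],
    [0.01, 0.03, 0.3, 0.2],
]

row_mask = block_topk_mask(W, 1, 4, 0.5)   # row pruning: keep 2 of 4 rows
head_mask = block_topk_mask(W, 2, 4, 0.5)  # "head" pruning with head_dim = 2
blk_mask = block_topk_mask(W, 2, 2, 0.5)   # block-wise pruning: keep 2 of 4 tiles

# Apply a mask by element-wise multiplication.
W_pruned = [[w * m for w, m in zip(wr, mr)] for wr, mr in zip(W, blk_mask)]
```

Here the row and head masks both keep the first two rows, while the 2 x 2 block mask keeps the top-left and bottom-right tiles. Only structured masks like these, which remove whole rows, heads, or dense tiles, translate into the real speedups the abstract contrasts with unstructured pruning.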

Code Repositories

MLPruning

MLPruning, PyTorch, NLP, BERT, Structured Pruning
