To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

04/05/2023

by Daniel Campos, et al.

Sequence-to-sequence language models can produce abstractive summaries that are coherent, relevant, and concise, but model sizes can make deployment in latency-sensitive or web-scale settings difficult. This paper studies the relationship between model size, structured pruning, inference efficiency, and summarization accuracy on widely used summarization datasets. We show that model accuracy is tied to encoder size, while inference efficiency is connected to the decoder. Asymmetric pruning can yield a nearly 3x improvement in inference latency at a cost of about 1 point of ROUGE-2. Moreover, we find that both the average degradation and the role of asymmetry are consistent across model sizes and variations in datasets.
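The intuition behind the asymmetry can be sketched with a back-of-the-envelope cost model (not the paper's implementation): the encoder runs once over the source, while the decoder runs once per generated token, so each pruned decoder layer saves work at every generation step. The layer counts and output length below are illustrative assumptions.

```python
# Toy cost model: count layer applications as a proxy for latency.
# The encoder is applied once; the decoder is applied at every
# autoregressive generation step, so decoder depth dominates.

def inference_cost(enc_layers: int, dec_layers: int, out_len: int) -> int:
    """Layer applications needed to generate out_len tokens."""
    encoder_cost = enc_layers            # one pass over the source
    decoder_cost = dec_layers * out_len  # one pass per generated token
    return encoder_cost + decoder_cost

# Symmetric 12/12 model vs. an asymmetric 12/4 model (hypothetical sizes),
# generating a 100-token summary.
full = inference_cost(12, 12, out_len=100)   # 1212 layer applications
asym = inference_cost(12, 4, out_len=100)    # 412 layer applications
print(f"speedup: {full / asym:.2f}x")        # roughly 3x
```

Under this crude model, cutting the decoder from 12 to 4 layers while leaving the encoder untouched gives close to a 3x latency win, consistent with the paper's headline result; pruning the encoder instead would barely move the total.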

