FastSeq: Make Sequence Generation Faster

06/08/2021
by Yu Yan, et al.

Transformer-based models have made tremendous impacts in natural language generation. However, the inference speed is a bottleneck, due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop the FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate a 4-9x inference speed gain. Additionally, FastSeq is easy to use, requiring only a one-line code change. The source code is available at https://github.com/microsoft/fastseq.
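To make the n-gram detection idea concrete, here is a minimal sketch of repeated n-gram blocking in plain Python. It is a generic hash-based approach, not FastSeq's actual implementation: the helper name `banned_next_tokens` is our own, and the map from (n-1)-token prefixes to their followers is rebuilt from scratch here for clarity, whereas an optimized decoder would maintain it incrementally across decoding steps.

```python
def banned_next_tokens(sequence, n):
    """Return the set of tokens that would complete an already-seen n-gram.

    Builds a map from each (n-1)-token prefix in `sequence` to the tokens
    that followed it, then looks up the current trailing prefix once,
    instead of rescanning the sequence for every candidate token.
    """
    if len(sequence) < n:
        return set()
    followers = {}
    for i in range(len(sequence) - n + 1):
        prefix = tuple(sequence[i:i + n - 1])
        followers.setdefault(prefix, set()).add(sequence[i + n - 1])
    # The last n-1 generated tokens form the prefix of the next n-gram.
    current_prefix = tuple(sequence[len(sequence) - n + 1:])
    return followers.get(current_prefix, set())

tokens = [5, 7, 9, 5, 7]
print(banned_next_tokens(tokens, 3))  # {9}: emitting 9 would repeat (5, 7, 9)
```

At each decoding step, the banned set is subtracted from the candidate vocabulary (e.g., by setting those logits to negative infinity) before sampling or beam search continues.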


04/27/2020

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

Large-scale pre-trained language models such as BERT have brought signif...
07/31/2020

Language Modelling for Source Code with Transformer-XL

It has been found that software, like natural language texts, exhibits "...
10/23/2020

LightSeq: A High Performance Inference Library for Transformers

Transformer, BERT and their variants have achieved great success in natu...
04/26/2021

Visformer: The Vision-friendly Transformer

The past year has witnessed the rapid development of applying the Transf...
04/26/2021

Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model

The ultra-large-scale pre-training model can effectively improve the eff...
05/11/2021

EL-Attention: Memory Efficient Lossless Attention for Generation

Transformer model with multi-head attention requires caching intermediat...
09/15/2020

A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation

This work studies the widely adopted ancestral sampling algorithms for a...

Code Repositories

fastseq

An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/pdf/2106.04718.pdf
