FastSeq: Make Sequence Generation Faster

06/08/2021
by Yu Yan, et al.

Transformer-based models have made tremendous impacts in natural language generation. However, inference speed is a bottleneck due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop FastSeq, a framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate a 4-9x gain in inference speed. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at https://github.com/microsoft/fastseq.
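To illustrate the "one-line code change" claim, below is a minimal sketch following the usage pattern documented in the FastSeq repository: importing fastseq before the generation library lets it patch the decoding path, while the rest of the script stays unchanged. The model name, tokenizer calls, and generation arguments here are illustrative assumptions for a Hugging Face Transformers setup, not taken from the paper.

```python
# Minimal sketch, assuming the Transformers integration described in the
# FastSeq repository. Model name and generation arguments are illustrative.
import fastseq  # the "one-line code change": import before the generation library

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "FastSeq accelerates sequence generation for Transformer-based models."
inputs = tokenizer([article], return_tensors="pt", truncation=True)

# The generation call itself is unchanged; the optimizations (attention cache,
# repeated n-gram detection, parallel I/O) are applied under the hood.
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,
    max_length=60,
    no_repeat_ngram_size=3,  # the repeated n-gram check the paper optimizes
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```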
