Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Models

04/26/2021
by Gongzheng Li, et al.

Ultra-large-scale pre-trained models can effectively improve performance on a wide variety of tasks, but they also impose a heavy computational burden at inference time. This paper introduces a series of optimization methods for ultra-large-scale pre-trained models that combine algorithmic characteristics with GPU hardware characteristics, and on this basis proposes an inference engine, Easy and Efficient Transformer (EET), which delivers a significant performance improvement over existing solutions. First, we introduce a pre-padding decoding mechanism that improves token parallelism for generation tasks. Second, we design highly optimized kernels that remove sequence masks and make the computation of padding tokens cost-free, while also supporting long sequences and large embedding sizes. Third, we introduce a user-friendly inference system with a simple service pipeline that greatly reduces the difficulty of engineering deployment while sustaining high throughput. Compared to Faster Transformer's implementation of GPT-2 on an A100, EET achieves a state-of-the-art speedup of 1.5-15x, varying with context length. EET is available at https://github.com/NetEase-FuXi/EET.
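The abstract's pre-padding decoding idea can be illustrated with a minimal sketch: if every context in a batch is padded on the left rather than the right, all sequences end at the same position, so newly generated tokens stay aligned across the batch and the incremental decoding step needs no per-sequence padding mask. The snippet below is a conceptual illustration only, not EET's API; names such as `left_pad` and `pad_id` are assumptions for the example.

```python
# Conceptual sketch of pre-padding (left-padding) for batched incremental
# decoding. This is NOT the EET implementation; it only illustrates the idea
# described in the abstract. `pad_id` is a hypothetical padding token id.
import torch

def left_pad(batch_token_ids, pad_id):
    """Pad every sequence on the LEFT so all sequences end at the same
    position. Generated tokens are then appended at aligned positions,
    so the incremental decoding step needs no per-sequence mask."""
    max_len = max(len(seq) for seq in batch_token_ids)
    padded = torch.full((len(batch_token_ids), max_len), pad_id, dtype=torch.long)
    for i, seq in enumerate(batch_token_ids):
        padded[i, max_len - len(seq):] = torch.tensor(seq, dtype=torch.long)
    return padded

# Usage: contexts of different lengths share one right-aligned tensor.
contexts = [[5, 17, 42], [7, 8], [99]]
input_ids = left_pad(contexts, pad_id=0)
# tensor([[ 5, 17, 42],
#         [ 0,  7,  8],
#         [ 0,  0, 99]])
# Each generation step can now append one column for the whole batch and
# reuse cached keys/values without masking out right-side padding tokens.
```

With right-padding, by contrast, each sequence would finish at a different position, forcing per-sequence masks and wasted work on padding tokens during generation, which is what the paper's pre-padding mechanism and mask-free kernels avoid.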
