ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

10/06/2022
by Yujia Zhai, et al.

The transformer has been the cornerstone model of Natural Language Processing (NLP) over the past decade. Despite its great success in Deep Learning (DL) applications, the ever-growing parameter space of transformer models raises the demand for accelerating their performance. Moreover, NLP problems commonly involve variable-length sequences, since the number of words varies from sentence to sentence. Existing DL frameworks pad variable-length sequences to the maximum length, which leads to significant memory and computational overhead. In this paper, we present ByteTransformer, a high-performance transformer boosted for variable-length inputs. We propose a zero-padding algorithm that frees the whole transformer from redundant computation on useless padded tokens. Beyond this algorithm-level optimization, we provide architecture-aware optimizations for the transformer's functional modules, especially the performance-critical multi-head attention (MHA). Experimental results on an NVIDIA A100 GPU with variable-length sequence inputs show that our fused MHA (FMHA) outperforms the standard PyTorch MHA by 6.13X. The end-to-end performance of ByteTransformer for a standard BERT transformer model surpasses state-of-the-art transformer frameworks, including PyTorch JIT, TensorFlow XLA, Tencent TurboTransformer, and NVIDIA FasterTransformer, by 87%, 131%, 138%, and 46%, respectively.
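To make the zero-padding idea concrete, the sketch below shows in plain PyTorch how valid tokens can be packed into a contiguous buffer before the position-independent parts of a transformer layer (projections, FFN, layer norm) and scattered back afterward. This is only a minimal illustration of the concept from the abstract: the function names pack_tokens and unpack_tokens are hypothetical and are not ByteTransformer's API, and the library itself realizes this with fused CUDA kernels rather than framework-level indexing.

```python
import torch

def pack_tokens(x, mask):
    # x:    (batch, max_len, hidden) padded activations
    # mask: (batch, max_len), 1 for real tokens, 0 for padding
    # Returns the packed valid tokens (total_valid, hidden) plus the flat
    # indices needed to scatter results back into padded layout.
    batch, max_len, hidden = x.shape
    flat_mask = mask.reshape(-1).bool()
    flat_idx = torch.nonzero(flat_mask, as_tuple=False).squeeze(1)
    packed = x.reshape(-1, hidden).index_select(0, flat_idx)
    return packed, flat_idx

def unpack_tokens(packed, flat_idx, batch, max_len):
    # Scatter packed tokens back; padding positions are filled with zeros.
    hidden = packed.shape[-1]
    out = packed.new_zeros(batch * max_len, hidden)
    out.index_copy_(0, flat_idx, packed)
    return out.reshape(batch, max_len, hidden)

# Toy usage: two sequences of lengths 3 and 5, padded to max_len = 5.
x = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]])
packed, idx = pack_tokens(x, mask)           # (8, 8): only the 8 real tokens
restored = unpack_tokens(packed, idx, 2, 5)  # padded positions zeroed again
```

In this packed layout, the token-wise GEMMs and activations touch only the 8 real tokens instead of the 10 padded slots; the paper's contribution is keeping the entire model, including MHA, in this padding-free form.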


Related research

08/17/2022 · Boosting Distributed Training Performance of the Unpadded BERT Model
Pre-training models are an important tool in Natural Language Processing...

10/09/2020 · TurboTransformers: An Efficient GPU Serving System For Transformer Models
The transformer is the most critical algorithm innovation of the Nature ...

10/23/2020 · LightSeq: A High Performance Inference Library for Transformers
Transformer, BERT and their variants have achieved great success in natu...

07/05/2023 · LongNet: Scaling Transformers to 1,000,000,000 Tokens
Scaling sequence length has become a critical demand in the era of large...

04/19/2023 · Scaling Transformer to 1M tokens and beyond with RMT
This technical report presents the application of a recurrent memory to ...

08/20/2021 · Semantic Communication with Adaptive Universal Transformer
With the development of deep learning (DL), natural language processing ...

11/06/2020 · Deep Learning for Flight Demand and Delays Forecasting
The last few years have seen an increased interest in deep learning (DL)...
