LongT5: Efficient Text-To-Text Transformer for Long Sequences

by Mandy Guo, et al.

Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrate attention ideas from long-input transformers (ETC) and adopt pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
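To make the local/global idea concrete, here is a minimal sketch of attention in which each token attends to its nearby neighbors plus one summary vector per fixed-size block, with the summaries computed from the input itself rather than supplied as side-inputs. This is an illustration under stated assumptions, not the paper's implementation: the names `block_summaries`, `local_global_attention`, `radius`, and `block_size` are hypothetical, the block mean is an assumed aggregation, and learned query/key/value projections, multiple heads, and position biases are all omitted.

```python
import math


def block_summaries(x, block_size):
    """Average the token vectors in each fixed-size block (assumed aggregation)."""
    summaries = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        dim = len(block[0])
        summaries.append(
            [sum(tok[d] for tok in block) / len(block) for d in range(dim)]
        )
    return summaries


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_global_attention(x, radius, block_size):
    """Each token attends to neighbors within `radius` plus all block summaries.

    x is a list of token vectors (lists of floats). Queries, keys, and values
    are the raw vectors here; a real model would apply learned projections.
    """
    global_tokens = block_summaries(x, block_size)
    dim = len(x[0])
    scale = math.sqrt(dim)
    outputs = []
    for i, query in enumerate(x):
        lo = max(0, i - radius)
        hi = min(len(x), i + radius + 1)
        # Local window first, then the summaries derived from the input itself.
        keys = x[lo:hi] + global_tokens
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
        )
    return outputs
```

In this sketch each token scores at most 2 * radius + 1 local keys plus ceil(n / block_size) summary keys, rather than all n tokens as in full self-attention.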



ETC: Encoding Long and Structured Data in Transformers

Transformer-based models have pushed the state of the art in many natura...

An Attention Mechanism for Answer Selection Using a Combined Global and Local View

We propose a new attention mechanism for neural based question answering...

Skim-Attention: Learning to Focus via Document Layout

Transformer-based pre-training techniques of text and layout have proven...

Reformer: The Efficient Transformer

Large Transformer models routinely achieve state-of-the-art results on a...

ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer

The analysis of long sequence data remains challenging in many real-worl...

mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences

We present our work on developing a multilingual, efficient text-to-text...

LittleBird: Efficient Faster Longer Transformer for Question Answering

BERT has shown a lot of success in a wide variety of NLP tasks. But it ha...
