Unlimiformer: Long-Range Transformers with Unlimited Length Input

05/02/2023
by Amanda Bertsch, et al.

Transformer-based models typically have a predefined bound on their input length, because of their need to potentially attend to every token in the input. In this work, we propose Unlimiformer: a general approach that can wrap any existing pretrained encoder-decoder transformer, and offload the attention computation across all layers to a single k-nearest-neighbor index; this index can be kept on either GPU or CPU memory and queried in sub-linear time. This way, we can index extremely long input sequences, while every attention head in every decoder layer retrieves its top-k keys, instead of attending to every key. We demonstrate Unlimiformer's efficacy on several long-document and multi-document summarization benchmarks, showing that it can summarize even 350k-token-long inputs from the BookSum dataset, without any input truncation at test time. Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code. We make our code and models publicly available at https://github.com/abertsch72/unlimiformer.
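
The abstract describes the core mechanism: encoder hidden states for the full input are stored in a k-nearest-neighbor index, and at decoding time each attention head retrieves only its top-k keys instead of attending to every input token. The sketch below illustrates this retrieval-based cross-attention pattern for a single head and a single decoding step. It is a minimal illustration under stated assumptions, not the authors' implementation; the function names, tensor shapes, and the use of FAISS inner-product search are assumptions made for this example.

    # Minimal sketch of kNN-based cross-attention (illustrative, not the authors' code).
    # Assumption: encoder keys/values for one head are precomputed for the whole input.
    import numpy as np
    import torch
    import torch.nn.functional as F
    import faiss  # assumes faiss-cpu or faiss-gpu is installed

    def build_knn_index(encoder_keys: torch.Tensor) -> faiss.IndexFlatIP:
        """encoder_keys: (num_input_tokens, head_dim) keys for one attention head."""
        index = faiss.IndexFlatIP(encoder_keys.shape[-1])  # inner-product search
        index.add(encoder_keys.detach().cpu().numpy().astype("float32"))
        return index

    def knn_cross_attention(query: torch.Tensor,
                            index: faiss.IndexFlatIP,
                            encoder_keys: torch.Tensor,
                            encoder_values: torch.Tensor,
                            top_k: int = 1024) -> torch.Tensor:
        """query: (head_dim,) decoder query for one head at one decoding step."""
        q = query.detach().cpu().numpy().astype("float32")[None, :]
        _, ids = index.search(q, top_k)              # retrieve top-k nearest keys
        ids = torch.from_numpy(ids[0]).long()
        k = encoder_keys[ids]                        # (top_k, head_dim)
        v = encoder_values[ids]                      # (top_k, head_dim)
        scores = (k @ query) / (query.shape[-1] ** 0.5)
        attn = F.softmax(scores, dim=-1)
        return attn @ v                              # attention over retrieved keys only

Because each query touches only top_k of the indexed keys, the per-step attention cost is independent of the input length, which is what allows indexing inputs far longer than the model's original context window.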

Related research

08/08/2022 · Investigating Efficiently Extending Transformers for Long Input Summarization
While large pretrained Transformer models have proven highly capable at ...

03/17/2023 · CoLT5: Faster Long-Range Transformers with Conditional Computation
Many natural language processing tasks benefit from long inputs, but pro...

11/25/2021 · New Approaches to Long Document Summarization: Fourier Transform Based Attention in a Transformer Model
In this work, we extensively redesign the newly introduced method of tok...

05/24/2023 · Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
The transformer model is known to be computationally demanding, and proh...

02/07/2023 · Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis
Recent advances in the area of long document matching have primarily foc...

05/04/2023 · On the Expressivity Role of LayerNorm in Transformers' Attention
Layer Normalization (LayerNorm) is an inherent component in all Transfor...

06/05/2023 · DecompX: Explaining Transformers Decisions by Propagating Token Decomposition
An emerging solution for explaining Transformer-based models is to use v...
