General-purpose, long-context autoregressive modeling with Perceiver AR

02/15/2022
by Curtis Hawthorne, et al.

Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 x 64 ImageNet images and PG-19 books.
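
The causally masked cross-attention described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes that each latent query is derived from one of the final input positions and may attend only to inputs at or before its own position; the abstract does not spell out this alignment. The function names `causal_cross_attention_mask` and `masked_cross_attention` are hypothetical, and the learned projections, multiple heads, and latent self-attention stack are omitted.

```python
import math


def causal_cross_attention_mask(seq_len, num_latents):
    """Return mask[i][j] == True when latent i may attend to input j.

    Assumption for this sketch: latent i is aligned with input position
    (seq_len - num_latents + i), i.e. the latents cover the final positions.
    """
    if not 0 < num_latents <= seq_len:
        raise ValueError("num_latents must be in the range 1..seq_len")
    offset = seq_len - num_latents
    return [[j <= offset + i for j in range(seq_len)]
            for i in range(num_latents)]


def masked_cross_attention(queries, keys, values, mask):
    """Single-head scaled dot-product attention with a boolean mask.

    queries: num_latents vectors; keys and values: seq_len vectors.
    Masked-out positions receive a score of -inf, hence zero weight.
    """
    dim = len(queries[0])
    outputs = []
    for query, mask_row in zip(queries, mask):
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if allowed else -math.inf
            for key, allowed in zip(keys, mask_row)
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return outputs


# Toy example: 6 inputs compressed into 2 latents.
inputs = [[float(j), 1.0] for j in range(6)]
mask = causal_cross_attention_mask(seq_len=6, num_latents=2)
latents = masked_cross_attention(inputs[-2:], inputs, inputs, mask)
```

In this sketch the cross-attention step costs on the order of `seq_len * num_latents` score computations instead of `seq_len ** 2`, which is the reason a small latent array makes very long contexts affordable.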


