Perceiver IO: A General Architecture for Structured Inputs Outputs

07/30/2021
by   Andrew Jaegle, et al.
6

The recently-proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of inputs, it can only produce very simple outputs such as class scores. Perceiver IO overcomes this limitation without sacrificing the original's appealing properties by learning to flexibly query the model's latent space to produce outputs of arbitrary size and semantics. Perceiver IO still decouples model depth from data size and still scales linearly with data size, but now with respect to both input and output sizes. The full Perceiver IO model achieves strong results on tasks with highly structured output spaces, such as natural language and visual understanding, StarCraft II, and multi-task and multi-modal domains. As highlights, Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark without the need for input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation.

READ FULL TEXT

page 2

page 9

page 21

research
03/30/2022

FlowFormer: A Transformer Architecture for Optical Flow

We introduce Optical Flow TransFormer (FlowFormer), a transformer-based ...
research
06/17/2022

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

We propose Unified-IO, a model that performs a large variety of AI tasks...
research
04/14/2021

StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis

We propose a novel approach for multi-modal Image-to-image (I2I) transla...
research
12/13/2019

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension

Transformer based Very Large Language Models (VLLMs) like BERT, XLNet an...
research
06/12/2020

A Generative Model for Joint Natural Language Understanding and Generation

Natural language understanding (NLU) and natural language generation (NL...
research
05/09/2012

Structured Input-Output Lasso, with Application to eQTL Mapping, and a Thresholding Algorithm for Fast Estimation

We consider the problem of learning a high-dimensional multi-task regres...
research
11/17/2015

Gated Graph Sequence Neural Networks

Graph-structured data appears frequently in domains including chemistry,...

Please sign up or login with your details

Forgot password? Click here to reset