DeepAI AI Chat
Log In Sign Up

Perceiver IO: A General Architecture for Structured Inputs Outputs

by   Andrew Jaegle, et al.

The recently-proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of inputs, it can only produce very simple outputs such as class scores. Perceiver IO overcomes this limitation without sacrificing the original's appealing properties by learning to flexibly query the model's latent space to produce outputs of arbitrary size and semantics. Perceiver IO still decouples model depth from data size and still scales linearly with data size, but now with respect to both input and output sizes. The full Perceiver IO model achieves strong results on tasks with highly structured output spaces, such as natural language and visual understanding, StarCraft II, and multi-task and multi-modal domains. As highlights, Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark without the need for input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation.


page 2

page 9

page 21


FlowFormer: A Transformer Architecture for Optical Flow

We introduce Optical Flow TransFormer (FlowFormer), a transformer-based ...

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

We propose Unified-IO, a model that performs a large variety of AI tasks...

StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis

We propose a novel approach for multi-modal Image-to-image (I2I) transla...

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension

Transformer based Very Large Language Models (VLLMs) like BERT, XLNet an...

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection

In this paper, we propose a robust 3D detector, named Cross Modal Transf...

Multi-modal embeddings using multi-task learning for emotion recognition

General embeddings like word2vec, GloVe and ELMo have shown a lot of suc...

Gated Graph Sequence Neural Networks

Graph-structured data appears frequently in domains including chemistry,...

Code Repositories


Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch

view repo


🌞 Profile of 𝘼𝙡𝙚𝙭𝙖𝙣𝙙𝙚𝙧 𝙍𝙤𝙜𝙖𝙡𝙨𝙠𝙞𝙮

view repo