DeepAI AI Chat
Log In Sign Up

Perceiver IO: A General Architecture for Structured Inputs Outputs

07/30/2021
by   Andrew Jaegle, et al.
6

The recently-proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of inputs, it can only produce very simple outputs such as class scores. Perceiver IO overcomes this limitation without sacrificing the original's appealing properties by learning to flexibly query the model's latent space to produce outputs of arbitrary size and semantics. Perceiver IO still decouples model depth from data size and still scales linearly with data size, but now with respect to both input and output sizes. The full Perceiver IO model achieves strong results on tasks with highly structured output spaces, such as natural language and visual understanding, StarCraft II, and multi-task and multi-modal domains. As highlights, Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark without the need for input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation.

READ FULL TEXT

page 2

page 9

page 21

03/30/2022

FlowFormer: A Transformer Architecture for Optical Flow

We introduce Optical Flow TransFormer (FlowFormer), a transformer-based ...
06/17/2022

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

We propose Unified-IO, a model that performs a large variety of AI tasks...
04/14/2021

StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis

We propose a novel approach for multi-modal Image-to-image (I2I) transla...
12/13/2019

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension

Transformer based Very Large Language Models (VLLMs) like BERT, XLNet an...
01/03/2023

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection

In this paper, we propose a robust 3D detector, named Cross Modal Transf...
09/10/2020

Multi-modal embeddings using multi-task learning for emotion recognition

General embeddings like word2vec, GloVe and ELMo have shown a lot of suc...
11/17/2015

Gated Graph Sequence Neural Networks

Graph-structured data appears frequently in domains including chemistry,...

Code Repositories

perceiver-pytorch

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch


view repo

AlexRogalskiy

🌞 Profile of 𝘼𝙡𝙚𝙭𝙖𝙣𝙙𝙚𝙧 𝙍𝙤𝙜𝙖𝙡𝙨𝙠𝙞𝙮


view repo