Hierarchical Perceiver

02/22/2022
by Joao Carreira, et al.

General perception systems such as Perceivers can process arbitrary modalities in any combination and can handle up to a few hundred thousand inputs. They achieve this generality by using global attention operations exclusively. This, however, prevents them from scaling to the input sizes required to process raw high-resolution images or video. In this paper, we show that some degree of locality can be introduced back into these models, greatly improving their efficiency while preserving their generality. To scale them further, we introduce a self-supervised approach that enables learning dense low-dimensional positional embeddings for very large signals. We call the resulting model a Hierarchical Perceiver (HiP). HiP retains the ability to process arbitrary modalities, but now at higher resolution and without any specialized preprocessing, improving over flat Perceivers in both efficiency and accuracy on the ImageNet, AudioSet and PASCAL VOC datasets.
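To illustrate why locality helps efficiency, the following minimal sketch counts the query-key pairs scored by one cross-attention step, first with global attention and then with attention restricted to groups. It assumes that inputs and latents are split evenly into groups and that each latent attends only to inputs in its own group; this grouping scheme, the function names, and the sizes are illustrative assumptions and are not taken from the abstract.

```python
def global_attention_pairs(num_inputs: int, num_latents: int) -> int:
    """Query-key pairs scored when every latent attends to every input."""
    return num_inputs * num_latents


def grouped_attention_pairs(num_inputs: int, num_latents: int, num_groups: int) -> int:
    """Query-key pairs scored when inputs and latents are split evenly into
    groups and each latent attends only to the inputs in its own group.

    This even split is an assumption made for illustration.
    """
    if num_inputs % num_groups or num_latents % num_groups:
        raise ValueError("num_groups must divide both num_inputs and num_latents")
    pairs_per_group = (num_inputs // num_groups) * (num_latents // num_groups)
    return num_groups * pairs_per_group


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration only.
    n, m, g = 262_144, 512, 16
    print("global :", global_attention_pairs(n, m))
    print("grouped:", grouped_attention_pairs(n, m, g))
```

Under these assumptions the grouped variant scores a factor of `num_groups` fewer pairs than the global one, which is one way to read the efficiency gain the abstract attributes to locality.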


