Exploring Long-Sequence Masked Autoencoders

10/13/2022
by Ronghang Hu, et al.

Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains. In contrast to discrete tokens in natural languages, the input for image MAE is continuous and subject to additional specifications. We systematically study each input specification during the pre-training stage, and find that sequence length is a key axis for further scaling MAE. Our study leads to a long-sequence version of MAE with minimal changes to the original recipe, by simply decoupling the mask size from the patch size. For object detection and semantic segmentation, our long-sequence MAE shows consistent gains across all experimental setups without extra computation cost during transfer. While long-sequence pre-training proves most beneficial for detection and segmentation, we also achieve strong results on ImageNet-1K classification by keeping a standard image size and only increasing the sequence length. We hope our findings can provide new insights and avenues for scaling in computer vision.
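The key recipe change, decoupling the mask size from the patch size, can be illustrated with a short sketch: masking decisions are sampled on a coarse grid of mask units and then broadcast to a finer patch grid, so the token sequence grows (smaller patches) while the masked regions stay large. The function name, default values, and PyTorch code below are illustrative assumptions, not the authors' released implementation.

```python
import torch

def generate_decoupled_mask(image_size=224, patch_size=8, mask_unit=32, mask_ratio=0.75):
    """Sample a random mask on a coarse grid of mask units, then broadcast each
    unit's decision to the finer patch grid it covers (1 = masked, 0 = visible).
    Illustrative sketch only; names and defaults are assumptions."""
    assert image_size % mask_unit == 0 and mask_unit % patch_size == 0
    units_per_side = image_size // mask_unit      # e.g. 7 mask units per side
    patches_per_unit = mask_unit // patch_size    # e.g. each unit spans 4x4 patches

    num_units = units_per_side ** 2
    num_visible = int(num_units * (1 - mask_ratio))

    # Randomly choose which mask units stay visible.
    noise = torch.rand(num_units)
    ids_sorted = torch.argsort(noise)
    unit_mask = torch.ones(num_units)
    unit_mask[ids_sorted[:num_visible]] = 0
    unit_mask = unit_mask.reshape(units_per_side, units_per_side)

    # Broadcast each unit's decision to every patch inside it.
    patch_mask = unit_mask.repeat_interleave(patches_per_unit, dim=0)
    patch_mask = patch_mask.repeat_interleave(patches_per_unit, dim=1)
    return patch_mask  # shape: (image_size // patch_size, image_size // patch_size)

# Example: 224x224 image, 8x8 patches (28x28 = 784 tokens), 32x32 mask units.
mask = generate_decoupled_mask()
print(mask.shape)  # torch.Size([28, 28])
```

With this sketch, shrinking the patch size lengthens the token sequence seen by the encoder, while the 32x32 mask units keep the masked regions as coarse as in the standard MAE recipe.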


