A Unified View of Masked Image Modeling

10/19/2022
by   Zhiliang Peng, et al.

Masked image modeling has demonstrated great potential to alleviate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks. In this work, we propose a unified view of masked image modeling after revisiting existing methods. Under this unified view, we introduce a simple yet effective method, termed MaskDistill, which reconstructs normalized semantic features from teacher models at masked positions, conditioning on corrupted input images. Experimental results on image classification and semantic segmentation show that MaskDistill achieves performance comparable or superior to state-of-the-art methods. When using a huge vision Transformer and pretraining for 300 epochs, MaskDistill obtains 88.3% fine-tuning top-1 accuracy on ImageNet-1k (224 size) and 58.8% semantic segmentation mIoU on ADE20K (512 size). The code and pretrained models will be available at https://aka.ms/unimim.
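The core training objective described above can be sketched in a few lines: normalize the teacher's per-patch features, then penalize the student's predictions only at the masked patch positions. This is a minimal NumPy illustration, not the authors' implementation; the function names, the smooth-L1 choice, and the per-token layer normalization are assumptions for the sketch.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each patch's feature vector to zero mean, unit variance
    # (the "normalized semantic features" used as regression targets).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mask_distill_loss(student_feats, teacher_feats, mask):
    """Distillation loss on normalized teacher features at masked positions.

    student_feats, teacher_feats: (num_patches, dim) arrays; the student
    sees the corrupted image, the teacher sees the full image.
    mask: (num_patches,) boolean array, True where the patch was masked.
    The smooth-L1 form here is an illustrative choice.
    """
    target = layer_norm(teacher_feats)
    diff = student_feats[mask] - target[mask]  # loss only where masked
    abs_diff = np.abs(diff)
    # Smooth-L1 (Huber) with beta = 1.0
    per_elem = np.where(abs_diff < 1.0, 0.5 * diff ** 2, abs_diff - 0.5)
    return per_elem.mean()
```

In a full pretraining loop, `student_feats` would come from a vision Transformer fed the masked image, and `teacher_feats` from a frozen teacher model fed the original image; only the masked positions contribute gradient.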


