Good helper is around you: Attention-driven Masked Image Modeling

11/28/2022
by Zhengqi Liu, et al.

Masked image modeling (MIM) has shown great potential in self-supervised learning over the past year. Built on the vision transformer as a universal backbone, MIM learns self-supervised visual representations by masking a subset of image patches and attempting to recover the missing pixels. Most previous works mask patches at random, which underutilizes semantic information that is beneficial to visual representation learning. In addition, because the backbone is large, most previous works require long pre-training. In this paper, we propose the Attention-driven Masking and Throwing Strategy (AMT), which addresses both problems. We first leverage the self-attention mechanism to obtain semantic information about the image automatically during training, without using any supervised methods. This information guides the masking strategy to mask areas selectively, which helps representation learning. Moreover, we propose a redundant patch throwing strategy that makes learning more efficient. As a plug-and-play module for masked image modeling, AMT improves the linear probing accuracy of MAE by 2.9% to 5.9% on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-1K, and improves the fine-tuning accuracy of MAE and SimMIM. This design also achieves superior performance on downstream detection and segmentation tasks. Code is available at https://github.com/guijiejie/AMT.
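To make the idea of attention-guided masking with patch throwing concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function name `attention_guided_split`, the `mask_ratio` and `throw_ratio` defaults, and the rule for which patches are thrown are all assumptions made for illustration. The abstract states only that masking is guided by self-attention and that redundant patches are thrown.

```python
import numpy as np


def attention_guided_split(attn, mask_ratio=0.5, throw_ratio=0.25, seed=0):
    """Split patch indices into masked, thrown, and visible sets.

    attn: 1-D array of non-negative per-patch attention scores
          (hypothetical input, e.g. class-token attention over patches).
    The ratios are illustrative values, not taken from the paper.
    """
    attn = np.asarray(attn, dtype=np.float64)
    n = attn.shape[0]
    n_mask = int(n * mask_ratio)
    n_throw = int(n * throw_ratio)

    # Turn scores into sampling probabilities; the small epsilon keeps every
    # patch selectable, which sampling without replacement requires.
    probs = attn + 1e-8
    probs = probs / probs.sum()

    # Weighted draw without replacement: high-attention patches tend to
    # appear early in the resulting order.
    rng = np.random.default_rng(seed)
    order = rng.choice(n, size=n, replace=False, p=probs)

    masked = order[:n_mask]                    # to be reconstructed
    thrown = order[n_mask:n_mask + n_throw]    # dropped entirely (assumed rule)
    visible = order[n_mask + n_throw:]         # fed to the encoder
    return masked, thrown, visible


if __name__ == "__main__":
    scores = np.linspace(0.1, 1.0, 16)  # toy attention over 16 patches
    masked, thrown, visible = attention_guided_split(scores)
    print(len(masked), len(thrown), len(visible))  # 8 4 4
```

In this sketch the encoder would see only the `visible` patches and the reconstruction loss would cover only the `masked` ones. The thrown patches take part in neither, which is where the efficiency gain of a throwing step would come from.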


