Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling

12/31/2022
by   Xin Ma, et al.

Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet has been criticized for learning inefficiency. We believe insufficient utilization of training signals is largely responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch under a disjoint regulation, raising the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooted in orthogonal perspectives on improving training efficiency, DM and JD cooperatively accelerate training convergence without sacrificing model generalization. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) while reporting competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks like semantic segmentation, object detection, etc., our DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
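To make the disjoint masking idea concrete, below is a minimal sketch of how multiple masked views with a fixed masking rate could be drawn per image, assuming the disjoint regulation keeps the visible (unmasked) token sets of the sequentially sampled views non-overlapping so that more distinct tokens per image contribute to the training signal. The function name disjoint_masks and the chunked-permutation sampling are illustrative assumptions, not the authors' released implementation.

import torch

def disjoint_masks(num_tokens: int, mask_ratio: float, num_views: int) -> torch.Tensor:
    """Sample `num_views` random masks over `num_tokens` patch tokens.

    Each view keeps the same masking rate `mask_ratio`, and the visible
    token sets of different views are kept disjoint (an assumption about
    the disjoint regulation), so the union of tokens used per image grows
    with the number of views.
    """
    num_visible = int(num_tokens * (1 - mask_ratio))
    assert num_views * num_visible <= num_tokens, \
        "too many views at this masking rate to keep visible sets disjoint"

    # One random permutation of token indices; consecutive chunks of it
    # give disjoint visible sets for the sequentially sampled views.
    perm = torch.randperm(num_tokens)
    masks = []
    for v in range(num_views):
        visible = perm[v * num_visible:(v + 1) * num_visible]
        mask = torch.ones(num_tokens, dtype=torch.bool)  # True = masked token
        mask[visible] = False
        masks.append(mask)
    return torch.stack(masks)  # shape: (num_views, num_tokens)

# Example: 196 tokens (14x14 patches), 75% masking, 4 disjoint views.
masks = disjoint_masks(num_tokens=196, mask_ratio=0.75, num_views=4)
print(masks.shape, masks.float().mean().item())  # torch.Size([4, 196]) 0.75

With a 75% masking rate, up to four views can have fully disjoint visible sets, so every token of the image is visible in exactly one view while each view still hides 75% of the tokens.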
