On Data Scaling in Masked Image Modeling

06/09/2022
by Zhenda Xie, et al.

An important goal of self-supervised learning is to enable model pre-training to benefit from almost unlimited data. However, one recently popular method, masked image modeling (MIM), is suspected of being unable to benefit from larger data. In this work, we break this misconception through extensive experiments, with data scales ranging from 10% of ImageNet-1K to full ImageNet-22K, model sizes from 49 million to 1 billion parameters, and training lengths from 125K to 500K iterations. Our study reveals that: (i) masked image modeling also demands larger data: we observe that very large models overfit on relatively small data; (ii) training length matters: large models trained with masked image modeling can benefit from more data given longer training; (iii) the validation loss in pre-training is a good indicator of how well the model will perform after fine-tuning on multiple tasks. This observation allows us to evaluate pre-trained models in advance, without costly trial-and-error assessments on downstream tasks. We hope our findings will advance the understanding of masked image modeling in terms of its scaling ability.
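For readers unfamiliar with the objective being scaled here, the sketch below illustrates the core of masked image modeling: random patches of an input image are hidden, and the model is trained to reconstruct the hidden content. This is a minimal, self-contained sketch assuming a SimMIM-style setup (pixel regression with an L1 loss on masked patches); the linear "encoder" is a stand-in for the Swin/ViT backbones used in the paper, and all names and sizes here are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

# Minimal sketch of a masked image modeling (MIM) objective.
# Assumption (not from the paper's code): masked patches are replaced
# by a learnable mask token, and the model regresses raw pixel values
# with an L1 loss computed on the masked patches only.

class ToyMIM(nn.Module):
    def __init__(self, patch_dim=768, embed_dim=128):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)    # stand-in for a ViT/Swin encoder
        self.mask_token = nn.Parameter(torch.zeros(embed_dim))
        self.decoder = nn.Linear(embed_dim, patch_dim)  # lightweight prediction head

    def forward(self, patches, mask):
        # patches: (B, N, patch_dim) flattened image patches
        # mask:    (B, N) boolean, True where a patch is masked out
        x = self.embed(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        pred = self.decoder(x)
        # L1 reconstruction loss, averaged over masked patches only
        return (pred - patches).abs()[mask].mean()

# Usage: mask roughly 60% of the patches of each image.
B, N, D = 8, 196, 768                 # batch, patches per image, pixels per patch
patches = torch.randn(B, N, D)        # stand-in for real image patches
mask = torch.rand(B, N) < 0.6
loss = ToyMIM()(patches, mask)
loss.backward()
```

With a real backbone, scaling the data (from 10% of ImageNet-1K up to ImageNet-22K) and the training length amounts to changing what this loop is fed and for how many iterations; finding (iii) above says that the validation value of this same reconstruction loss tracks downstream fine-tuning quality.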

Related research

05/24/2023
Delving Deeper into Data Scaling in Masked Image Modeling
Understanding whether self-supervised learning methods can scale with un...

03/09/2022
Inadequately Pre-trained Models are Better Feature Extractors
Pre-training has been a popular learning paradigm in deep learning era, ...

12/09/2022
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
Computational pathology can lead to saving human lives, but models are a...

07/22/2019
Realistic Channel Models Pre-training
In this paper, we propose a neural-network-based realistic channel model...

03/28/2020
Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning
Pretrained models from self-supervision are prevalently used in fine-tun...

05/20/2023
What Makes for Good Visual Tokenizers for Large Language Models?
We empirically investigate proper pre-training methods to build good vis...

05/26/2022
Revealing the Dark Secrets of Masked Image Modeling
Masked image modeling (MIM) as pre-training is shown to be effective for...
