The effectiveness of MAE pre-pretraining for billion-scale pretraining

03/23/2023
by Mannat Singh, et al.

This paper revisits the standard pretrain-then-finetune paradigm used in computer vision for visual recognition tasks. Typically, state-of-the-art foundation models are pretrained using large-scale (weakly) supervised datasets with billions of images. We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model. While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well. Thus, our MAE-based pre-pretraining scales with both model and data size, making it applicable for training foundation models. Pre-pretraining consistently improves both the model convergence and the downstream transfer performance across a range of model scales (millions to billions of parameters) and dataset sizes (millions to billions of images). We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification and zero-shot recognition. Our largest model achieves new state-of-the-art results on iNaturalist-18 (91.3%) and Food-101 (96.0%). Our study reveals that model initialization plays a significant role, even for web-scale pretraining with billions of images.
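The pipeline the abstract describes can be illustrated with a small sketch: a self-supervised MAE stage reconstructs masked patches on unlabeled images, and the resulting encoder weights then initialize the subsequent (weakly) supervised pretraining stage. The sketch below is a toy PyTorch version under stated assumptions; `TinyMAE`, the layer sizes, and the 75% mask ratio are illustrative defaults, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder: patchify, randomly mask, encode only the
    visible patches, decode all positions, reconstruct masked pixels."""
    def __init__(self, img_size=32, patch=4, dim=64, mask_ratio=0.75):
        super().__init__()
        self.patch = patch
        self.num_patches = (img_size // patch) ** 2
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(3 * patch * patch, dim)        # patch -> token
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=1)
        self.head = nn.Linear(dim, 3 * patch * patch)         # token -> pixels

    def patchify(self, x):
        B, C, H, W = x.shape
        p = self.patch
        x = x.unfold(2, p, p).unfold(3, p, p)                 # B,C,H/p,W/p,p,p
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)                         # B,N,3*p*p
        B, N, _ = patches.shape
        num_keep = int(N * (1 - self.mask_ratio))
        ids = torch.rand(B, N, device=imgs.device).argsort(dim=1)
        keep, masked = ids[:, :num_keep], ids[:, num_keep:]

        tokens = self.embed(patches) + self.pos
        visible = torch.gather(
            tokens, 1, keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        latent = self.encoder(visible)                        # visible patches only

        # Scatter encoded tokens back; masked slots get the mask token.
        full = self.mask_token.expand(B, N, -1).clone()
        full.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, full.size(-1)), latent)
        pred = self.head(self.decoder(full + self.pos))

        # Reconstruction loss only on the masked patches, as in MAE.
        target = torch.gather(
            patches, 1, masked.unsqueeze(-1).expand(-1, -1, patches.size(-1)))
        recon = torch.gather(
            pred, 1, masked.unsqueeze(-1).expand(-1, -1, pred.size(-1)))
        return nn.functional.mse_loss(recon, target)

# Stage 1: MAE pre-pretraining on (unlabeled) images.
mae = TinyMAE()
opt = torch.optim.AdamW(mae.parameters(), lr=1e-4)
opt.zero_grad()
loss = mae(torch.randn(8, 3, 32, 32))   # stand-in batch of images
loss.backward()
opt.step()

# Stage 2: keep only the encoder weights as the initialization for the
# subsequent (weakly) supervised pretraining / finetuning model.
encoder_init = {k: v for k, v in mae.state_dict().items()
                if k.startswith(('embed', 'pos', 'encoder'))}
```

The two choices the sketch mirrors from MAE are that the encoder processes only the visible patches and the loss is computed only on the masked ones; the decoder exists solely for pre-pretraining and is discarded, with the encoder weights carried forward to initialize the next stage.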

