Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

12/20/2021
by Alaaeldin El-Nouby, et al.

Pre-training models on large-scale datasets, like ImageNet, is standard practice in computer vision. This paradigm is especially effective for tasks with small training sets, on which high-capacity models tend to overfit. In this work, we consider a self-supervised pre-training scenario that leverages only the target task data. We consider datasets, such as Stanford Cars, Sketch, or COCO, which are order(s) of magnitude smaller than ImageNet. Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain performance competitive with ImageNet pre-training on a variety of classification datasets from different domains. On COCO, when pre-training solely on COCO images, the detection and instance segmentation performance surpasses supervised ImageNet pre-training in a comparable setting.
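The denoising-autoencoder idea the abstract refers to can be sketched compactly: mask a fraction of image patches, encode the corrupted image, and train the model to reconstruct the hidden content. Below is a minimal illustrative sketch of a generic BEiT/SimMIM-style masked-patch autoencoder in PyTorch. The class name, model sizes, masking ratio, and the raw-pixel regression loss are assumptions made here for illustration; the paper's actual variant and BEiT's visual-token objective differ in their details.

```python
# Minimal sketch of masked-image-modeling (denoising autoencoder) pre-training.
# Illustrative only: names, sizes, and the pixel-regression target are assumptions,
# not the paper's exact method.
import torch
import torch.nn as nn

class MaskedPatchAutoencoder(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=384, depth=6, heads=6):
        super().__init__()
        self.patch = patch
        self.num_patches = (img_size // patch) ** 2
        # Patchify the image into tokens with a strided convolution.
        self.to_tokens = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Regress raw pixels of each patch (an assumption; BEiT predicts visual tokens).
        self.head = nn.Linear(dim, patch * patch * 3)

    def forward(self, imgs, mask_ratio=0.5):
        tokens = self.to_tokens(imgs).flatten(2).transpose(1, 2)  # (B, N, D)
        B, N, D = tokens.shape
        # Randomly hide a fraction of patches; True means "masked".
        mask = torch.rand(B, N, device=imgs.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, D), tokens)
        pred = self.head(self.encoder(tokens + self.pos))  # (B, N, patch*patch*3)
        # Build per-patch pixel targets from the clean image.
        target = imgs.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
        target = target.permute(0, 2, 3, 1, 4, 5).reshape(B, N, -1)
        # Compute the reconstruction loss only on masked patches.
        loss = ((pred - target) ** 2).mean(-1)[mask].mean()
        return loss

model = MaskedPatchAutoencoder()
loss = model(torch.randn(2, 3, 224, 224))
loss.backward()
```

Pre-training then amounts to looping this loss over the target dataset (e.g., COCO images) with a standard optimizer; no labels are required, which is what makes the small-data, target-task-only setting studied in the paper possible.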


Related research

09/30/2022 · Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Self-supervised methods have achieved remarkable success in transfer lea...

06/11/2020 · Rethinking Pre-training and Self-training
Pre-training is a dominant paradigm in computer vision. For example, sup...

12/11/2022 · SEPT: Towards Scalable and Efficient Visual Pre-Training
Recently, the self-supervised pre-training paradigm has shown great pote...

05/28/2022 · SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
Self-supervised Masked Autoencoders (MAE) are emerging as a new pre-trai...

03/23/2022 · VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Pre-training video transformers on extra large-scale datasets is general...

02/27/2023 · Internet Explorer: Targeted Representation Learning on the Open Web
Modern vision models typically rely on fine-tuning general-purpose model...

05/19/2021 · Unsupervised Discriminative Learning of Sounds for Audio Event Classification
Recent progress in network-based audio event classification has shown th...
