Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods

09/30/2022
by Skanda Koppula et al.

Self-supervised methods have achieved remarkable success in transfer learning, often matching or exceeding the accuracy of supervised pre-training. Most prior work has done so by increasing pre-training computation through complex data augmentation, multiple views, or lengthy training schedules. In this work, we investigate a related but orthogonal question: given a fixed FLOP budget, what are the best datasets, models, and (self-)supervised training methods for obtaining high accuracy on representative visual tasks? Given the availability of large datasets, this setting is often more relevant for academic and industry labs alike. We examine five large-scale datasets (JFT-300M, ALIGN, ImageNet-1K, ImageNet-21K, and COCO) and six pre-training methods (CLIP, DINO, SimCLR, BYOL, Masked Autoencoding, and supervised). In a like-for-like fashion, we characterize their FLOP and CO₂ footprints relative to their accuracy when transferred to a canonical image segmentation task. Our analysis reveals strong disparities in the computational efficiency of pre-training methods and their dependence on dataset quality. In particular, our results call into question the commonly held assumption that self-supervised methods inherently scale to large, uncurated data. We therefore advocate for (1) paying closer attention to dataset curation and (2) reporting accuracies in the context of total computational cost.
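The central comparison is accuracy per unit of pre-training compute. The following is a minimal, hypothetical sketch of what like-for-like FLOP and CO₂ accounting under a fixed budget can look like; it is not the paper's accounting code, and the model cost, per-method view counts, energy, and carbon constants are illustrative assumptions, not reported numbers.

# Hypothetical sketch of fixed-budget FLOP and CO2 accounting (not from the paper).
# All constants below are assumptions for illustration.

FORWARD_GFLOPS = 17.6      # rough forward-pass cost of a ViT-B/16 at 224x224 (assumed)
JOULES_PER_GFLOP = 0.02    # assumed energy cost of compute, hardware-dependent
KG_CO2_PER_KWH = 0.4       # assumed grid carbon intensity

# Assumed number of augmented views encoded per image: contrastive/distillation
# methods process two or more views; masked autoencoding and supervised roughly one.
VIEWS_PER_IMAGE = {
    "supervised": 1,
    "mae": 1,
    "simclr": 2,
    "byol": 2,
    "dino": 2,   # ignoring DINO's extra local crops for simplicity
    "clip": 2,   # image and text towers, loosely counted as two views
}

def flops_per_epoch(method, dataset_size):
    """Approximate training FLOPs for one pass over the dataset.
    Uses the common rule of thumb that forward + backward costs ~3x a forward pass."""
    return 3 * FORWARD_GFLOPS * 1e9 * VIEWS_PER_IMAGE[method] * dataset_size

def epochs_within_budget(method, dataset_size, flop_budget):
    """How many epochs a method affords under a fixed FLOP budget."""
    return flop_budget / flops_per_epoch(method, dataset_size)

def co2_kg(total_flops):
    """Very rough CO2 estimate: FLOPs -> joules -> kWh -> kg CO2."""
    kwh = (total_flops / 1e9) * JOULES_PER_GFLOP / 3.6e6
    return kwh * KG_CO2_PER_KWH

if __name__ == "__main__":
    budget = 1e19                  # fixed FLOP budget shared by all methods
    imagenet1k = 1_281_167         # ImageNet-1k training-set size
    print(f"CO2 at this budget: ~{co2_kg(budget):.0f} kg")
    for method in VIEWS_PER_IMAGE:
        print(f"{method:10s} ~{epochs_within_budget(method, imagenet1k, budget):6.1f} epochs")

Under a shared budget, methods that encode multiple views per image afford proportionally fewer epochs, which is why accuracy should be reported alongside total compute rather than per epoch.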


Related research

12/20/2021: Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Pre-training models on large scale datasets, like ImageNet, is a standar...

04/14/2022: DeiT III: Revenge of the ViT
A Vision Transformer (ViT) is a simple neural architecture amenable to s...

11/20/2020: Efficient Conditional Pre-training for Transfer Learning
Almost all the state-of-the-art neural networks for computer vision task...

06/11/2020: Rethinking Pre-training and Self-training
Pre-training is a dominant paradigm in computer vision. For example, sup...

06/11/2023: Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
Self-supervised learning (SSL) has led to great strides in speech proces...

12/25/2020: Self-supervised Pre-training with Hard Examples Improves Visual Representations
Self-supervised pre-training (SSP) employs random image transformations ...

04/14/2022: SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study
Self-supervised pre-training methods have brought remarkable breakthroug...
