A simple, efficient and scalable contrastive masked autoencoder for learning visual representations

10/30/2022
by Shlok Mishra, et al.

We introduce CAN, a simple, efficient and scalable method for self-supervised learning of visual representations. Our framework is a minimal and conceptually clean synthesis of (C) contrastive learning, (A) masked autoencoders, and (N) the noise prediction approach used in diffusion models. The learning mechanisms are complementary to one another: contrastive learning shapes the embedding space across a batch of image samples; masked autoencoders focus on reconstruction of the low-frequency spatial correlations in a single image sample; and noise prediction encourages the reconstruction of the high-frequency components of an image. The combined approach results in a robust, scalable and simple-to-implement algorithm. The training process is symmetric, with 50% of patches in both views being masked at random, yielding a considerable efficiency improvement over prior contrastive learning methods. Extensive empirical studies demonstrate that CAN achieves strong downstream performance under both linear and finetuning evaluations on transfer learning and robustness tasks. CAN outperforms MAE and SimCLR when pre-training on ImageNet, but is especially useful for pre-training on larger uncurated datasets such as JFT-300M: for linear probe on ImageNet, CAN achieves 75.4% compared to 73.4% for MAE. Finetuning performance on ImageNet of our ViT-L model is 86.1%, compared to 85.5% for MAE. The overall FLOPs load of SimCLR is 70% higher than that of CAN for ViT-L models.
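To make the three-part objective concrete, here is a minimal numpy sketch of a CAN-style training step under the assumptions described above: each of two views has 50% of its patches masked, Gaussian noise is added to the visible patches, and the total loss sums a masked-reconstruction term (A), a noise-prediction term (N), and an InfoNCE contrastive term (C) between pooled view embeddings. The function names (`encode`, `decode`, `denoise`), shapes, and the toy identity networks are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(z1, z2, temp=0.1):
    # (C) contrastive loss: row i of z1 and z2 are views of the same image,
    # so the diagonal of the similarity matrix holds the positive pairs.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temp
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def can_objective(view1, view2, encode, decode, denoise,
                  mask_ratio=0.5, noise_std=0.05):
    """view1, view2: (B, P, D) patchified views of the same batch of images.
    encode/decode/denoise are stand-in callables for the networks."""
    total = 0.0
    embeddings = []
    for view in (view1, view2):                        # symmetric: both views
        B, P, _ = view.shape
        mask = rng.random((B, P)) < mask_ratio         # True = masked patch
        noise = rng.normal(0.0, noise_std, view.shape)
        visible = np.where(mask[..., None], 0.0, view + noise)
        z = encode(visible)
        embeddings.append(z.mean(axis=1))              # pooled embedding (B, D)
        recon = decode(z)
        total += np.mean((recon - view)[mask] ** 2)    # (A) masked reconstruction
        total += np.mean((denoise(z) - noise)[~mask] ** 2)  # (N) noise prediction
    total += info_nce(embeddings[0], embeddings[1])    # (C) contrastive term
    return total

# Toy usage with identity "networks" on random patch data.
B, P, D = 8, 16, 32
v1 = rng.normal(size=(B, P, D))
v2 = v1 + 0.01 * rng.normal(size=(B, P, D))
ident = lambda x: x
loss = can_objective(v1, v2, ident, ident, ident)
```

Because every term is a mean-squared error or a negative log-probability, the combined loss is a single non-negative scalar that can be minimized jointly, which is what makes the synthesis simple to implement.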


