Visual Representation Learning from Unlabeled Video using Contrastive Masked Autoencoders

03/21/2023
by   Jefferson Hernandez, et al.

Masked Autoencoders (MAEs) learn self-supervised representations by randomly masking input image patches and training the model to reconstruct them under a reconstruction loss. Alternatively, contrastive self-supervised learning methods encourage two versions of the same input to have similar representations, while pushing apart the representations of different inputs. We propose ViC-MAE, a general method that combines MAE and contrastive learning by pooling the local feature representations learned under the MAE reconstruction objective and leveraging this global representation under a contrastive objective across video frames. We show that visual representations learned under ViC-MAE generalize well to both video classification and image classification tasks. Using a ViT-B/16 backbone network pre-trained on the Moments in Time (MiT) dataset, we obtain state-of-the-art transfer learning from video to images on ImageNet-1k, improving by 1.58 [...] Moreover, our method maintains a competitive transfer-learning performance of 81.50 [...] In addition, we show that despite its simplicity, ViC-MAE yields improved results compared to combining MAE pre-training with previously proposed contrastive objectives such as VICReg and SimSiam.
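
To make the combination described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes mean pooling for the "pooling" step, an InfoNCE-style loss as the contrastive objective, and a simple weighted sum of the two losses; the abstract specifies none of these choices. All function names and the `weight` and `temperature` parameters are hypothetical, and plain Python lists stand in for tensors.

```python
import math

def mean_pool(patch_features):
    """Average per-patch feature vectors into one global vector (assumed pooling)."""
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(p[d] for p in patch_features) / n for d in range(dim)]

def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumption): anchors[i] should match positives[i];
    every other entry of `positives` acts as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)

def masked_mse(pred_patches, target_patches, mask):
    """Mean squared error over masked patches only (truthy mask entry = masked).
    Assumes at least one patch is masked."""
    err, count = 0.0, 0
    for p, t, m in zip(pred_patches, target_patches, mask):
        if m:
            err += sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
            count += 1
    return err / count

def combined_loss(recon_loss, feats_frame_a, feats_frame_b,
                  weight=1.0, temperature=0.1):
    """Hypothetical combination: reconstruction loss plus a weighted contrastive
    loss between pooled features of two frames taken from each video."""
    z_a = [mean_pool(f) for f in feats_frame_a]
    z_b = [mean_pool(f) for f in feats_frame_b]
    return recon_loss + weight * info_nce(z_a, z_b, temperature)
```

In this sketch, `feats_frame_a[i]` and `feats_frame_b[i]` hold the patch features of two frames from the same video `i`, so frames from the same video form the positive pair and frames from other videos in the batch serve as negatives. The reconstruction term would come from `masked_mse` applied to the decoder output for the masked patches.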


