Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders

10/05/2022
by   Youngwan Lee, et al.

Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers. A representative MIM model, the masked auto-encoder (MAE), randomly masks a subset of image patches and reconstructs the masked patches given the unmasked patches. Concurrently, many recent works in self-supervised learning use the student/teacher paradigm, which provides the student with an additional target based on the output of a teacher composed of an exponential moving average (EMA) of previous students. Although common, relatively little is known about the dynamics of the interaction between the student and the teacher. Through an analysis of a simple linear model, we find that the teacher conditionally removes previous gradient directions based on feature similarities, effectively acting as a conditional momentum regularizer. Building on this analysis, we present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE), which adds an EMA teacher to MAE. We find that RC-MAE converges faster and requires less memory than state-of-the-art self-distillation methods during pre-training, which may help make the prohibitively expensive self-supervised learning of Vision Transformer models more practical. Additionally, we show that RC-MAE is more robust and performs better than MAE on downstream tasks such as ImageNet-1K classification, object detection, and instance segmentation.
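The two ingredients the abstract describes, an EMA teacher whose weights are a moving average of previous students, and a reconstruction-consistency target added on top of MAE's masked-patch loss, can be sketched in a few lines. This is a minimal illustration with hypothetical names (`ema_update`, `rc_mae_loss`, the weighting `lam`); the paper's actual loss weighting, momentum schedule, and normalization may differ.

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.996):
    """EMA teacher: each teacher weight is an exponential moving
    average of the corresponding student weight over training."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

def rc_mae_loss(student_recon, teacher_recon, target, mask, lam=1.0):
    """RC-MAE objective (sketch): the usual MAE reconstruction loss on
    masked patches, plus a consistency term pulling the student's
    reconstruction toward the EMA teacher's reconstruction of the
    same masked patches."""
    recon = np.mean(((student_recon - target) ** 2)[mask])
    consistency = np.mean(((student_recon - teacher_recon) ** 2)[mask])
    return recon + lam * consistency
```

With `momentum=0.996` the teacher changes slowly, so its reconstruction target is a smoothed version of past students, which is what the linear-model analysis interprets as a conditional momentum regularizer.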

Related research

10/28/2020 · CompRess: Self-Supervised Learning by Compressing Representations
Self-supervised learning aims to learn good representations with unlabel...

09/19/2022 · Attentive Symmetric Autoencoder for Brain MRI Segmentation
Self-supervised learning methods based on image patch reconstruction hav...

03/21/2023 · Self-supervised learning of a tailored Convolutional Auto Encoder for histopathological prostate grading
According to GLOBOCAN 2020, prostate cancer is the second most common ca...

05/28/2023 · LowDINO: A Low Parameter Self Supervised Learning Model
This research aims to explore the possibility of designing a neural netw...

10/26/2020 · Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals
We study how to learn a policy with compositional generalizability. We p...

11/22/2022 · YZR-net: Self-supervised Hidden Representations Invariant to Transformations for Profanity Detection
On current e-learning platforms, live classes are an important tool that...

04/26/2022 · ATST: Audio Representation Learning with Teacher-Student Transformer
Self-supervised learning (SSL) learns knowledge from a large amount of u...
