On the Pros and Cons of Momentum Encoder in Self-Supervised Visual Representation Learning

08/11/2022
by Trung Pham, et al.

Exponential Moving Average (EMA, or momentum) is widely used in modern self-supervised learning (SSL) approaches, such as MoCo, to enhance performance. We demonstrate that such momentum can also be plugged into momentum-free SSL frameworks, such as SimCLR, for a performance boost. Despite its wide use as a fundamental component of modern SSL frameworks, the benefit that momentum provides is not well understood. We find that its success can be at least partly attributed to a stabilizing effect. As a first attempt, we analyze how EMA affects each part of the encoder and reveal that the portion near the encoder's input plays an insignificant role, while the later parts have much more influence. By monitoring the gradient of the overall loss with respect to the output of each block in the encoder, we observe that the gradients at the final layers tend to fluctuate much more than those at other layers during backpropagation, i.e., the final layers are less stable. Interestingly, we show that applying EMA only to the final part of the SSL encoder, i.e., the projector, instead of to the whole deep network encoder, can give comparable or better performance. Our proposed projector-only momentum retains the benefit of EMA while avoiding the double forward computation.
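
The EMA update at the heart of a momentum encoder can be sketched as follows. This is a minimal illustration of the standard rule, target = m * target + (1 - m) * online, applied to a projector's parameters only; it is not the authors' implementation. The names `ema_update`, `online_projector`, and `target_projector`, the parameter keys, and the momentum value are all hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Dict, List

# Parameters are modeled as name -> flat list of floats (a stand-in for tensors).
Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, momentum: float = 0.99) -> None:
    """Update `target` in place: target <- m * target + (1 - m) * online."""
    for name, online_values in online.items():
        target_values = target[name]
        for i, w in enumerate(online_values):
            target_values[i] = momentum * target_values[i] + (1.0 - momentum) * w


# Hypothetical projector weights; the momentum copy starts as a clone.
online_projector: Params = {"fc.weight": [1.0, 2.0], "fc.bias": [0.5]}
target_projector: Params = {k: list(v) for k, v in online_projector.items()}

# Pretend an optimizer step changed the online projector.
online_projector["fc.weight"] = [2.0, 4.0]

# Only the projector is tracked by EMA; the backbone (not modeled here)
# would be shared, so it needs a single forward pass.
ema_update(target_projector, online_projector, momentum=0.9)
```

Under these assumptions the momentum copy moves only a fraction (1 - m) of the way toward the online weights at each step, which is the smoothing behavior the abstract associates with stability.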

research
11/17/2022

Self-Supervised Visual Representation Learning via Residual Momentum

Self-supervised learning (SSL) approaches have shown promising capabilit...
research
09/21/2023

A Study of Forward-Forward Algorithm for Self-Supervised Learning

Self-supervised representation learning has seen remarkable progress in ...
research
06/07/2022

Extending Momentum Contrast with Cross Similarity Consistency Regularization

Contrastive self-supervised representation learning methods maximize the...
research
12/09/2021

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

Self-supervised learning has shown its great potential to extract powerf...
research
10/20/2022

SSiT: Saliency-guided Self-supervised Image Transformer for Diabetic Retinopathy Grading

Self-supervised learning (SSL) has been widely applied to learn image re...
research
07/02/2023

Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers

Optimizer is an essential component for the success of deep learning, wh...
research
03/09/2021

SimTriplet: Simple Triplet Representation Learning with a Single GPU

Contrastive learning is a key technique of modern self-supervised learni...