Understanding Masked Image Modeling via Learning Occlusion Invariant Feature

08/08/2022
by Xiangwen Kong, et al.

Recently, Masked Image Modeling (MIM) has achieved great success in self-supervised visual recognition. However, because MIM is a reconstruction-based framework, how it works remains an open question: it appears very different from previous well-studied siamese approaches such as contrastive learning. In this paper, we propose a new viewpoint: MIM implicitly learns occlusion-invariant features, which is analogous to other siamese methods, except that the latter learn other invariances. By relaxing the MIM formulation into an equivalent siamese form, MIM methods can be interpreted in a unified framework with conventional methods, in which only a) the data transformations, i.e. what invariance to learn, and b) the similarity measurements differ. Furthermore, taking MAE (He et al.) as a representative example of MIM, we empirically find that the success of MIM models depends little on the choice of similarity function and instead on the occlusion-invariant features learned from masked images. These features turn out to be a favored initialization for vision transformers, even though they may be less semantic. We hope our findings will inspire researchers to develop more powerful self-supervised methods in the computer vision community.
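
The unified siamese view described in the abstract can be illustrated with a small toy sketch. It is not the paper's formulation or the MAE implementation. Every name below (`random_mask`, `toy_encoder`, `siamese_loss`, the two distance functions) is hypothetical, the "image" is a flat list of numbers standing in for patches, and the encoder is a fixed moving average rather than a trained network. The only point it tries to show is that, under this view, methods differ in the pair of data transformations and in the similarity measurement that are plugged in.

```python
import math
import random


def random_mask(x, ratio, rng):
    """Occlusion transform (assumed form): zero out a random subset of entries."""
    n_masked = int(len(x) * ratio)
    masked_idx = set(rng.sample(range(len(x)), n_masked))
    return [0.0 if i in masked_idx else v for i, v in enumerate(x)]


def identity(x, rng):
    """Trivial transform: return an unmodified copy of the input."""
    return list(x)


def toy_encoder(x):
    """Hypothetical stand-in for a network: circular 3-point moving average."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def mse_distance(u, v):
    """Mean squared error between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def cosine_distance(u, v):
    """One minus cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def siamese_loss(x, transform_a, transform_b, encoder, distance, rng):
    """Generic two-branch objective: encode two views of x and compare them."""
    z_a = encoder(transform_a(x, rng))
    z_b = encoder(transform_b(x, rng))
    return distance(z_a, z_b)


rng = random.Random(0)
image = [float(i % 4) for i in range(16)]  # toy "image" made of 16 patches


def occlude(x, rng):
    """Mask 75% of the patches (the ratio is an arbitrary choice for this sketch)."""
    return random_mask(x, 0.75, rng)


# Same transformation pair (masked view vs. full view), two similarity choices.
loss_mse = siamese_loss(image, occlude, identity, toy_encoder, mse_distance, rng)
loss_cos = siamese_loss(image, occlude, identity, toy_encoder, cosine_distance, rng)
print(loss_mse, loss_cos)
```

Swapping `occlude` for another augmentation would change which invariance the toy objective encourages, while swapping `mse_distance` for `cosine_distance` changes only the similarity measurement. In a real method the encoder would be trained to reduce this loss, with additional machinery to avoid collapsed solutions.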

research
06/20/2022

Visualizing and Understanding Self-Supervised Vision Learning

Self-Supervised vision learning has revolutionized deep learning, becomi...
research
08/09/2017

Transitive Invariance for Self-supervised Visual Representation Learning

Learning visual representations with self-supervised learning has become...
research
12/09/2021

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

Self-supervised learning has shown its great potential to extract powerf...
research
02/11/2023

Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Segmentation

Self-supervised learning (SSL) has recently achieved promising performan...
research
02/09/2023

Deep Intra-Image Contrastive Learning for Weakly Supervised One-Step Person Search

Weakly supervised person search aims to perform joint pedestrian detecti...
research
05/30/2022

GMML is All you Need

Vision transformers have generated significant interest in the computer ...
research
09/15/2021

Self-learn to Explain Siamese Networks Robustly

Learning to compare two objects are essential in applications, such as d...
