Masked Image Modeling with Local Multi-Scale Reconstruction

03/09/2023
by   Haoqing Wang, et al.

Masked Image Modeling (MIM) has achieved outstanding success in self-supervised representation learning. Unfortunately, MIM models typically carry a huge computational burden and learn slowly, which is an obstacle to industrial application. Although the lower layers play the key role in MIM, existing MIM models conduct the reconstruction task only at the top layer of the encoder; the lower layers are not explicitly guided, and the interaction among their patches is used only for computing new activations. Since the reconstruction task requires non-trivial inter-patch interactions to reason about the target signals, we apply it to multiple local layers, including both lower and upper ones. Further, since different layers are expected to learn information at different scales, we design local multi-scale reconstruction, in which the lower and upper layers reconstruct fine-scale and coarse-scale supervision signals, respectively. This design not only accelerates representation learning by explicitly guiding multiple layers, but also promotes multi-scale semantic understanding of the input. Extensive experiments show that, with a significantly smaller pre-training burden, our model achieves comparable or better performance on classification, detection, and segmentation tasks than existing MIM models.
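As a rough illustration of the idea (not the authors' implementation), the sketch below attaches reconstruction heads at a lower and an upper layer of a toy encoder: the lower head matches a fine-scale (per-patch) pooled target, the upper head a coarse-scale one, and losses are computed only on masked patches. All sizes, the layer choices, and the random linear "decoder heads" are hypothetical stand-ins for a real ViT encoder and its decoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes hypothetical): a 16x16 "image" split into a 4x4 grid
# of 4x4 patches, a 4-layer encoder simulated by random linear maps.
img = rng.standard_normal((16, 16))
P, G, D = 4, 4, 32  # patch size, patch-grid side, feature dimension
patches = img.reshape(G, P, G, P).transpose(0, 2, 1, 3).reshape(G * G, P * P)
feats = patches @ rng.standard_normal((P * P, D))

def pooled_target(image, grid):
    # Supervision signal at a given scale: the image average-pooled to a
    # grid x grid map (grid=4 is the fine scale, grid=2 the coarse scale).
    s = image.shape[0] // grid
    return image.reshape(grid, s, grid, s).mean(axis=(1, 3)).ravel()

mask = rng.random(G * G) < 0.75  # MIM: reconstruct only masked patches

layer_grid = {1: 4, 3: 2}  # lower layer -> fine target, upper layer -> coarse
losses = {}
for layer in range(4):
    feats = np.tanh(feats @ rng.standard_normal((D, D)) / np.sqrt(D))
    if layer in layer_grid:
        grid = layer_grid[layer]
        s = G // grid
        # Pool patch features and the patch mask down to the target grid.
        pooled = feats.reshape(grid, s, grid, s, D).mean(axis=(1, 3))
        pooled = pooled.reshape(grid * grid, D)
        m = mask.reshape(grid, s, grid, s).any(axis=(1, 3)).ravel()
        pred = (pooled @ rng.standard_normal((D, 1))).ravel()  # toy decoder head
        target = pooled_target(img, grid)
        losses[layer] = float(np.mean((pred[m] - target[m]) ** 2)) if m.any() else 0.0

total_loss = sum(losses.values())  # one local loss per supervised layer, summed
```

In a real model the heads would be lightweight decoders, the targets would be, e.g., normalized pixel values pooled to each scale, and the summed local losses would replace the single top-layer reconstruction loss of standard MIM.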

Related research:
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training (05/28/2022)
- Progressive Multi-Scale Self-Supervised Learning for Speech Recognition (12/07/2022)
- Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond (08/30/2022)
- mc-BEiT: Multi-choice Discretization for Image BERT Pre-training (03/29/2022)
- Local2Global: A distributed approach for scaling representation learning on graphs (01/12/2022)
- SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling (02/02/2023)
- Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells (02/16/2020)
