Self-Supervised Learning based on Heat Equation

11/23/2022
by Yinpeng Chen, et al.

This paper presents a new perspective on self-supervised learning based on extending the heat equation into a high-dimensional feature space. In particular, we remove the time dependence via a steady-state condition, and extend the remaining 2D Laplacian from x–y isotropic to linearly correlated. Furthermore, we simplify it by splitting the x and y axes into two first-order linear differential equations. This simplification explicitly models the spatial invariance along the horizontal and vertical directions separately, supporting prediction across image blocks. It yields a very simple masked image modeling (MIM) method, named QB-Heat. QB-Heat leaves a single block, a quarter of the image in size, unmasked, and linearly extrapolates the other three masked quarters. It brings MIM to CNNs without bells and whistles, and even works well for pre-training light-weight networks suitable for both image classification and object detection without fine-tuning. Compared with MoCo-v2 on pre-training a Mobile-Former with 5.8M parameters and 285M FLOPs, QB-Heat is on par in linear probing on ImageNet, but clearly outperforms in non-linear probing that adds a transformer block before the linear classifier (65.6%). When transferring to object detection with a frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP, respectively. This work provides an insightful hypothesis on the invariance within visual representations across different shapes and textures: the linear relationship between horizontal and vertical derivatives. The code will be publicly released.
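To make the abstract's chain of simplifications concrete, here is one way to write it out. The symbols below (u for the scalar heat field, z for a C-dimensional feature map, A and B for learned matrices) are assumed notation for illustration, not taken from the paper:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align}
  \frac{\partial u}{\partial t}
    &= \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}
    && \text{2D heat equation} \\
  \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}
    &= 0
    && \text{steady state removes time} \\
  \frac{\partial z}{\partial x} = A z, \qquad
  \frac{\partial z}{\partial y} &= B z
    && \text{first-order split per axis} \\
  z(x + \Delta x,\, y)
    &\approx z(x, y) + A\, z(x, y)\, \Delta x
    && \text{linear extrapolation across blocks}
\end{align}
\end{document}
```

And a minimal PyTorch sketch of the quarter-block training step the abstract describes. All names here (QBHeatSketch, the 1x1-conv operators standing in for A and B, the encoder/decoder placeholders) are hypothetical reconstructions, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QBHeatSketch(nn.Module):
    """Hypothetical sketch of QB-Heat-style pre-training (not the authors' code)."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module, dim: int):
        super().__init__()
        self.encoder = encoder  # CNN: (B, 3, H/2, W/2) -> (B, dim, h, w)
        self.decoder = decoder  # (B, dim, 2h, 2w) -> (B, 3, H, W)
        # 1x1 convs as the linear operators in dz/dx = Az and dz/dy = Bz.
        self.A = nn.Conv2d(dim, dim, kernel_size=1, bias=False)
        self.B = nn.Conv2d(dim, dim, kernel_size=1, bias=False)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        _, _, H, W = img.shape
        visible = img[:, :, : H // 2, : W // 2]  # only the top-left quarter is seen
        z = self.encoder(visible)                # features of the unmasked quarter
        # One linear step extrapolates features into each masked quarter.
        z_right = z + self.A(z)                  # top-right quarter
        z_down = z + self.B(z)                   # bottom-left quarter
        z_diag = z + self.A(z) + self.B(z)       # bottom-right quarter
        top = torch.cat([z, z_right], dim=3)     # stitch along width
        bottom = torch.cat([z_down, z_diag], dim=3)
        full = torch.cat([top, bottom], dim=2)   # stitch along height
        recon = self.decoder(full)               # reconstruct the whole image
        # Score reconstruction only on the three masked quarters.
        mask = torch.ones_like(img)
        mask[:, :, : H // 2, : W // 2] = 0
        err = F.mse_loss(recon, img, reduction="none")
        return (err * mask).sum() / mask.sum()
```

A training step would simply be `loss = QBHeatSketch(enc, dec, dim=256)(images); loss.backward()`. The sketch absorbs the step size into A and B, so one application of each operator extrapolates features a full quarter-block rightward or downward, matching the abstract's "extrapolates the other three masked quarters linearly."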
