Energy-Inspired Self-Supervised Pretraining for Vision Models

02/02/2023
by Ze Wang, et al.

Motivated by the fact that the forward and backward passes of a deep network naturally form symmetric mappings between input and output representations, we introduce a simple yet effective self-supervised vision model pretraining framework inspired by energy-based models (EBMs). In the proposed framework, we model energy estimation and data restoration as the forward and backward passes of a single network, without any auxiliary components such as an extra decoder. In the forward pass, we fit the network to an energy function that assigns low energy scores to samples from an unlabeled dataset and high energy scores otherwise. In the backward pass, we restore data from corrupted versions iteratively via gradient-based optimization along the direction of energy minimization. In this way, we naturally fold the encoder-decoder architecture widely used in masked image modeling into the forward and backward passes of a single vision model. The framework therefore accepts a wide range of pretext tasks with different data-corruption methods, allowing models to be pretrained with masked image modeling, patch sorting, and image restoration tasks, including super-resolution, denoising, and colorization. We support our findings with extensive experiments and show that the proposed method delivers comparable or even better performance with remarkably fewer training epochs than state-of-the-art self-supervised vision model pretraining methods. Our findings point toward further exploration of self-supervised vision model pretraining and of pretext tasks beyond masked image modeling.
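
The backward-pass restoration the abstract describes can be sketched compactly: starting from a corrupted input, the image itself is updated by gradient descent on the learned scalar energy. The following is a minimal, hypothetical PyTorch illustration, not the authors' released code; the model `energy_net` (assumed to map an image batch to one scalar energy per sample), the step size `alpha`, and the iteration count `steps` are assumptions for illustration, and the energy-fitting training objective is omitted.

```python
import torch

def restore(energy_net, x_corrupted, steps=8, alpha=0.1):
    """Restore a corrupted image batch by iteratively descending the
    learned energy landscape (a sketch of the paper's backward pass).

    energy_net: vision model whose forward pass returns a per-sample
                scalar energy; low energy = in-distribution (assumption).
    """
    x = x_corrupted.clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_net(x).sum()             # forward pass: energy estimation
        grad, = torch.autograd.grad(energy, x)   # backward pass through the same net
        # Move the image along the direction of energy minimization.
        x = (x - alpha * grad).detach().requires_grad_(True)
    return x.detach()
```

Under this reading, the single network plays both roles of the usual masked-image-modeling encoder-decoder pair: its forward pass scores the input, and differentiating that score yields the restoration update.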


