Diffusion Models as Masked Autoencoders

04/06/2023
by Chen Wei, et al.

There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE). Our approach is capable of (i) serving as a strong initialization for downstream recognition tasks, (ii) conducting high-quality image inpainting, and (iii) being effortlessly extended to video where it produces state-of-the-art classification accuracy. We further perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
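The abstract gives no implementation details, so the following is only a minimal sketch of one way to read "conditioning a diffusion model on masked input". It assumes that visible patches are passed to the model clean while masked patches are corrupted with the standard Gaussian forward-diffusion step. The function name `make_masked_diffusion_input` and the parameters `mask_ratio` and `alpha_bar_t` are hypothetical and do not come from the paper.

```python
import math
import random


def make_masked_diffusion_input(patches, mask_ratio, alpha_bar_t, rng):
    """Build a model input in which only the masked patches are noised.

    patches: list of patches, each a list of floats (hypothetical layout).
    mask_ratio: fraction of patches to mask (assumed hyperparameter).
    alpha_bar_t: cumulative signal coefficient in (0, 1] for timestep t,
        as in the standard DDPM forward process.
    rng: a random.Random instance, passed in for reproducibility.
    """
    n = len(patches)
    n_masked = int(round(mask_ratio * n))
    masked_idx = set(rng.sample(range(n), n_masked))

    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)

    model_input = []
    for i, patch in enumerate(patches):
        if i in masked_idx:
            # Masked patch: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
            model_input.append(
                [signal * x + noise * rng.gauss(0.0, 1.0) for x in patch]
            )
        else:
            # Visible patch: left clean, acts as the conditioning signal.
            model_input.append(list(patch))
    return model_input, sorted(masked_idx)


# Usage: 8 toy patches of 4 values each, 75% masked, moderate noise level.
rng = random.Random(0)
patches = [[float(i)] * 4 for i in range(8)]
noised, masked = make_masked_diffusion_input(patches, 0.75, 0.5, rng)
```

Under these assumptions, a denoising network would be trained to recover the clean content of the masked patches given the visible ones. The network and the training loss are omitted here because the abstract does not specify them.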

research
03/17/2023

Denoising Diffusion Autoencoders are Unified Self-supervised Learners

Inspired by recent advances in diffusion models, which are reminiscent o...
research
12/22/2022

GENIE: Large Scale Pre-training for Text Generation with Diffusion Model

In this paper, we propose a large-scale language pre-training for text G...
research
07/26/2023

Pre-Training with Diffusion models for Dental Radiography segmentation

Medical radiography segmentation, and specifically dental radiography, i...
research
10/30/2022

A simple, efficient and scalable contrastive masked autoencoder for learning visual representations

We introduce CAN, a simple, efficient and scalable method for self-super...
research
06/02/2023

Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats

Invisible watermarks safeguard images' copyrights by embedding hidden me...
research
02/12/2015

Convergence of gradient based pre-training in Denoising autoencoders

The success of deep architectures is at least in part attributed to the ...
research
06/15/2022

Diffusion Models for Video Prediction and Infilling

To predict and anticipate future outcomes or reason about missing inform...
