Fast Training of Diffusion Models with Masked Transformers

06/15/2023
by Hongkai Zheng, et al.

We propose an efficient approach to training large diffusion models with masked transformers. While masked transformers have been extensively explored for representation learning, their application to generative learning in the vision domain remains largely unexplored. Our work is the first to exploit masked training to significantly reduce the training cost of diffusion models. Specifically, we randomly mask out a high proportion (e.g., 50%) of patches in the diffused input images during training. For masked training, we introduce an asymmetric encoder-decoder architecture: a transformer encoder that operates only on unmasked patches, and a lightweight transformer decoder that operates on the full set of patches. To promote long-range understanding of the full image, we add an auxiliary task of reconstructing the masked patches to the denoising score matching objective, which learns the score of the unmasked patches. Experiments on ImageNet-256×256 show that our approach matches the performance of the state-of-the-art Diffusion Transformer (DiT) while using only 31% of its original training time. Our method thus enables efficient training of diffusion models without sacrificing generative performance.
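The masking pipeline the abstract describes can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: the helper names `patchify` and `random_mask` are hypothetical, and the zero-valued mask token stands in for the learned mask token a real model would use before its transformer encoder/decoder stacks.

```python
import numpy as np

def patchify(img, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    H, W, C = img.shape
    gh, gw = H // patch_size, W // patch_size
    patches = img.reshape(gh, patch_size, gw, patch_size, C)
    return patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)

def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into kept (unmasked) and masked sets."""
    perm = rng.permutation(num_patches)
    num_masked = int(num_patches * mask_ratio)
    return perm[num_masked:], perm[:num_masked]  # keep_idx, mask_idx

# Toy example: a 32x32 RGB "diffused" input x_t, 4x4 patches, 50% masking.
rng = np.random.default_rng(0)
x_t = rng.standard_normal((32, 32, 3))
patches = patchify(x_t, patch_size=4)          # (64, 48)
keep_idx, mask_idx = random_mask(len(patches), mask_ratio=0.5, rng=rng)

# Asymmetric architecture: the encoder sees only unmasked patches,
# while the lightweight decoder sees all patch positions (masked ones
# are replaced by a mask token -- zeros here, learned in practice).
encoder_input = patches[keep_idx]              # (32, 48)
mask_token = np.zeros(patches.shape[1])
is_masked = np.isin(np.arange(len(patches)), mask_idx)[:, None]
decoder_input = np.where(is_masked, mask_token, patches)  # (64, 48)
```

The training objective would then combine the denoising score matching loss on the unmasked patches with an auxiliary mean-squared reconstruction loss on the masked patches, e.g. `loss = dsm_loss + lam * mse_loss` for some weighting `lam` (our notation).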


