MagicVideo: Efficient Video Generation With Latent Diffusion Models

11/20/2022
by   Daquan Zhou, et al.

We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo. Given a text description, MagicVideo can generate photo-realistic video clips that closely match the text content. With the proposed efficient latent 3D U-Net design, MagicVideo can generate video clips at 256x256 spatial resolution on a single GPU card, roughly 64x faster than the recent video diffusion model (VDM). Unlike previous works that train video generation from scratch in RGB space, we propose to generate video clips in a low-dimensional latent space. We further reuse the convolution operator weights of pre-trained text-to-image generative U-Net models for faster training. To achieve this, we introduce two new designs that adapt the U-Net decoder to video data: a frame-wise lightweight adaptor for image-to-video distribution adjustment and a directed temporal attention module to capture temporal dependencies across frames. The whole generation process takes place within the low-dimensional latent space of a pre-trained variational auto-encoder. We demonstrate that MagicVideo can generate both realistic video content and imaginary content in a photo-realistic style, with a favorable trade-off between quality and computational cost. Refer to https://magicvideo.github.io/# for more examples.
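The two video-specific designs mentioned above can be illustrated in a minimal NumPy sketch. This is not the paper's implementation: the function names, the per-frame scale/shift form of the adaptor, and the single-head attention are assumptions made for illustration. "Directed" is taken to mean a causal mask, so each frame attends only to itself and earlier frames; the adaptor is sketched as a learned per-frame affine adjustment of the latent features.

```python
import numpy as np

def frame_adaptor(x, scale, shift):
    """Hypothetical frame-wise lightweight adaptor: a learned per-frame
    affine adjustment of latent features.

    x: (F, D) latent features across F frames; scale, shift: (F,) per-frame
    parameters (assumed form, not the paper's exact design)."""
    return x * scale[:, None] + shift[:, None]

def directed_temporal_attention(x, w_q, w_k, w_v):
    """Single-head self-attention across the frame axis with a causal
    ("directed") mask, so frame i attends only to frames j <= i.

    x: (F, D) latent features for one spatial location across F frames.
    w_q, w_k, w_v: (D, D) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (F, F) frame affinities
    # Directed mask: disallow attention to future frames (j > i).
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf
    # Numerically stable softmax over the allowed (past and current) frames.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (F, D) temporally mixed
```

Because of the mask, the first frame can only attend to itself, so its output reduces to its own value projection; later frames mix information from all preceding frames, which is one way a one-directional temporal dependency can be imposed on per-frame image latents.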


Related research

09/01/2023 · VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation
In this paper, we present VideoGen, a text-to-video generation approach,...

03/24/2023 · Conditional Image-to-Video Generation with Latent Flow Diffusion Models
Conditional image-to-video (cI2V) generation aims to synthesize a new pl...

11/21/2021 · Modelling Direct Messaging Networks with Multiple Recipients for Cyber Deception
Cyber deception is emerging as a promising approach to defending network...

05/23/2023 · Neural Image Re-Exposure
The shutter strategy applied to the photo-shooting process has a signifi...

04/20/2022 · Sound-Guided Semantic Video Generation
The recent success in StyleGAN demonstrates that pre-trained StyleGAN la...

07/15/2021 · StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
Generative adversarial models (GANs) continue to produce advances in ter...

08/18/2023 · SimDA: Simple Diffusion Adapter for Efficient Video Generation
The recent wave of AI-generated content has witnessed the great developm...
