Imagen Video: High Definition Video Generation with Diffusion Models

10/05/2022
by Jonathan Ho, et al.

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model, including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high-quality sampling. We find Imagen Video to be not only capable of generating videos of high fidelity but also to have a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding. See https://imagen.research.google/video/ for samples.
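The v-parameterization and classifier-free guidance named in the abstract can be illustrated with a minimal sketch. It assumes a variance-preserving cosine noise schedule (alpha_t = cos(pi t / 2), sigma_t = sin(pi t / 2), so alpha_t^2 + sigma_t^2 = 1) and represents a video as a flat list of floats; all function names are hypothetical and are not taken from the Imagen Video code. Under the v-parameterization the network predicts v = alpha_t * eps - sigma_t * x, and both the clean-data estimate and the noise estimate can be recovered from v and the noisy input z_t. Classifier-free guidance then combines a conditional and an unconditional prediction with a guidance weight w.

```python
import math
import random


def alpha_sigma(t):
    """Assumed cosine, variance-preserving schedule: alpha_t^2 + sigma_t^2 = 1."""
    return math.cos(math.pi * t / 2), math.sin(math.pi * t / 2)


def noisy_sample(x, eps, t):
    """Forward diffusion, elementwise: z_t = alpha_t * x + sigma_t * eps."""
    a, s = alpha_sigma(t)
    return [a * xi + s * ei for xi, ei in zip(x, eps)]


def v_target(x, eps, t):
    """v-parameterization training target: v = alpha_t * eps - sigma_t * x."""
    a, s = alpha_sigma(t)
    return [a * ei - s * xi for xi, ei in zip(x, eps)]


def x_from_v(z, v, t):
    """Clean-data estimate from a v prediction: x_hat = alpha_t * z_t - sigma_t * v."""
    a, s = alpha_sigma(t)
    return [a * zi - s * vi for zi, vi in zip(z, v)]


def eps_from_v(z, v, t):
    """Noise estimate from a v prediction: eps_hat = sigma_t * z_t + alpha_t * v."""
    a, s = alpha_sigma(t)
    return [s * zi + a * vi for zi, vi in zip(z, v)]


def guided(cond, uncond, w):
    """Classifier-free guidance: (1 + w) * conditional - w * unconditional."""
    return [(1 + w) * c - w * u for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0, 1) for _ in range(4)]    # stand-in for a tiny "video"
    eps = [random.gauss(0, 1) for _ in range(4)]  # Gaussian noise
    t = 0.3
    z = noisy_sample(x, eps, t)
    v = v_target(x, eps, t)
    # With an exact v, the recovered x matches the original up to float error.
    print(max(abs(a - b) for a, b in zip(x_from_v(z, v, t), x)))
```

In a real sampler the conditional and unconditional predictions would come from the same network evaluated with and without the text conditioning; here they are plain numbers so the algebra can be checked directly. Setting w = 0 reduces guidance to the purely conditional prediction.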
