Conditional Image-to-Video Generation with Latent Flow Diffusion Models

03/24/2023
by Haomiao Ni et al.

Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared to previous direct-synthesis-based works, our proposed LFDM can better synthesize spatial details and temporal motion by fully utilizing the spatial content of the given image and warping it in the latent space according to the generated temporally-coherent flow. The training of LFDM consists of two separate stages: (1) an unsupervised learning stage to train a latent flow auto-encoder for spatial content generation, including a flow predictor to estimate latent flow between pairs of video frames, and (2) a conditional learning stage to train a 3D-UNet-based diffusion model (DM) for temporal latent flow generation. Unlike previous DMs operating in pixel space or in a latent feature space that couples spatial and temporal information, the DM in our LFDM only needs to learn a low-dimensional latent flow space for motion generation, and is thus more computationally efficient. We conduct comprehensive experiments on multiple datasets, where LFDM consistently outperforms prior art. Furthermore, we show that LFDM can be easily adapted to new domains by simply finetuning the image decoder. Our code is available at https://github.com/nihaomiao/CVPR23_LFDM.
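The warp-based inference described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the module names (`warp_latent`, `generate_video`), the identity encoder/decoder, and the assumption that the flow field is normalized to `grid_sample`'s [-1, 1] coordinate range are all assumptions made for the sketch. The one real piece of machinery is `torch.nn.functional.grid_sample`, the standard way to apply a dense flow field to a feature map.

```python
# Illustrative sketch of LFDM-style inference (not the authors' code).
# A stage-2 diffusion model would produce `flows`; here we pass it in directly.
import torch
import torch.nn.functional as F

def warp_latent(z, flow):
    """Warp a latent map z (B, C, H, W) with a flow field flow (B, 2, H, W).

    The flow is assumed to be expressed in grid_sample's normalized
    [-1, 1] coordinate system (an assumption of this sketch).
    """
    B, _, H, W = z.shape
    # Base sampling grid covering the latent map in [-1, 1].
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, H, W, 2)
    # Offset the base grid by the flow and resample the latent.
    grid = base + flow.permute(0, 2, 3, 1)
    return F.grid_sample(z, grid, align_corners=True)

def generate_video(image, flows, encoder, decoder):
    """Decode one frame per latent flow field; flows has shape (B, T, 2, H, W)."""
    z = encoder(image)  # spatial content latent, computed once per video
    frames = [decoder(warp_latent(z, flows[:, t])) for t in range(flows.shape[1])]
    return torch.stack(frames, dim=1)  # (B, T, C, H', W')

# Toy run with identity encoder/decoder and zero flow:
# every output frame is just the (resampled) input image.
img = torch.rand(1, 4, 8, 8)
flows = torch.zeros(1, 5, 2, 8, 8)  # 5 frames of zero motion
video = generate_video(img, flows, encoder=lambda x: x, decoder=lambda z: z)
print(video.shape)  # torch.Size([1, 5, 4, 8, 8])
```

Warping a single content latent, rather than synthesizing every frame from scratch, is what lets the diffusion model operate only on the low-dimensional flow sequence.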


Related research:

- MagicVideo: Efficient Video Generation With Latent Diffusion Models (11/20/2022)
- Video Probabilistic Diffusion Models in Projected Latent Space (02/15/2023)
- Flow Matching in Latent Space (07/17/2023)
- Conditional Adversarial Generative Flow for Controllable Image Synthesis (04/03/2019)
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering (04/04/2023)
- Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning (06/08/2023)
- Video Generation from Single Semantic Label Map (03/11/2019)
