LEO: Generative Latent Image Animator for Human Video Synthesis

05/06/2023
by Yaohui Wang, et al.

Spatio-temporal coherency is a major challenge in synthesizing high-quality videos, particularly in synthesizing human videos that contain rich global and local deformations. To resolve this challenge, previous approaches have resorted to different features in the generation process aimed at representing appearance and motion. However, in the absence of strict mechanisms to guarantee such disentanglement, separating motion from appearance has remained challenging, resulting in spatial distortions and temporal jittering that break the spatio-temporal coherency. Motivated by this, we propose LEO, a novel framework for human video synthesis that places emphasis on spatio-temporal coherency. Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolates motion from appearance. We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM). The former bridges a space of motion codes with the space of flow maps, and synthesizes video frames in a warp-and-inpaint manner. The LMDM learns to capture the motion prior in the training data by synthesizing sequences of motion codes. Extensive quantitative and qualitative analysis suggests that LEO significantly improves coherent synthesis of human videos over previous methods on the datasets TaichiHD, FaceForensics and CelebV-HQ. In addition, the effective disentanglement of appearance and motion in LEO allows for two additional tasks, namely infinite-length human video synthesis and content-preserving video editing.
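
To make the idea of representing motion as flow maps more concrete, the following is a minimal NumPy sketch of the warp step alone. It is not LEO's implementation: the function names `warp_with_flow` and `animate`, the `(dx, dy)` flow convention, and the nearest-neighbour sampling are all assumptions chosen for illustration. The inpainting step is not modelled; pixels that the warp cannot fill are only flagged in a mask. The sketch shows why flow maps keep motion apart from appearance: every frame is built by moving pixels of one fixed starting image, so the flow carries no colour or texture information of its own.

```python
import numpy as np


def warp_with_flow(image, flow):
    """Backward-warp `image` (H, W, C) with `flow` (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy) means output pixel (x, y)
    is copied from source position (x + dx, y + dy).
    Returns the warped image and a boolean mask that is False where the
    source position falls outside the image (pixels left to inpaint).
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates, rounded to the nearest pixel.
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)

    # Only in-bounds source positions can be copied.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    warped = np.zeros_like(image)
    warped[valid] = image[src_y[valid], src_x[valid]]
    return warped, valid


def animate(image, flows):
    """Apply a sequence of flow maps to the same starting image."""
    return [warp_with_flow(image, flow) for flow in flows]
```

Because appearance enters only through `image` and motion only through `flows`, swapping the starting image while keeping the flow sequence changes what is shown but not how it moves.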


