Point-to-Point Video Generation

04/05/2019
by Tsun-Hsuan Wang, et al.

While image manipulation has achieved tremendous breakthroughs in recent years (e.g., generating realistic faces), video generation remains much less explored and harder to control, which limits its real-world applications. For instance, video editing requires temporal coherence across multiple clips and thus imposes both start and end constraints within a video sequence. We introduce point-to-point video generation, which controls the generation process with two control points: the targeted start- and end-frames. The task is challenging since the model must not only generate a smooth transition of frames, but also plan ahead to ensure that the generated end-frame conforms to the targeted end-frame for videos of various lengths. We propose maximizing a modified variational lower bound on the conditional data likelihood under a skip-frame training strategy. Our model generates sequences whose end-frame is consistent with the targeted end-frame without loss of quality or diversity. Extensive experiments on Stochastic Moving MNIST, Weizmann Human Action, and Human3.6M evaluate the effectiveness of the proposed method. We demonstrate our method under a series of scenarios (e.g., dynamic-length generation), and the qualitative results showcase the potential and merits of point-to-point generation. Project page: https://zswang666.github.io/P2PVG-Project-Page/
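The abstract describes conditioning generation on both a targeted start- and end-frame and maximizing a modified variational lower bound under a skip-frame training strategy. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: it samples a skip-frame subsequence that always keeps the two control points and optimizes a CVAE-style objective (reconstruction plus KL) for a toy predictor conditioned on the start and end frames. The names P2PFramePredictor and sample_skip_frames, as well as all shapes and hyperparameters, are assumptions made for illustration.

```python
# Minimal sketch; architecture, shapes, and hyperparameters are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sample_skip_frames(video, n_keep):
    """Keep the start and end frames plus a random subset of the frames in
    between: subsequences of varying effective length that always preserve
    the two control points (the skip-frame idea)."""
    T = video.shape[1]  # video: (batch, time, feature)
    middle = torch.randperm(T - 2)[: n_keep - 2] + 1
    idx = torch.cat([torch.tensor([0]), middle.sort().values, torch.tensor([T - 1])])
    return video[:, idx], idx


class P2PFramePredictor(nn.Module):
    """Toy conditional VAE: predicts an intermediate frame given the targeted
    start/end frames and a latent variable."""
    def __init__(self, frame_dim, z_dim=32, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim * 3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(frame_dim * 2 + z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, frame_dim))

    def forward(self, start, end, target):
        # Posterior over the latent, conditioned on both control points.
        mu, logvar = self.enc(torch.cat([start, end, target], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(torch.cat([start, end, z], dim=-1))
        recon_loss = F.mse_loss(recon, target)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        # Negative of a variational lower bound (up to constants and scaling).
        return recon_loss + kl


# Usage on random data.
video = torch.randn(4, 20, 64)                 # (batch, time, flattened frame)
sub, idx = sample_skip_frames(video, n_keep=6)
model = P2PFramePredictor(frame_dim=64)
loss = model(sub[:, 0], sub[:, -1], sub[:, idx.numel() // 2])
loss.backward()
```

Training on subsequences of different effective lengths is what would let a model of this kind learn to plan toward the targeted end-frame regardless of how far away it is, which matches the variable-length requirement stated in the abstract.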

Related research

Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models (05/23/2023)
This paper presents a controllable text-to-video (T2V) diffusion model, ...

Towards Smooth Video Composition (12/14/2022)
Video generation requires synthesizing consistent and persistent frames ...

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions (05/21/2023)
We present an end-to-end diffusion-based method for editing videos with ...

From Here to There: Video Inbetweening Using Direct 3D Convolutions (05/24/2019)
We consider the problem of generating plausible and diverse video sequen...

Diverse Video Generation using a Gaussian Process Trigger (07/09/2021)
Generating future frames given a few context (or past) frames is a chall...

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation (11/23/2022)
Generating a video given the first several static frames is challenging ...

Streaming Multiscale Deep Equilibrium Models (04/28/2022)
We present StreamDEQ, a method that infers frame-wise representations on...
