Generating Long Videos of Dynamic Scenes

06/07/2022
by   Tim Brooks, et al.
12

We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never change due to over-reliance on inductive biases to provide temporal consistency, such as a single latent code that dictates content for the entire video. On the other extreme, without long-term consistency, generated videos may morph unrealistically between different scenes. To address these limitations, we prioritize the time axis by redesigning the temporal latent representation and learning long-term consistency from data by training on longer videos. To this end, we leverage a two-phase training strategy, where we separately train using longer videos at a low resolution and shorter videos at a high resolution. To evaluate the capabilities of our model, we introduce two new benchmark datasets with explicit focus on long-term temporal dynamics.

READ FULL TEXT

page 2

page 4

page 5

page 6

page 9

page 14

page 15

page 16

research
10/05/2022

Temporally Consistent Video Transformer for Long-Term Video Prediction

Generating long, temporally consistent video remains an open challenge i...
research
01/04/2020

Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

We introduce a new video synthesis task: synthesizing time lapse videos ...
research
08/25/2022

Visualizing the Passage of Time with Video Temporal Pyramids

What can we learn about a scene by watching it for months or years? A vi...
research
09/01/2023

Long-Term Memorability On Advertisements

Marketers spend billions of dollars on advertisements but to what end? A...
research
05/11/2022

Scene Consistency Representation Learning for Video Scene Segmentation

A long-term video, such as a movie or TV show, is composed of various sc...
research
07/27/2023

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking

We introduce PointOdyssey, a large-scale synthetic dataset, and data gen...
research
01/30/2021

Video Reenactment as Inductive Bias for Content-Motion Disentanglement

We introduce a self-supervised motion-transfer VAE model to disentangle ...

Please sign up or login with your details

Forgot password? Click here to reset