A Good Image Generator Is What You Need for High-Resolution Video Synthesis

by Yu Tian, et al.

Image and video synthesis are closely related areas that aim to generate content from noise. While rapid progress has been demonstrated in improving image-based models to handle large resolutions, high-quality renderings, and wide variations in image content, achieving comparable video generation results remains problematic. We present a framework that leverages contemporary image generators to render high-resolution videos. We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator. Not only does such a framework render high-resolution videos, but it is also an order of magnitude more computationally efficient. We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled. With such a representation, our framework allows for a broad range of applications, including content and motion manipulation. Furthermore, we introduce a new task, which we call cross-domain video synthesis, in which the image and motion generators are trained on disjoint datasets belonging to different domains. This allows for generating moving objects for which the desired video data is not available. Extensive experiments on various datasets demonstrate the advantages of our methods over existing video generation techniques. Code will be released at https://github.com/snap-research/MoCoGAN-HD.
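The core idea of the abstract, a video as a trajectory through the latent space of a frozen image generator, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or the released code: the names `make_fixed_generator`, `motion_trajectory`, and `synthesize_video` are hypothetical, the "image generator" is a fixed random map rather than a trained network, and the motion update is a simple hand-written recurrence rather than the authors' motion generator.

```python
import math
import random

LATENT_DIM = 8      # toy latent size (assumption)
FRAME_PIXELS = 16   # toy "image" size (assumption)


def make_fixed_generator(seed=0):
    """Stand-in for a pre-trained, frozen image generator (hypothetical)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
               for _ in range(FRAME_PIXELS)]

    def generate(z):
        # The weights are never updated: the generator stays fixed.
        return [math.tanh(sum(w * x for w, x in zip(row, z)))
                for row in weights]

    return generate


def motion_trajectory(z_content, motion_noise, step=0.1):
    """Toy motion model: walk through latent space from the content code.

    z_content fixes the first frame (content); motion_noise drives the
    per-frame displacement (motion), so the two are sampled separately.
    """
    trajectory = [list(z_content)]
    hidden = [0.0] * LATENT_DIM
    for eps in motion_noise:
        # Simple recurrent update standing in for a learned motion network.
        hidden = [math.tanh(0.5 * h + e) for h, e in zip(hidden, eps)]
        prev = trajectory[-1]
        trajectory.append([p + step * h for p, h in zip(prev, hidden)])
    return trajectory


def synthesize_video(generate, z_content, motion_noise):
    """Render one frame per latent code along the trajectory."""
    return [generate(z) for z in motion_trajectory(z_content, motion_noise)]


def sample_noise(rng, n_steps):
    return [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(n_steps)]


if __name__ == "__main__":
    rng = random.Random(42)
    generator = make_fixed_generator()
    content = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    video = synthesize_video(generator, content, sample_noise(rng, 4))
    print(len(video), "frames of", len(video[0]), "pixels")
```

In this sketch, reusing the same `z_content` with different `motion_noise` keeps the first frame and changes how it moves, which is the kind of content and motion manipulation the abstract describes. Swapping in a generator built from different data while keeping the motion model would loosely correspond to the cross-domain setting, again only under the toy assumptions above.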



StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

Videos show continuous events, yet most - if not all - video synthesis f...

Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis

Generating videos with content and motion variations is a challenging ta...

HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

Video prediction is an important yet challenging problem; burdened with ...

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

Most methods for conditional video synthesis use a single modality as th...

Playable Environments: Video Manipulation in Space and Time

We present Playable Environments - a new representation for interactive ...

Neighbor Correspondence Matching for Flow-based Video Frame Synthesis

Video frame synthesis, which consists of interpolation and extrapolation...

CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

The style-based GAN (StyleGAN) architecture achieved state-of-the-art re...