Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis

10/16/2019
by   Yuki Endo, et al.
14

Automatic generation of a high-quality video from a single image remains a challenging task despite the recent advances in deep generative models. This paper proposes a method that can create a high-resolution, long-term animation using convolutional neural networks (CNNs) from a single landscape image where we mainly focus on skies and waters. Our key observation is that the motion (e.g., moving clouds) and appearance (e.g., time-varying colors in the sky) in natural scenes have different time scales. We thus learn them separately and predict them with decoupled control while handling future uncertainty in both predictions by introducing latent codes. Unlike previous methods that infer output frames directly, our CNNs predict spatially-smooth intermediate data, i.e., for motion, flow fields for warping, and for appearance, color transfer maps, via self-supervised learning, i.e., without explicitly-provided ground truth. These intermediate data are applied not to each previous output frame, but to the input image only once for each output frame. This design is crucial to alleviate error accumulation in long-term predictions, which is the essential problem in previous recurrent approaches. The output frames can be looped like cinemagraph, and also be controlled directly by specifying latent codes or indirectly via visual annotations. We demonstrate the effectiveness of our method through comparisons with the state-of-the-arts on video prediction as well as appearance manipulation.

READ FULL TEXT

page 8

page 9

page 10

page 11

page 12

page 13

page 17

page 18

research
12/05/2018

Video Synthesis from a Single Image and Motion Stroke

In this paper, we propose a new method to automatically generate a video...
research
03/01/2019

Self-supervised Learning for Single View Depth and Surface Normal Estimation

In this work we present a self-supervised learning framework to simultan...
research
12/02/2018

Disentangling Propagation and Generation for Video Prediction

Learning to predict future video frames is a challenging task. Recent ap...
research
03/03/2019

Unsupervised Bi-directional Flow-based Video Generation from one Snapshot

Imagining multiple consecutive frames given one single snapshot is chall...
research
11/26/2019

Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns

Dynamic patterns are characterized by complex spatial and motion pattern...
research
03/19/2022

Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation

For video frame interpolation (VFI), existing deep-learning-based approa...
research
04/04/2023

HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering

We consider the challenging task of training models for image-to-video d...

Please sign up or login with your details

Forgot password? Click here to reset