Conditional Video Generation Using Action-Appearance Captions

12/04/2018
by Shohei Yamamoto, et al.

The field of automatic video generation has received a boost thanks to recent Generative Adversarial Networks (GANs). However, most existing methods cannot control the contents of the generated video using a text caption, which greatly limits their usefulness. This particularly affects human videos due to their great variety of actions and appearances. This paper presents Conditional Flow and Texture GAN (CFT-GAN), a GAN-based method for generating video from action-appearance captions. We propose a novel way of generating video by encoding a caption (e.g., "a man in blue jeans is playing golf") in a two-stage generation pipeline. CFT-GAN uses such a caption to generate an optical flow (action) and a texture (appearance) for each frame. As a result, the output video plausibly reflects the content specified in the caption. Moreover, to train our method, we constructed a new dataset for human video generation with captions. We evaluated the proposed method qualitatively and quantitatively via an ablation study and a user study. The results demonstrate that CFT-GAN successfully generates videos containing the actions and appearances indicated in the captions.

research
11/27/2017

Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture

Learning to represent and generate videos from unlabeled data is a very ...
research
03/10/2020

Video Caption Dataset for Describing Human Actions in Japanese

In recent years, automatic video caption generation has attracted consid...
research
04/23/2018

To Create What You Tell: Generating Videos from Captions

We are creating multimedia contents everyday and everywhere. While autom...
research
11/25/2019

GAC-GAN: A General Method for Appearance-Controllable Human Video Motion Transfer

Human video motion transfer has a wide range of applications in multimed...
research
10/16/2019

Label-Conditioned Next-Frame Video Generation with Neural Flows

Recent state-of-the-art video generation systems employ Generative Adver...
research
12/03/2018

A Two-Stream Variational Adversarial Network for Video Generation

Video generation is an inherently challenging task, as it requires the m...
research
01/28/2021

Playable Video Generation

This paper introduces the unsupervised learning problem of playable vide...
