Disentangling Content and Motion for Text-Based Neural Video Manipulation

11/05/2022
by Levent Karacan, et al.

Giving machines the ability to imagine possible new objects or scenes from linguistic descriptions and produce their realistic renderings is arguably one of the most challenging problems in computer vision. Recent advances in deep generative models have led to new approaches that give promising results towards this goal. In this paper, we introduce a new method called DiCoMoGAN for manipulating videos with natural language, aiming to perform local and semantic edits on a video clip to alter the appearances of an object of interest. Our GAN architecture allows for better utilization of multiple observations by disentangling content and motion to enable controllable semantic edits. To this end, we introduce two tightly coupled networks: (i) a representation network for constructing a concise understanding of motion dynamics and temporally invariant content, and (ii) a translation network that exploits the extracted latent content representation to actuate the manipulation according to the target description. Our qualitative and quantitative evaluations demonstrate that DiCoMoGAN significantly outperforms existing frame-based methods, producing temporally coherent and semantically more meaningful results.
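The abstract's core idea, that a temporally invariant content code can be separated from per-frame motion so that semantic edits touch only the content, can be illustrated with a minimal sketch. This is not the authors' DiCoMoGAN implementation: the mean-based factorization and the `manipulate` helper below are simplified stand-ins for the paper's learned representation and translation networks.

```python
# Illustrative sketch (not the authors' implementation): factor per-frame
# features into a shared "content" code and per-frame "motion" residuals,
# apply a (here: hand-written) edit to content only, then recombine.

def disentangle(frame_feats):
    """Split frame features into temporally invariant content and motion."""
    t = len(frame_feats)
    dim = len(frame_feats[0])
    # Content: the part shared across time, estimated here as the mean.
    content = [sum(f[d] for f in frame_feats) / t for d in range(dim)]
    # Motion: what remains in each frame after removing shared content.
    motion = [[f[d] - content[d] for d in range(dim)] for f in frame_feats]
    return content, motion

def manipulate(frame_feats, edit):
    """Edit only the content code; motion residuals are untouched, so the
    edited sequence stays temporally coherent by construction."""
    content, motion = disentangle(frame_feats)
    edited = edit(content)  # stand-in for the text-conditioned translation
    return [[edited[d] + m[d] for d in range(len(edited))] for m in motion]

# Toy example: three frames of 2-D features; shift the first content dim.
frames = [[1.0, 2.0], [1.2, 2.0], [0.8, 2.0]]
result = manipulate(frames, lambda c: [c[0] + 10.0, c[1]])
```

Because the edit is applied once to the shared code rather than per frame, frame-to-frame differences are preserved exactly, which is the intuition behind why disentangling content from motion yields temporally coherent edits.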


