Understanding Object Dynamics for Interactive Image-to-Video Synthesis

06/21/2021
by Andreas Blattmann, et al.

What would be the effect of locally poking a static scene? We present an approach that learns natural-looking global articulations caused by a local manipulation at the pixel level. Training requires only videos of moving objects and no information about the underlying manipulation of the physical scene. Our generative model learns to infer natural object dynamics in response to user interaction and learns the interrelations between different object body regions. Given a static image of an object and a local poke at a pixel, the approach predicts how the object would deform over time. In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos but enable local, interactive control of the deformation. Our model is not restricted to particular object categories and can transfer dynamics onto novel, unseen object instances. Extensive experiments on diverse objects demonstrate the effectiveness of our approach compared to common video prediction frameworks. The project page is available at https://bit.ly/3cxfA2L .
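
To make the input interface described above concrete, here is a minimal sketch of one way a static image and a single pixel poke could be packed into one array for a conditional video model. The abstract does not specify how the poke is represented, so the sparse two-channel displacement map, the channel layout, and the names `encode_poke` and `build_model_input` are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def encode_poke(height, width, source_xy, shift_xy):
    """Return a (2, H, W) float32 map that is zero except at the poked pixel.

    Assumed encoding (not taken from the paper): channel 0 holds the
    horizontal shift and channel 1 the vertical shift, both in pixels.
    """
    x, y = source_xy
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("poke location lies outside the image")
    poke_map = np.zeros((2, height, width), dtype=np.float32)
    poke_map[0, y, x] = shift_xy[0]  # horizontal displacement
    poke_map[1, y, x] = shift_xy[1]  # vertical displacement
    return poke_map


def build_model_input(image, poke_map):
    """Stack a (3, H, W) image and a (2, H, W) poke map into a (5, H, W) array."""
    return np.concatenate([image.astype(np.float32), poke_map], axis=0)


# Placeholder image; a real use would load an RGB frame here.
image = np.zeros((3, 64, 64), dtype=np.float32)
poke = encode_poke(64, 64, source_xy=(10, 20), shift_xy=(5.0, -3.0))
model_input = build_model_input(image, poke)
```

A hypothetical model would consume `model_input` and return a sequence of frames showing the resulting deformation. Keeping the poke as a dense but mostly empty map is only one option; it is chosen here because it aligns spatially with the image and can be concatenated channel-wise.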


