Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure

06/16/2021
by Paul Henderson, et al.

Our goal in this work is to generate realistic videos given just one initial frame as input. Existing unsupervised approaches to this task do not consider the fact that a video typically shows a 3D environment, and that this environment should remain coherent from frame to frame even as the camera and objects move. We address this by developing a model that first estimates the latent 3D structure of the scene, including the segmentation of any moving objects. It then predicts future frames by simulating the object and camera dynamics, and rendering the resulting views. Importantly, it is trained end-to-end using only the unsupervised objective of predicting future frames, without any 3D information or segmentation annotations. Experiments on two challenging datasets of natural videos show that our model can estimate 3D structure and motion segmentation from a single frame, and hence generate plausible and varied predictions.
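The abstract does not say how a future view is rendered from the estimated structure. As a rough illustration of the general idea of "simulating the object and camera dynamics", and not the authors' implementation, the sketch below uses standard pinhole-camera reprojection. It assumes a per-pixel depth map, intrinsics `fx, fy, cx, cy`, a binary mask marking a single moving object that undergoes a pure 3D translation `obj_t`, and a rigid transform `cam_R, cam_t` that maps points from the current camera frame into the next one. All function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pixel = Optional[Tuple[float, float]]


def unproject(u: int, v: int, z: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) with depth z into 3D camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Pixel:
    """Project a 3D camera-space point to pixel coordinates (None if behind camera)."""
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def rigid(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply the rigid transform R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def reproject_frame(depth: List[List[float]], mask: List[List[bool]],
                    fx: float, fy: float, cx: float, cy: float,
                    cam_R: Mat3, cam_t: Vec3, obj_t: Vec3) -> List[List[Pixel]]:
    """For every pixel, compute where its 3D point lands in the next frame.

    Hypothetical simplification: masked pixels belong to one object that is
    translated by obj_t (in current camera coordinates); then every point is
    moved into the next camera's frame by (cam_R, cam_t).
    """
    out: List[List[Pixel]] = []
    for v, row in enumerate(depth):
        out_row: List[Pixel] = []
        for u, z in enumerate(row):
            p = unproject(u, v, z, fx, fy, cx, cy)
            if mask[v][u]:  # object dynamics
                p = (p[0] + obj_t[0], p[1] + obj_t[1], p[2] + obj_t[2])
            p = rigid(p, cam_R, cam_t)  # camera dynamics
            out_row.append(project(p, fx, fy, cx, cy))
        out.append(out_row)
    return out
```

With an identity rotation and zero translations every pixel maps back to itself; moving the masked object away from the camera shrinks its projection toward the principal point. A full renderer would still have to splat these coordinates into an image and resolve occlusions, which this sketch omits.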

research
08/19/2022

Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images

The challenge of graphically rendering high frame-rate videos on low com...
research
02/21/2018

Stochastic Video Generation with a Learned Prior

Generating video frames that accurately predict future world states is c...
research
04/21/2022

Learning Future Object Prediction with a Spatiotemporal Detection Transformer

We explore future object prediction – a challenging problem where all ob...
research
03/15/2019

Inserting Videos into Videos

In this paper, we introduce a new problem of manipulating a given video ...
research
05/20/2017

Forecasting Hands and Objects in Future Frames

This paper presents an approach to forecast future presence and location...
research
06/05/2018

Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

We present Sequential Attend, Infer, Repeat (SQAIR), an interpretable de...
research
07/25/2020

Towards 3D Visualization of Video from Frames

We explain theoretically how to reconstruct the 3D scene from successive...