Learning Future Object Prediction with a Spatiotemporal Detection Transformer

04/21/2022
by   Adam Tonderski, et al.
3

We explore future object prediction – a challenging problem where all objects visible in a future video frame are to be predicted. We propose to tackle this problem end-to-end by training a detection transformer to directly output future objects. In order to make accurate predictions about the future, it is necessary to capture the dynamics in the scene, both of other objects and of the ego-camera. We extend existing detection transformers in two ways to capture the scene dynamics. First, we experiment with three different mechanisms that enable the model to spatiotemporally process multiple frames. Second, we feed ego-motion information to the model via cross-attention. We show that both of these cues substantially improve future object prediction performance. Our final approach learns to capture the dynamics and make predictions on par with an oracle for 100 ms prediction horizons, and outperform baselines for longer prediction horizons.

READ FULL TEXT

page 2

page 3

page 14

research
06/16/2021

Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure

Our goal in this work is to generate realistic videos given just one ini...
research
07/20/2021

Generative Video Transformer: Can Objects be the Words?

Transformers have been successful for many natural language processing t...
research
08/13/2020

End-to-end Contextual Perception and Prediction with Interaction Transformer

In this paper, we tackle the problem of detecting objects in 3D and fore...
research
10/12/2021

Fourier-based Video Prediction through Relational Object Motion

The ability to predict future outcomes conditioned on observed video fra...
research
01/17/2019

Disentangling Video with Independent Prediction

We propose an unsupervised variational model for disentangling video int...
research
09/07/2018

Neural Allocentric Intuitive Physics Prediction from Real Videos

Humans are able to make rich predictions about the future dynamics of ph...
research
11/12/2019

Experience-Embedded Visual Foresight

Visual foresight gives an agent a window into the future, which it can u...

Please sign up or login with your details

Forgot password? Click here to reset