Object-Centric Video Prediction via Decoupling of Object Dynamics and Interactions

02/23/2023
by Angel Villar-Corrales, et al.

We propose a novel framework for object-centric video prediction, i.e., extracting the compositional structure of a video sequence and modeling object dynamics and interactions from visual observations in order to predict future object states, from which subsequent video frames can then be generated. To learn meaningful spatio-temporal object representations and accurately forecast object states, we propose two novel object-centric video predictor (OCVP) transformer modules that decouple the processing of temporal dynamics from that of object interactions, thereby improving prediction performance. In our experiments, we show that our object-centric prediction framework with OCVP predictors outperforms object-agnostic video prediction models on two different datasets while maintaining consistent and accurate object representations.
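
To make the idea of decoupling concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes object states are stored as a hypothetical array `slots` of shape `(T, N, D)` (time steps, objects, feature dimension), uses plain dot-product self-attention without learned projections or causal masking, and applies the two attention steps one after the other, which is only one possible arrangement. The function names `temporal_attention`, `relational_attention`, and `decoupled_block` are illustrative and do not come from the paper.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # tokens: (..., L, D). Attention is computed over the L axis only.
    # Simplification: queries, keys and values are the tokens themselves.
    dim = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(dim)
    return softmax(scores, axis=-1) @ tokens


def temporal_attention(slots):
    # Each object attends only to its own states at other time steps.
    per_object = np.swapaxes(slots, 0, 1)  # (N, T, D)
    return np.swapaxes(self_attention(per_object), 0, 1)  # back to (T, N, D)


def relational_attention(slots):
    # Within each time step, objects attend only to each other.
    return self_attention(slots)  # attention over the N axis


def decoupled_block(slots):
    # Assumed arrangement: temporal step first, then relational step,
    # each with a residual connection.
    slots = slots + temporal_attention(slots)
    slots = slots + relational_attention(slots)
    return slots


rng = np.random.default_rng(0)
slots = rng.normal(size=(5, 3, 8))  # 5 time steps, 3 objects, 8 features
out = decoupled_block(slots)
print(out.shape)  # (5, 3, 8)
```

In this sketch, the temporal step never mixes information between different objects and the relational step never mixes information between different time steps, which is the sense in which the two kinds of processing are kept apart.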


research
10/13/2021

Object-Region Video Transformers

Evidence from cognitive psychology suggests that understanding spatio-te...
research
07/20/2022

Is an Object-Centric Video Representation Beneficial for Transfer?

The objective of this work is to learn an object-centric video represent...
research
05/06/2021

Object-centric Video Prediction without Annotation

In order to interact with the world, agents must be able to predict the ...
research
04/09/2021

GATSBI: Generative Agent-centric Spatio-temporal Object Interaction

We present GATSBI, a generative model that can transform a sequence of r...
research
07/02/2021

Visual Relationship Forecasting in Videos

Real-world scenarios often require the anticipation of object interactio...
research
03/17/2022

Video Prediction at Multiple Scales with Hierarchical Recurrent Networks

Autonomous systems not only need to understand their current environment...
research
01/21/2023

Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction

When perceiving the world from multiple viewpoints, humans have the abil...
