Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction

01/21/2023
by Chengmin Gao, et al.

When perceiving the world from multiple viewpoints, humans can reason about complete objects in a compositional manner even when an object is completely occluded from some viewpoints. Humans can also imagine novel views after observing multiple viewpoints. Despite remarkable recent advances in multi-view object-centric learning, some problems remain: 1) the shapes of partially or completely occluded objects cannot be well reconstructed; 2) novel viewpoint prediction depends on expensive viewpoint annotations rather than implicit view rules. These limitations prevent an agent from performing like humans. In this paper, we introduce a time-conditioned generative model for videos. To reconstruct the complete shape of each object accurately, we enhance the disentanglement between different latent representations: view latent representations are jointly inferred using a Transformer and then cooperate with a sequential extension of Slot Attention to learn object-centric representations. The model also gains a new ability: Gaussian processes are employed as priors of the view latent variables, enabling generation and novel-view prediction without viewpoint annotations. Experiments on multiple specifically designed synthetic datasets show that the proposed model can 1) decompose videos, 2) reconstruct the complete shapes of objects, and 3) predict novel viewpoints without viewpoint annotations.
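To make the idea of a time-conditioned Gaussian process prior over view latent variables more concrete, the following is a minimal sketch and not the authors' implementation. It assumes an RBF kernel over timestamps, independent latent dimensions, and plain NumPy; the function names `rbf_kernel` and `gp_predict_view_latents` and all hyperparameter values are hypothetical. Given view latents at observed timestamps, it computes the standard GP posterior mean and covariance at new timestamps, which is one way novel-view latents could be predicted from time alone, without viewpoint annotations.

```python
import numpy as np


def rbf_kernel(t1, t2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of timestamps."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict_view_latents(t_obs, z_obs, t_new,
                            length_scale=1.0, variance=1.0, noise=1e-4):
    """GP posterior over view latents at new timestamps (illustrative only).

    t_obs: (T,) observed timestamps
    z_obs: (T, D) view latents at those timestamps, one GP per dimension
    t_new: (S,) timestamps at which to predict
    Returns the posterior mean (S, D) and the covariance (S, S), which is
    shared by all latent dimensions because they use the same kernel.
    """
    k_oo = rbf_kernel(t_obs, t_obs, length_scale, variance)
    k_oo = k_oo + noise * np.eye(len(t_obs))  # jitter for numerical stability
    k_no = rbf_kernel(t_new, t_obs, length_scale, variance)
    k_nn = rbf_kernel(t_new, t_new, length_scale, variance)

    mean = k_no @ np.linalg.solve(k_oo, z_obs)
    cov = k_nn - k_no @ np.linalg.solve(k_oo, k_no.T)
    return mean, cov


# Toy usage: 2-D view latents observed at five timestamps.
t_obs = np.arange(5, dtype=float)
z_obs = np.stack([np.sin(t_obs), np.cos(t_obs)], axis=1)
t_new = np.array([0.5, 2.5, 4.5])
mean_new, cov_new = gp_predict_view_latents(t_obs, z_obs, t_new)
```

In this sketch the only conditioning input is time, so the smoothness of the predicted view trajectory is governed entirely by the assumed kernel and its length scale.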


