DeepAI
Log In Sign Up

GATSBI: Generative Agent-centric Spatio-temporal Object Interaction

04/09/2021
by   Cheol-Hui Min, et al.
5

We present GATSBI, a generative model that can transform a sequence of raw observations into a structured latent representation that fully captures the spatio-temporal context of the agent's actions. In vision-based decision-making scenarios, an agent faces complex high-dimensional observations where multiple entities interact with each other. The agent requires a good scene representation of the visual observation that discerns essential components and consistently propagates along the time horizon. Our method, GATSBI, utilizes unsupervised object-centric scene representation learning to separate an active agent, static background, and passive objects. GATSBI then models the interactions reflecting the causal relationships among decomposed entities and predicts physically plausible future states. Our model generalizes to a variety of environments where different types of robots and objects dynamically interact with each other. We show GATSBI achieves superior performance on scene decomposition and video prediction compared to its state-of-the-art counterparts.

READ FULL TEXT

page 22

page 23

page 24

page 25

page 26

page 27

page 30

page 31

11/09/2021

Object-Centric Representation Learning with Generative Spatial-Temporal Factorization

Learning object-centric scene representations is essential for attaining...
06/25/2020

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference

Unsupervised multi-object scene decomposition is a fast-emerging problem...
12/15/2019

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

Action recognition has typically treated actions and activities as monol...
02/02/2015

Complex Background Subtraction by Pursuing Dynamic Spatio-Temporal Models

Although it has been widely discussed in video surveillance, background ...
07/02/2020

RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces

We present RELATE, a model that learns to generate physically plausible ...
12/16/2021

Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation

Video Panoptic Segmentation (VPS) aims at assigning a class label to eac...
10/28/2019

Entity Abstraction in Visual Model-Based Reinforcement Learning

This paper tests the hypothesis that modeling a scene in terms of entiti...

Code Repositories

gatsbi

Official implementation for GATSBI: Generative Agent-centric Spatio-temporal Object Interaction (CVPR'2021)


view repo