Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

11/13/2021
by Li Nanbo, et al.

Learning object-centric representations of multi-object scenes is a promising approach towards machine intelligence, facilitating high-level reasoning and control from visual sensory data. However, current approaches for unsupervised object-centric scene representation are incapable of aggregating information from multiple observations of a scene. As a result, these "single-view" methods form their representations of a 3D scene based only on a single 2D observation (view). Naturally, this leads to several inaccuracies, with these methods falling victim to single-view spatial ambiguities. To address this, we propose the Multi-View and Multi-Object Network (MulMON), a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views. To sidestep the main technical difficulty of the multi-object-multi-view scenario, maintaining object correspondences across views, MulMON iteratively updates the latent object representations for a scene over multiple views. To ensure that these iterative updates do indeed aggregate spatial information to form a complete 3D scene understanding, MulMON is asked to predict the appearance of the scene from novel viewpoints during training. Through experiments, we show that MulMON better resolves spatial ambiguities than single-view methods, learning more accurate and disentangled object representations, and also achieves new functionality in predicting object segmentations for novel viewpoints.
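The key structural idea in the abstract, carrying a fixed set of per-object latents forward and refining them view by view so that object correspondences are maintained implicitly, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's actual model: `refine_latents` is a hypothetical stand-in for MulMON's learned amortized-inference network, and the views are plain vectors rather than images.

```python
import numpy as np

def refine_latents(z, view):
    """One iterative-inference step: update every object latent with
    evidence from a new view. In MulMON this is a learned network;
    here a hypothetical stand-in that mixes the observation in."""
    return 0.8 * z + 0.2 * view  # toy update toward view-conditioned evidence

def multi_view_inference(views, num_objects, latent_dim, seed=0):
    """Sketch of multi-view aggregation: the same latent slots are
    carried across all views, so slot k keeps describing the same
    object (correspondence is maintained by construction, not matching)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(num_objects, latent_dim))  # initial object latents
    for view in views:  # aggregate evidence one view at a time
        z = refine_latents(z, view)
    return z
```

The point of the loop is the one named in the abstract: because each view refines the existing latents rather than producing fresh ones, no explicit cross-view object matching is needed, and each additional view can only add spatial evidence to the representation.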



