Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints

by   Jinyang Yuan, et al.

Visual scenes are extremely rich in diversity, not only because there are infinite combinations of objects and background, but also because the observations of the same scene may vary greatly with the change of viewpoints. When observing a visual scene that contains multiple objects from multiple viewpoints, humans are able to perceive the scene in a compositional way from each viewpoint, while achieving the so-called "object constancy" across different viewpoints, even though the exact viewpoints are untold. This ability is essential for humans to identify the same object while moving and to learn from vision efficiently. It is intriguing to design models that have the similar ability. In this paper, we consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision, and propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem. To infer latent representations, the information contained in different viewpoints is iteratively integrated by neural networks. Experiments on several specifically designed synthetic datasets have shown that the proposed method is able to effectively learn from multiple unspecified viewpoints.


page 6

page 21

page 22

page 23

page 24

page 25

page 26

page 27


Compositional Scene Representation Learning via Reconstruction: A Survey

Visual scene representation learning is an important research problem in...

Generative Hierarchical Models for Parts, Objects, and Scenes

Compositional structures between parts and objects are inherent in natur...

Learning to Infer 3D Object Models from Images

A crucial ability of human intelligence is to build up models of individ...

Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction

When perceiving the world from multiple viewpoints, humans have the abil...

Compositional Scene Modeling with Global Object-Centric Representations

The appearance of the same object may vary in different scene images due...

Knowledge-Guided Object Discovery with Acquired Deep Impressions

We present a framework called Acquired Deep Impressions (ADI) which cont...

Subitizing with Variational Autoencoders

Numerosity, the number of objects in a set, is a basic property of a giv...

Please sign up or login with your details

Forgot password? Click here to reset