MONet: Unsupervised Scene Decomposition and Representation

01/22/2019
by Christopher P. Burgess, et al.

The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence. Where those basic building blocks share meaningful properties, interactions and other regularities across scenes, such decompositions can simplify reasoning and facilitate imagination of novel scenarios. In particular, representing perceptual observations in terms of entities should improve data efficiency and transfer performance on a wide range of tasks. Thus we need models capable of discovering useful decompositions of scenes by identifying units with such regularities and representing them in a common format. To address this problem, we have developed the Multi-Object Network (MONet). In this model, a VAE is trained end-to-end together with a recurrent attention network -- in a purely unsupervised manner -- to provide attention masks around, and reconstructions of, regions of images. We show that this model is capable of learning to decompose and represent challenging 3D scenes into semantically meaningful components, such as objects and background elements.
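
The abstract says only that a recurrent attention network provides attention masks around regions of the image and that a VAE reconstructs those regions. As a rough illustration of how such masks could be made to partition an image, the sketch below assumes a stick-breaking scheme: each attention step claims a fraction of whatever part of each pixel is still unexplained, and the final slot takes the remainder. The function names, the random attention values, and the constant stand-in "reconstructions" are all hypothetical and are not taken from the paper's code.

```python
import random


def stick_breaking_masks(alphas):
    """Turn K-1 per-pixel attention maps (values in [0, 1]) into K masks
    that sum to 1 at every pixel. Assumed scheme, for illustration only."""
    num_pixels = len(alphas[0])
    scope = [1.0] * num_pixels  # portion of each pixel not yet explained
    masks = []
    for alpha in alphas:
        # This step claims a fraction alpha of the remaining scope.
        masks.append([s * a for s, a in zip(scope, alpha)])
        # Whatever it did not claim is passed on to later steps.
        scope = [s * (1.0 - a) for s, a in zip(scope, alpha)]
    masks.append(scope)  # the final slot takes whatever remains
    return masks


def composite(masks, components):
    """Blend per-slot reconstructions using masks as per-pixel mixing weights."""
    num_pixels = len(masks[0])
    return [
        sum(m[i] * c[i] for m, c in zip(masks, components))
        for i in range(num_pixels)
    ]


random.seed(0)
num_pixels, num_slots = 6, 4

# Hypothetical attention outputs; in a real model a network would produce these.
alphas = [
    [random.random() for _ in range(num_pixels)]
    for _ in range(num_slots - 1)
]
masks = stick_breaking_masks(alphas)

# Stand-ins for per-slot VAE reconstructions: slot k is a constant image of value k.
components = [[float(k)] * num_pixels for k in range(num_slots)]
image = composite(masks, components)
```

Because the masks sum to one at every pixel, the composite is a convex combination of the per-slot reconstructions, so each pixel is shared out among the slots rather than counted twice.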

research
01/08/2020

SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

The ability to decompose complex multi-object scenes into meaningful abs...
research
06/07/2021

SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition

To help agents reason about scenes in terms of their building blocks, we...
research
10/13/2021

Unsupervised Object Learning via Common Fate

Learning generative object models from unlabelled videos is a long stand...
research
11/19/2019

KISS: Keeping It Simple for Scene Text Recognition

Over the past few years, several new methods for scene text recognition ...
research
10/07/2021

Unsupervised Image Decomposition with Phase-Correlation Networks

The ability to decompose scenes into their object components is a desire...
research
05/16/2022

Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization

Unsupervised localization and segmentation are long-standing computer vi...
research
11/06/2018

Concept Learning with Energy-Based Models

Many hallmarks of human intelligence, such as generalizing from limited ...
