Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

06/05/2018
by Adam R. Kosiorek et al.

We present Sequential Attend, Infer, Repeat (SQAIR), an interpretable deep generative model for videos of moving objects. It can reliably discover and track objects throughout a sequence of frames, and it can also generate future frames conditioned on the current frame, thereby simulating the expected motion of objects. This is achieved by explicitly encoding object presence, locations, and appearances in the latent variables of the model. SQAIR retains all the strengths of its predecessor, Attend, Infer, Repeat (AIR, Eslami et al., 2016), including learning in an unsupervised manner, and addresses its shortcomings. We use a moving multi-MNIST dataset to show the limitations of AIR in detecting overlapping or partially occluded objects, and show how SQAIR overcomes them by leveraging the temporal consistency of objects. Finally, we apply SQAIR to real-world pedestrian CCTV data, where it learns to reliably detect, track, and generate walking pedestrians with no supervision.
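
To make the idea of explicitly encoding object presence, locations, and appearances more concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the authors' implementation: the names `ObjectLatent` and `render_frame` are hypothetical, the location is assumed to be a simple top-left pixel offset, and the appearance is assumed to be a raw image patch rather than a learned code passed through a decoder.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ObjectLatent:
    """Hypothetical per-object latent state for one frame."""
    present: bool        # object presence
    where: tuple         # location, assumed here to be (row, col) of the top-left corner
    what: np.ndarray     # appearance, assumed here to be a raw 2-D patch


def render_frame(latents, height, width):
    """Composite all present objects onto an empty canvas.

    Assumes every patch fits entirely inside the canvas.
    """
    canvas = np.zeros((height, width))
    for z in latents:
        if not z.present:
            continue  # absent objects contribute nothing to the frame
        row, col = z.where
        h, w = z.what.shape
        canvas[row:row + h, col:col + w] += z.what
    # Overlapping objects are summed, then clipped to the valid intensity range.
    return np.clip(canvas, 0.0, 1.0)


latents = [
    ObjectLatent(present=True, where=(1, 2), what=np.ones((2, 2))),
    ObjectLatent(present=False, where=(0, 0), what=np.ones((2, 2))),
]
frame = render_frame(latents, height=5, width=6)
```

Because each object has its own presence, location, and appearance entry, the state of a frame can be read off directly from the latents, which is the sense in which such a representation is interpretable.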


research
03/14/2019

Unsupervised and interpretable scene discovery with Discrete-Attend-Infer-Repeat

In this work we present Discrete Attend Infer Repeat (Discrete-AIR), a R...
research
06/16/2021

Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure

Our goal in this work is to generate realistic videos given just one ini...
research
10/04/2019

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction

We propose a deep video prediction model conditioned on a single image a...
research
04/01/2020

Object-Centric Image Generation with Factored Depths, Locations, and Appearances

We present a generative model of images that explicitly reasons over the...
research
06/03/2021

GMAIR: Unsupervised Object Detection Based on Spatial Attention and Gaussian Mixture

Recent studies on unsupervised object detection based on spatial attenti...
research
07/24/2020

Unsupervised Discovery of 3D Physical Objects from Video

We study the problem of unsupervised physical object discovery. Unlike e...
research
11/10/2022

Spatiotemporal k-means

Spatiotemporal data is readily available due to emerging sensor and data...
