Unsupervised Learning of Object Structure and Dynamics from Videos

06/19/2019
by Matthias Minderer, et al.

Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning. To address this challenge, we adopt a keypoint-based image representation and learn a stochastic dynamics model of the keypoints. Future frames are reconstructed from the keypoints and a reference frame. By modeling dynamics in the keypoint coordinate space, we achieve stable learning and avoid compounding of errors in pixel space. Our method improves upon unstructured representations both for pixel-level video prediction and for downstream tasks requiring object-level understanding of motion dynamics. We evaluate our model on diverse datasets: a multi-agent sports dataset, the Human3.6M dataset, and datasets based on continuous control tasks from the DeepMind Control Suite. The spatially structured representation outperforms unstructured representations on a range of motion-related tasks such as object tracking, action recognition and reward prediction.
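
The abstract does not say how keypoints are tied to image space, so the following is only a minimal sketch under stated assumptions: a keypoint is read out of a non-negative heatmap as its centre of mass, and rendered back as an isotropic Gaussian blob. The function names, the [-1, 1] coordinate convention, and the `sigma` parameter are hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def heatmap_to_keypoint(heatmap):
    """Return the (x, y) centre of mass of a non-negative heatmap.

    Coordinates are expressed in [-1, 1] along each axis (an assumed convention).
    """
    h, w = heatmap.shape
    # Normalise so the heatmap sums to one; epsilon guards against an empty map.
    weights = heatmap / (heatmap.sum() + 1e-8)
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    # Marginalise over rows/columns, then take the expected coordinate.
    x = float((weights.sum(axis=0) * xs).sum())
    y = float((weights.sum(axis=1) * ys).sum())
    return x, y


def keypoint_to_heatmap(x, y, size=(64, 64), sigma=0.1):
    """Render a keypoint as a Gaussian blob on a grid of the given size."""
    h, w = size
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Round trip: coordinates -> heatmap -> coordinates.
    blob = keypoint_to_heatmap(0.25, -0.5)
    print(heatmap_to_keypoint(blob))
```

Under these assumptions, a dynamics model would operate on the low-dimensional (x, y) coordinates rather than on pixels, which is the property the abstract credits for stable learning.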


research
09/30/2022

An information-theoretic approach to unsupervised keypoint representation learning

Extracting informative representations from videos is fundamental for th...
research
09/11/2023

Learning Geometric Representations of Objects via Interaction

We address the problem of learning representations from observations of ...
research
06/19/2019

Unsupervised Learning of Object Keypoints for Perception and Control

The study of object representations in computer vision has primarily foc...
research
06/06/2023

Learn the Force We Can: Multi-Object Video Generation from Pixel-Level Interactions

We propose a novel unsupervised method to autoregressively generate vide...
research
05/16/2023

Learning Higher-order Object Interactions for Keypoint-based Video Understanding

Action recognition is an important problem that requires identifying act...
research
03/12/2019

Unsupervised Discovery of Parts, Structure, and Dynamics

Humans easily recognize object parts and their hierarchical structure by...
research
03/12/2016

Temporally Robust Global Motion Compensation by Keypoint-based Congealing

Global motion compensation (GMC) removes the impact of camera motion and...
