An information-theoretic approach to unsupervised keypoint representation learning

09/30/2022
by Ali Younes, et al.

Extracting informative representations from videos is fundamental to learning downstream tasks effectively. Inspired by classical work on saliency, we present a novel information-theoretic approach to discovering meaningful representations from videos in an unsupervised fashion. We argue that the local entropy of pixel neighborhoods, and its evolution over a video stream, is a valuable intrinsic supervisory signal for learning to attend to salient features. We therefore abstract visual features into a concise representation of keypoints that serve as dynamic information transporters. Using two original information-theoretic losses, we discover, without supervision, spatio-temporally consistent keypoint representations that carry the prominent information across video frames. The first loss maximizes the information covered by the keypoints in a frame. The second encourages optimized keypoint transportation over time, thereby imposing consistency of the information flow. We evaluate our keypoint-based representation against state-of-the-art baselines on different downstream tasks, such as learning object dynamics. To evaluate the expressivity and consistency of the keypoints, we propose a new set of metrics. Our empirical results showcase the superior performance of our information-driven keypoints, which resolve challenges such as attending to both static and dynamic objects and to objects abruptly entering and leaving the scene.
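
To make the idea of "local entropy of pixel neighborhoods" concrete, the following is a minimal sketch of one common way to compute such a map: the Shannon entropy of the quantized intensity histogram inside a square window around each pixel. It is an illustration under stated assumptions, not the authors' implementation. The function name `local_entropy`, the window `radius`, the number of histogram `bins`, and the edge-replication border handling are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def local_entropy(image, radius=1, bins=8):
    """Per-pixel Shannon entropy (in bits) of a square neighborhood.

    `image` is a 2D list of grayscale intensities in [0, 1].
    `radius` and `bins` are illustrative parameters, not values from the paper.
    """
    height, width = len(image), len(image[0])
    # Quantize intensities into `bins` discrete levels.
    levels = [[min(int(v * bins), bins - 1) for v in row] for row in image]
    window_size = (2 * radius + 1) ** 2
    entropy = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            counts = Counter()
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    rr = min(max(r + dr, 0), height - 1)
                    cc = min(max(c + dc, 0), width - 1)
                    counts[levels[rr][cc]] += 1
            # Entropy of the empirical distribution of levels in the window.
            entropy[r][c] = -sum(
                (n / window_size) * math.log2(n / window_size)
                for n in counts.values()
            )
    return entropy


# A flat region carries no local information; a vertical edge does.
flat = [[0.5] * 6 for _ in range(6)]
edge = [[0.0] * 4 + [1.0] * 4 for _ in range(6)]
flat_map = local_entropy(flat)
edge_map = local_entropy(edge)
```

In this sketch, uniform regions yield zero entropy while pixels near the intensity edge yield positive entropy, which is the sense in which such a map can act as a saliency signal. Tracking how the map changes between consecutive frames would be one way to look at its "evolution in a video stream", though the paper's actual losses are not specified in this abstract.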

research
06/19/2019

Unsupervised Learning of Object Structure and Dynamics from Videos

Extracting and predicting object structure and dynamics from videos with...
research
06/28/2020

Video Representation Learning with Visual Tempo Consistency

Visual tempo, which describes how fast an action goes, has shown its pot...
research
05/31/2022

Unsupervised Image Representation Learning with Deep Latent Particles

We propose a new representation of visual data that disentangles object ...
research
11/25/2020

Unsupervised Object Keypoint Learning using Local Spatial Predictability

We propose PermaKey, a novel approach to representation learning based o...
research
06/19/2019

Unsupervised Learning of Object Keypoints for Perception and Control

The study of object representations in computer vision has primarily foc...
research
09/11/2023

Learning Geometric Representations of Objects via Interaction

We address the problem of learning representations from observations of ...
research
06/15/2021

End-to-End Learning of Keypoint Representations for Continuous Control from Images

In many control problems that include vision, optimal controls can be in...