Space-Time Correspondence as a Contrastive Random Walk

06/25/2020
by   Allan Jabri, et al.

This paper proposes a simple self-supervised approach for learning representations for visual correspondence from raw video. We cast correspondence as link prediction in a space-time graph constructed from a video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a node embedding in which pairwise similarity defines transition probabilities of a random walk. Prediction of long-range correspondence is efficiently computed as a walk along this graph. The embedding learns to guide the walk by placing high probability along paths of correspondence. Targets are formed without supervision, by cycle-consistency: we train the embedding to maximize the likelihood of returning to the initial node when walking along a graph constructed from a "palindrome" of frames. We demonstrate that the approach allows for learning representations from large unlabeled video. Despite its simplicity, the method outperforms the self-supervised state-of-the-art on a variety of label propagation tasks involving objects, semantic parts, and pose. Moreover, we show that self-supervised adaptation at test-time and edge dropout improve transfer for object-level correspondence.
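The cycle-consistency objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says that pairwise similarity defines the transition probabilities, so the cosine similarity, the softmax, the temperature value of 0.07, and every function name below are assumptions chosen for illustration. Patch embeddings are given directly as plain lists rather than produced by a learned encoder, and edge dropout and test-time adaptation are left out.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def transition(nodes_a, nodes_b, temperature=0.07):
    """Row-stochastic matrix of walk probabilities from nodes_a to nodes_b.

    Assumption: softmax over cosine similarity with a temperature. The
    abstract does not specify how similarity becomes a probability.
    """
    a = [normalize(v) for v in nodes_a]
    b = [normalize(v) for v in nodes_b]
    rows = []
    for u in a:
        scores = [sum(x * y for x, y in zip(u, w)) for w in b]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp((s - top) / temperature) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def matmul(p, q):
    """Chain two transition matrices (plain nested-list matrix product)."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def cycle_loss(frames, temperature=0.07):
    """Mean negative log-probability of returning to the starting node.

    `frames` is a list of frames; each frame is a list of patch embeddings.
    The walk runs over the palindrome f0, f1, ..., fT, ..., f1, f0.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a palindrome")
    palindrome = frames + frames[-2::-1]
    walk = None
    for current, following in zip(palindrome, palindrome[1:]):
        step = transition(current, following, temperature)
        walk = step if walk is None else matmul(walk, step)
    n = len(walk)
    loss = -sum(math.log(walk[i][i]) for i in range(n)) / n
    return loss, walk


# Toy usage: two patches per frame whose embeddings drift slightly over time.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.0], [0.0, 1.0]],
]
loss, walk = cycle_loss(frames)
print(loss)
```

In this toy setting the loss is small when each patch stays distinguishable from the others across frames, and it rises to log(N) for N patches when all embeddings are identical, since the walk then spreads uniformly and carries no information about where it started.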


research
09/28/2021

Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning

This paper presents a self-supervised method for learning reliable visua...
research
04/04/2022

Object Permanence Emerges in a Random Walk along Memory

This paper proposes a self-supervised objective for learning representat...
research
01/20/2022

Learning Pixel Trajectories with Multiscale Contrastive Random Walks

A range of video modeling tasks, from optical flow to multiple object tr...
research
03/18/2019

Learning Correspondence from the Cycle-Consistency of Time

We introduce a self-supervised method for learning visual correspondence...
research
12/06/2022

Self-Supervised Correspondence Estimation via Multiview Registration

Video provides us with the spatio-temporal consistency needed for visual...
research
11/28/2022

Mix and Localize: Localizing Sound Sources in Mixtures

We present a method for simultaneously localizing multiple sound sources...
research
11/11/2020

Scribble-Supervised Semantic Segmentation by Random Walk on Neural Representation and Self-Supervision on Neural Eigenspace

Scribble-supervised semantic segmentation has gained much attention rece...
