Learning Pixel Trajectories with Multiscale Contrastive Random Walks

01/20/2022
by Zhangxing Bian, et al.

A range of video modeling tasks, from optical flow to multiple object tracking, share the same fundamental challenge: establishing space-time correspondence. Yet, approaches that dominate each space differ. We take a step towards bridging this gap by extending the recent contrastive random walk formulation to much denser, pixel-level space-time graphs. The main contribution is introducing hierarchy into the search problem by computing the transition matrix between two frames in a coarse-to-fine manner, forming a multiscale contrastive random walk when extended in time. This establishes a unified technique for self-supervised learning of optical flow, keypoint tracking, and video object segmentation. Experiments demonstrate that, for each of these tasks, the unified model achieves performance competitive with strong self-supervised approaches specific to that task. Project site: https://jasonbian97.github.io/flowwalk
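The idea summarized above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 1D row of "pixels" per frame, unit-norm feature vectors, a softmax temperature of 0.1, average pooling to build the coarse level, and a fixed search window at the fine level. All function names (`transition`, `cycle_loss`, `pool`, `coarse_to_fine_transition`) are hypothetical. It shows a row-stochastic transition matrix between two frames, a walk along a palindrome of frames that is rewarded for returning to its start, and one way a coarse match might restrict the fine-level search.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [a / n for a in v]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transition(src, dst, temperature=0.1, centers=None, radius=None):
    """Row-stochastic matrix: probability of stepping from each src node to each dst node.

    If centers and radius are given, src node i may only step to dst nodes
    whose index lies within radius of centers[i] (an assumed local window).
    """
    rows = []
    for i, f in enumerate(src):
        logits = []
        for j, g in enumerate(dst):
            allowed = centers is None or abs(j - centers[i]) <= radius
            logits.append(dot(f, g) / temperature if allowed else -math.inf)
        rows.append(softmax(logits))
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cycle_loss(frames, temperature=0.1):
    """Walk forward through the frames and back; penalize walkers that do not return home.

    Needs at least two frames.
    """
    palindrome = frames + frames[-2::-1]  # e.g. f0 f1 f2 f1 f0
    walk = None
    for a, b in zip(palindrome, palindrome[1:]):
        step = transition(a, b, temperature)
        walk = step if walk is None else matmul(walk, step)
    # Cross-entropy against the identity: node i should end at node i.
    return -sum(math.log(walk[i][i] + 1e-12) for i in range(len(walk))) / len(walk)

def pool(frame):
    """Average adjacent node pairs to form a coarser level (assumes an even node count)."""
    return [normalize([(a + b) / 2 for a, b in zip(frame[k], frame[k + 1])])
            for k in range(0, len(frame), 2)]

def coarse_to_fine_transition(src, dst, radius=1, temperature=0.1):
    """Match at the coarse level first, then search only near the upsampled coarse match."""
    coarse = transition(pool(src), pool(dst), temperature)
    centers = []
    for i in range(len(src)):
        row = coarse[i // 2]
        best = max(range(len(row)), key=row.__getitem__)
        centers.append(min(2 * best + i % 2, len(dst) - 1))
    return transition(src, dst, temperature, centers, radius)

# Toy usage: four nodes with distinct one-hot features, repeated in two frames.
frame = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
loss = cycle_loss([frame, frame])
fine = coarse_to_fine_transition(frame, frame, radius=1)
```

In this sketch the window keeps each fine-level row sparse, which is the kind of saving that makes a dense pixel-level graph tractable. A real model would learn the features, work on 2D grids, and use more than two scales.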

