Self-supervised Learning for Video Correspondence Flow

05/02/2019
by Zihang Lai, et al.

The objective of this paper is self-supervised learning of feature embeddings from videos, suitable for correspondence flow, i.e. matching correspondences between frames over the video. We leverage the natural spatial-temporal coherence of appearance in videos to create a "pointer" model that learns to reconstruct a target frame by copying colors from a reference frame. We make three contributions: First, we introduce a simple information bottleneck that forces the model to learn robust features for correspondence matching and prevents it from learning trivial solutions, e.g. matching based on low-level color information. Second, we propose to train the model over a long temporal window in videos. To make the model more robust to complex object deformation and occlusion, i.e. the problem of tracker drifting, we formulate a recursive model trained with scheduled sampling and cycle consistency. Third, we evaluate the approach by first training on the Kinetics dataset with self-supervised learning, and then directly applying the learned representation to DAVIS video segmentation and JHMDB keypoint tracking. On both tasks, our approach achieves state-of-the-art performance; on segmentation in particular, we outperform all previous methods by a significant margin.
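The abstract describes an attention-based "pointer" that reconstructs a target frame by copying colors from a reference frame, plus an information bottleneck that hides low-level color cues. Below is a minimal PyTorch sketch of those two ideas only, under stated assumptions: a toy encoder and color-channel dropout standing in for the bottleneck. Names such as FeatureEncoder, colour_dropout, and copy_colours are illustrative and not taken from the authors' code; the actual model, recursive long-range training, scheduled sampling, and cycle consistency are not reproduced here.

```python
# Hypothetical sketch of the pointer mechanism: each target location attends
# over all reference locations and soft-copies their colours. A colour-channel
# dropout on the encoder inputs plays the role of the information bottleneck,
# so matching cannot rely on raw colour alone. (Assumed names, not the paper's code.)
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureEncoder(nn.Module):
    """Toy CNN encoder; the paper uses a deeper backbone (assumption)."""

    def __init__(self, in_channels=3, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)


def colour_dropout(frame, p=0.5):
    """Bottleneck: randomly zero out colour channels of the input frame."""
    mask = (torch.rand(frame.size(0), frame.size(1), 1, 1,
                       device=frame.device) > p).float()
    return frame * mask


def copy_colours(feat_ref, feat_tgt, colours_ref, temperature=0.07):
    """Soft pointer: reconstruct target colours as an affinity-weighted
    average of reference colours."""
    B, C, H, W = feat_ref.shape
    f_ref = F.normalize(feat_ref.flatten(2), dim=1)      # B x C x HW_ref
    f_tgt = F.normalize(feat_tgt.flatten(2), dim=1)      # B x C x HW_tgt
    affinity = torch.bmm(f_tgt.transpose(1, 2), f_ref)   # B x HW_tgt x HW_ref
    attn = F.softmax(affinity / temperature, dim=-1)
    col_ref = colours_ref.flatten(2)                      # B x 3 x HW_ref
    recon = torch.bmm(col_ref, attn.transpose(1, 2))      # B x 3 x HW_tgt
    return recon.view(B, -1, H, W)


# Usage: self-supervised reconstruction loss between the copied colours and
# the (downsampled) ground-truth target frame.
encoder = FeatureEncoder()
ref, tgt = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
feat_ref = encoder(colour_dropout(ref))
feat_tgt = encoder(colour_dropout(tgt))
recon = copy_colours(feat_ref, feat_tgt, F.avg_pool2d(ref, 4))
loss = F.l1_loss(recon, F.avg_pool2d(tgt, 4))
```

In the paper's recursive setting, the reconstruction from one step would be fed back as the reference for the next frame over a long temporal window; this sketch shows only a single reference-target pair.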


Related research

06/22/2020  Self-supervised Video Object Segmentation
The objective of this paper is self-supervised representation learning, ...

06/25/2018  Tracking Emerges by Colorizing Videos
We use large amounts of unlabeled video to learn models for visual track...

10/28/2019  Self-supervised learning of class embeddings from video
This work explores how to use self-supervised learning on videos to lear...

05/20/2019  Learning Video Representations from Correspondence Proposals
Correspondences between frames encode rich information about dynamic con...

03/18/2019  Learning Correspondence from the Cycle-Consistency of Time
We introduce a self-supervised method for learning visual correspondence...

09/16/2022  Spatial-then-Temporal Self-Supervised Learning for Video Correspondence
Learning temporal correspondence from unlabeled videos is of vital impor...

12/06/2022  Self-Supervised Correspondence Estimation via Multiview Registration
Video provides us with the spatio-temporal consistency needed for visual...
