Motion-Focused Contrastive Learning of Video Representations

01/11/2022
by   Rui Li, et al.

Motion, as the most distinctive phenomenon in a video, capturing change over time, has been unique and critical to the development of video representation learning. In this paper, we ask the question: how important is motion, particularly for self-supervised video representation learning? To answer it, we compose a duet of exploiting motion for both data augmentation and feature learning in the regime of contrastive learning. Specifically, we present a Motion-focused Contrastive Learning (MCL) method that builds on this duet. On one hand, MCL capitalizes on the optical flow of each frame in a video to temporally and spatially sample tubelets (i.e., sequences of associated frame patches across time) as data augmentations. On the other hand, MCL further aligns the gradient maps of the convolutional layers with optical flow maps from spatial, temporal, and spatio-temporal perspectives, in order to ground motion information in feature learning. Extensive experiments conducted with an R(2+1)D backbone demonstrate the effectiveness of our MCL. On UCF101, a linear classifier trained on the representations learnt by MCL achieves 81.91% top-1 accuracy, outperforming ImageNet supervised pre-training by 6.78%. Code is available at https://github.com/YihengZhang-CV/MCL-Motion-Focused-Contrastive-Learning.
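To make the tubelet-sampling idea concrete, here is a minimal sketch of selecting motion-salient patch locations per frame from optical-flow magnitude. This is an illustration only, not the paper's implementation: the function name, the greedy per-frame argmax, and the grid-aligned patches are all assumptions, and MCL would obtain `flow_mag` from an off-the-shelf flow estimator rather than as a given array.

```python
import numpy as np

def sample_motion_tubelet(flow_mag, patch=8):
    """Pick, per frame, the patch with the largest summed flow magnitude.

    flow_mag: (T, H, W) array of per-pixel optical-flow magnitudes
              (hypothetical input; in practice produced by a flow estimator).
    Returns a list of (top, left) patch corners, one per frame, forming a
    crude "tubelet" of motion-salient regions across time.
    """
    T, H, W = flow_mag.shape
    corners = []
    for t in range(T):
        best_score, best_corner = -1.0, (0, 0)
        # Scan a non-overlapping grid of patch-sized windows.
        for r in range(0, H - patch + 1, patch):
            for c in range(0, W - patch + 1, patch):
                score = flow_mag[t, r:r + patch, c:c + patch].sum()
                if score > best_score:
                    best_score, best_corner = score, (r, c)
        corners.append(best_corner)
    return corners
```

A real sampler would also add temporal smoothness between consecutive corners so the tubelet tracks a coherent region rather than jumping between distant maxima.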


Related research

08/12/2022 · Motion Sensitive Contrastive Learning for Self-supervised Video Representation
Contrastive learning has shown great potential in video representation l...

12/16/2021 · Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation
Spatio-temporal representation learning is critical for video self-super...

04/07/2019 · Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
We address the problem of video representation learning without human-an...

03/10/2021 · VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
MoCo is effective for unsupervised image representation learning. In thi...

08/07/2022 · Learning Omnidirectional Flow in 360-degree Video via Siamese Representation
Optical flow estimation in omnidirectional videos faces two significant ...

02/27/2019 · Single-frame Regularization for Temporally Stable CNNs
Convolutional neural networks (CNNs) can model complicated non-linear re...

06/18/2021 · Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Instance-level contrastive learning techniques, which rely on data augme...
