MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning

12/21/2022
by   Yuan Liu, et al.

Learning effective motion features is an essential pursuit of video representation learning. This paper presents a simple yet effective sample construction strategy to boost the learning of motion features in video contrastive learning. The proposed method, dubbed Motion-focused Quadruple Construction (MoQuad), augments instance discrimination by carefully disturbing the appearance and motion of both the positive and negative samples to create a quadruple for each video instance, so that the model is encouraged to exploit motion information. Unlike recent approaches that create extra auxiliary tasks for learning motion features or apply explicit temporal modelling, our method keeps the simple and clean contrastive learning paradigm (i.e., SimCLR) without multi-task learning or extra modelling. In addition, we design two extra training strategies by analyzing initial MoQuad experiments. Extensive experiments show that simply applying MoQuad to SimCLR achieves superior performance on downstream tasks compared to the state of the art. Notably, on the UCF-101 action recognition task, we achieve 93.7% accuracy after pre-training the model on Kinetics-400 for only 200 epochs, surpassing various previous methods.
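To make the contrastive setup concrete, the following is a minimal sketch of an InfoNCE-style loss of the kind used in SimCLR-like instance discrimination. It is not the authors' implementation. The abstract does not say how the quadruple is built or fed into the loss, so the toy 2-D embeddings, the function names `cosine` and `info_nce`, the temperature value, and the choice to treat a motion-disturbed clip of the same video as an extra negative are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive logit."""
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - pos_logit


# Toy embeddings (hypothetical values, not taken from the paper).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]          # another view of the same video
other_video = [-1.0, 0.0]      # ordinary negative from a different video
motion_disturbed = [0.8, 0.6]  # assumed: same appearance, altered motion

loss_plain = info_nce(anchor, positive, [other_video])
loss_hard = info_nce(anchor, positive, [other_video, motion_disturbed])
print(f"plain negatives: {loss_plain:.4f}")
print(f"with motion-disturbed negative: {loss_hard:.4f}")
```

In this sketch, a negative that resembles the anchor in appearance raises the loss, so under these assumptions the only way to lower it is to separate the two clips by something other than appearance, such as motion.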


research
08/12/2022

Motion Sensitive Contrastive Learning for Self-supervised Video Representation

Contrastive learning has shown great potential in video representation l...
research
07/20/2020

Hierarchical Contrastive Motion Learning for Video Action Recognition

One central question for video action recognition is how to model motion...
research
09/02/2022

Temporal Contrastive Learning with Curriculum

We present ConCur, a contrastive video representation learning method th...
research
09/12/2020

Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion

One significant factor we expect the video representation learning to ca...
research
08/08/2023

Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning

Self-supervised learning has proved effective for skeleton-based human a...
research
04/27/2022

Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training

Recently, much progress has been made for self-supervised action recogni...
research
08/21/2023

MGMAE: Motion Guided Masking for Video Masked Autoencoding

Masked autoencoding has shown excellent performance on self-supervised v...
