Self-supervised Co-training for Video Representation Learning

by Tengda Han, et al.

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, showing that this form of supervised contrastive learning leads to a clear improvement in performance; (ii) we propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss, exploiting the complementary information from different views of the same data source (RGB streams and optical flow) by using one view to obtain positive class samples for the other; (iii) we thoroughly evaluate the quality of the learnt representation on two downstream tasks: action recognition and video retrieval. In both cases, the proposed approach demonstrates state-of-the-art or comparable performance with other self-supervised approaches, whilst being significantly more efficient to train, i.e. requiring far less training data to achieve similar performance.
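
As a rough illustration of contributions (i) and (ii), the sketch below shows a generic InfoNCE-style loss that accepts several positives per anchor, together with a helper that picks those positives by nearest-neighbour search in a second view. It is not the paper's exact formulation. The function names, the NumPy implementation, the top-k mining rule and the temperature value are all assumptions made for illustration, and the instance's own augmented copy, which instance-based InfoNCE would normally use as a positive, is left out for brevity.

```python
import numpy as np


def mine_positives_from_other_view(other_view_embeddings, k=1):
    """Hypothetical helper: mark each sample's k nearest neighbours in the
    *other* view (e.g. optical flow) as positives for the current view."""
    z = other_view_embeddings / np.linalg.norm(
        other_view_embeddings, axis=1, keepdims=True
    )
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbour
    topk = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, topk, True, axis=1)
    return mask


def multi_positive_info_nce(embeddings, positive_mask, temperature=0.07):
    """Generic InfoNCE with a set of positives per anchor (illustrative).

    embeddings:    (N, D) features from the view being trained (e.g. RGB).
    positive_mask: (N, N) boolean; entry [i, j] is True when sample j is
                   treated as a positive for anchor i.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # Remove self-similarity from both numerator and denominator.
    self_mask = np.eye(logits.shape[0], dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)
    pos = positive_mask & ~self_mask

    # Subtract the row maximum before exponentiating for stability.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    denom = exp.sum(axis=1)
    numer = np.where(pos, exp, 0.0).sum(axis=1)

    # Average only over anchors that have at least one positive.
    valid = pos.any(axis=1)
    return float(-np.mean(np.log(numer[valid] / denom[valid])))


# Usage with random stand-in features (assumed shapes, not real data).
rng = np.random.default_rng(0)
rgb_features = rng.normal(size=(6, 8))
flow_features = rng.normal(size=(6, 8))
flow_mined_mask = mine_positives_from_other_view(flow_features, k=2)
rgb_loss = multi_positive_info_nce(rgb_features, flow_mined_mask)
```

Swapping the roles of the two feature arrays gives the complementary step, in which RGB neighbours supply positives for the flow representation.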


Memory-augmented Dense Predictive Coding for Video Representation Learning

The objective of this paper is self-supervised learning from video, in p...

Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting

Instance-level contrastive learning techniques, which rely on data augme...

Self-supervised Video Representation Learning with Cascade Positive Retrieval

Self-supervised video representation learning has been shown to effectiv...

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

Advanced self-supervised visual representation learning methods rely on ...

MarioNette: Self-Supervised Sprite Learning

Visual content often contains recurring elements. Text is made up of gly...

Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning

Contrastive representation learning has proven to be an effective self-s...

Self-Supervised Ranking for Representation Learning

We present a new framework for self-supervised representation learning b...

Code Repositories


[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.
