Self-supervised Co-training for Video Representation Learning

by Tengda Han, et al.

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, showing that this form of supervised contrastive learning leads to a clear improvement in performance; (ii) we propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss, exploiting the complementary information from different views of the same data source (RGB streams and optical flow) by using one view to obtain positive class samples for the other; (iii) we thoroughly evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval. In both cases, the proposed approach demonstrates state-of-the-art or comparable performance with other self-supervised approaches, whilst being significantly more efficient to train, i.e. requiring far less training data to achieve similar performance.
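The idea in contributions (i) and (ii) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an InfoNCE variant in which each sample may have several positives, and it assumes that positives for the RGB view are mined as the top-k nearest neighbours in the optical-flow embedding space. The function names, the toy 2-D embeddings, `k=1`, and the temperature of 0.07 are all hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def top_k_neighbours(embeddings, i, k):
    """Indices of the k samples most similar to sample i, excluding i itself."""
    sims = sorted(
        ((dot(embeddings[i], e), j) for j, e in enumerate(embeddings) if j != i),
        reverse=True,
    )
    return [j for _, j in sims[:k]]

def multi_positive_info_nce(queries, keys, positives, temperature=0.07):
    """Mean over i of -log(sum_{p in P(i)} exp(q_i.k_p/t) / sum_j exp(q_i.k_j/t))."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, key) / temperature for key in keys]
        shift = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - shift) for l in logits]
        numerator = sum(exps[p] for p in positives[i])
        total += -math.log(numerator / sum(exps))
    return total / len(queries)

# Toy data (assumed): 4 clips, two augmented RGB embeddings per clip, one flow embedding.
rgb_query = [normalize(v) for v in [[1.0, 0.1], [0.9, 0.3], [0.1, 1.0], [0.2, 0.9]]]
rgb_key = [normalize(v) for v in [[0.9, 0.2], [1.0, 0.2], [0.2, 1.0], [0.1, 0.8]]]
flow = [normalize(v) for v in [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]]]

# Instance-based InfoNCE: the only positive for clip i is its own augmentation.
instance_positives = [[i] for i in range(len(rgb_query))]

# Co-training step (assumed form): the flow view proposes extra positives for RGB.
cotrain_positives = [[i] + top_k_neighbours(flow, i, k=1) for i in range(len(flow))]

loss_instance = multi_positive_info_nce(rgb_query, rgb_key, instance_positives)
loss_cotrain = multi_positive_info_nce(rgb_query, rgb_key, cotrain_positives)
```

With a single positive per sample, the function reduces to the usual instance-based InfoNCE loss. In a co-training setup the roles would presumably also be swapped, with the RGB embeddings proposing positives for the flow network; the sketch shows one direction only.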


Memory-augmented Dense Predictive Coding for Video Representation Learning

The objective of this paper is self-supervised learning from video, in p...

Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting

Instance-level contrastive learning techniques, which rely on data augme...

Self-supervised Video Representation Learning with Cascade Positive Retrieval

Self-supervised video representation learning has been shown to effectiv...

Self-Supervised Ranking for Representation Learning

We present a new framework for self-supervised representation learning b...

MarioNette: Self-Supervised Sprite Learning

Visual content often contains recurring elements. Text is made up of gly...

A Mutually Reinforced Framework for Pretrained Sentence Embeddings

The lack of labeled data is a major obstacle to learning high-quality se...

Broaden Your Views for Self-Supervised Video Learning

Most successful self-supervised learning methods are trained to align th...

Code Repositories


[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.
