Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting

06/18/2021
by Martine Toering, et al.

Instance-level contrastive learning techniques, which rely on data augmentation and a contrastive loss function, have found great success in the domain of visual representation learning. However, they are not suited to exploiting the rich dynamical structure of video, as operations are performed on many augmented instances. In this paper we propose "Video Cross-Stream Prototypical Contrasting", a novel method which predicts consistent prototype assignments from both RGB and optical flow views, operating on sets of samples. Specifically, we alternate the optimization process: while one of the streams is being optimized, all views are mapped to that stream's set of prototype vectors. Each assignment is predicted from all views except the one that produced it, pushing representations closer to their assigned prototypes. As a result, more efficient video embeddings with ingrained motion information are learned, without the explicit need for optical flow computation during inference. We obtain state-of-the-art results on nearest neighbour video retrieval and action recognition, outperforming the previous best by +3.2 on UCF101 using the S3D backbone (90.5 +15.1
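The abstract describes a swapped-prediction objective: every view is assigned to a prototype, and each assignment is predicted from all the other views. The sketch below illustrates that idea in plain Python under several loudly stated assumptions that are not taken from the paper. The assignments are computed with Sinkhorn-Knopp equal-partition normalization, borrowed here from SwAV. The temperature `temp=0.1`, `eps=0.05` and three Sinkhorn iterations are illustrative values. The function names (`sinkhorn`, `cross_stream_loss`) and the four-view layout (two RGB views, two flow views) are hypothetical. No gradients are computed; the code only evaluates the loss.

```python
import math
import random


def normalize(v):
    # L2-normalize a vector so dot products are cosine similarities
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def scores(z, prototypes):
    # similarity of one embedding to each prototype vector
    return [sum(a * b for a, b in zip(z, c)) for c in prototypes]


def softmax(xs, temp):
    m = max(xs)
    e = [math.exp((x - m) / temp) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def sinkhorn(score_matrix, eps=0.05, iters=3):
    # Assumption (SwAV-style): soft assignments Q (N samples x K prototypes)
    # that spread the batch roughly evenly over the prototypes.
    n, k = len(score_matrix), len(score_matrix[0])
    q = [[math.exp(s / eps) for s in row] for row in score_matrix]
    for _ in range(iters):
        # columns: give each prototype total mass 1/K
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / col[j] / k for j in range(k)] for i in range(n)]
        # rows: give each sample total mass 1/N
        row = [sum(r) for r in q]
        q = [[x / row[i] / n for x in r] for i, r in enumerate(q)]
    # rescale so each sample's assignment is a distribution summing to 1
    return [[x * n for x in r] for r in q]


def cross_stream_loss(views, prototypes, temp=0.1):
    # views: V lists of N normalized embeddings (hypothetically RGB and flow
    # views of the same N clips); prototypes: the set belonging to the stream
    # currently being optimized, to which ALL views are mapped.
    all_scores = [[scores(z, prototypes) for z in view] for view in views]
    codes = [sinkhorn(s) for s in all_scores]  # target assignment per view
    total, terms = 0.0, 0
    for v in range(len(views)):  # view whose assignment is the target
        for u in range(len(views)):  # every other view predicts it
            if u == v:
                continue
            for i in range(len(views[u])):
                p = softmax(all_scores[u][i], temp)
                total -= sum(qk * math.log(pk) for qk, pk in zip(codes[v][i], p))
                terms += 1
    return total / terms


if __name__ == "__main__":
    random.seed(0)
    dim, n, k = 8, 4, 3

    def rand_vec():
        return normalize([random.gauss(0, 1) for _ in range(dim)])

    views = [[rand_vec() for _ in range(n)] for _ in range(4)]  # 2 RGB + 2 flow
    protos = {"rgb": [rand_vec() for _ in range(k)],
              "flow": [rand_vec() for _ in range(k)]}
    for step in range(4):
        # alternate which stream (and which prototype set) is optimized
        stream = "rgb" if step % 2 == 0 else "flow"
        print(step, stream, round(cross_stream_loss(views, protos[stream]), 4))
```

The alternation in the demo loop only switches which prototype set is used; in a real training setup each step would also update the parameters of the chosen stream, which this sketch deliberately leaves out.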

research
10/19/2020

Self-supervised Co-training for Video Representation Learning

The objective of this paper is visual-only self-supervised video represe...
research
08/12/2022

Motion Sensitive Contrastive Learning for Self-supervised Video Representation

Contrastive learning has shown great potential in video representation l...
research
09/13/2021

Online Unsupervised Learning of Visual Representations and Categories

Real world learning scenarios involve a nonstationary distribution of cl...
research
02/03/2023

Contrastive Learning with Consistent Representations

Contrastive learning demonstrates great promise for representation learn...
research
01/11/2022

Motion-Focused Contrastive Learning of Video Representations

Motion, as the most distinct phenomenon in a video to involve the change...
research
08/24/2021

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

The central idea of contrastive learning is to discriminate between diff...
research
03/28/2023

Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval

Colonoscopic video retrieval, which is a critical part of polyp treatmen...