How Incomplete is Contrastive Learning? An Inter-intra Variant Dual Representation Method for Self-supervised Video Recognition

07/02/2021
by   Lin Zhang, et al.

Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive-learning-based solutions for self-supervised video recognition focus on encoding inter-variance but ignore the intra-variance among clips within the same video. We therefore propose to learn dual representations for each clip that (1) encode intra-variance through a shuffle-rank pretext task and (2) encode inter-variance through a temporal coherent contrastive loss. Experimental results show that our method plays an essential role in balancing inter- and intra-variance and brings consistent performance gains across multiple backbones and contrastive learning frameworks. Integrated with SimCLR and pretrained on Kinetics-400, our method achieves 82.0% and 51.2% downstream classification accuracy on the UCF101 and HMDB51 test sets, respectively, and 46.1% video retrieval accuracy on UCF101, outperforming both pretext-task-based and contrastive-learning-based counterparts.
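
For readers unfamiliar with the contrastive objective that SimCLR-style frameworks build on, the following is a minimal NumPy sketch of the standard NT-Xent loss. It is not the temporal coherent contrastive loss or the shuffle-rank task proposed in the paper, whose details the abstract does not give. The function name `nt_xent_loss`, the default temperature, and the array shapes are assumptions chosen for illustration.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard NT-Xent loss (illustrative; not the paper's loss).

    z_a, z_b: (N, D) embeddings; row i of z_a and row i of z_b are
    assumed to be two views of the same sample (a positive pair).
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via dot product
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity

    # Row i's positive sits N rows away in the concatenated batch.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    return float(-log_prob[np.arange(2 * n), pos_idx].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    print("aligned pairs:   ", nt_xent_loss(z, z))
    print("mismatched pairs:", nt_xent_loss(z, z[::-1]))
```

In this sketch every other embedding in the batch acts as a negative, which is the inter-sample contrast the abstract describes as inter-variance encoding; how clips from the same video are treated is specific to the paper and is not modeled here.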


research
08/06/2020

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

We propose a self-supervised method to learn feature representations fro...
research
10/29/2020

Self-Supervised Video Representation Using Pretext-Contrastive Learning

Pretext tasks and contrastive learning have been successful in self-supe...
research
09/14/2022

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization

Noise robustness in keyword spotting remains a challenge as many models ...
research
03/13/2023

Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos

Contrastive learning has recently narrowed the gap between self-supervis...
research
11/17/2020

Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning

Whole slide images (WSIs) have large resolutions and usually lack locali...
research
03/27/2022

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Our target is to learn visual correspondence from unlabeled videos. We d...
research
08/06/2021

Spatiotemporal Contrastive Learning of Facial Expressions in Videos

We propose a self-supervised contrastive learning approach for facial ex...
