Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

08/05/2020
by   Jinpeng Wang, et al.

Self-supervised learning has shown great potential for improving deep learning models in an unsupervised manner by constructing surrogate supervision signals directly from unlabeled data. Different from existing works, we present a novel way to obtain the surrogate supervision signal based on high-level feature maps under consistency regularization. In this paper, we propose a Spatio-Temporal Consistency Regularization between different output features generated from a siamese network, which includes a clean path fed with the original video and a noise path fed with the corresponding augmented video. Based on the spatio-temporal characteristics of video, we develop two video-based data augmentation methods, i.e., Spatio-Temporal Transformation and Intra-Video Mixup. Consistency under the former is proposed to model transformation consistency of features, while the latter aims at retaining spatial invariance to extract action-related features. Extensive experiments demonstrate that our method achieves substantial improvements compared with state-of-the-art self-supervised learning methods for action recognition. When using our method as an additional regularization term and combining it with current surrogate supervision signals, we achieve 22 previous state-of-the-art on HMDB51 and 7
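
The two consistency terms described in the abstract can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: `toy_encoder` is a hypothetical stand-in for the video backbone, a temporal flip stands in for the Spatio-Temporal Transformation, the mixup rule (blending one frame of the clip into every frame with weight `lam`) is one plausible reading of Intra-Video Mixup, and mean squared error is used as the consistency measure.

```python
import numpy as np


def intra_video_mixup(video, lam, rng):
    """Blend one randomly chosen frame of the clip into every frame.

    Assumed mixing rule (not confirmed by the abstract):
    mixed = (1 - lam) * video + lam * frame, with video shaped (T, H, W, C).
    """
    t = rng.integers(video.shape[0])   # index of the frame to blend in
    static_frame = video[t]            # (H, W, C), broadcast over time
    return (1.0 - lam) * video + lam * static_frame


def temporal_flip(x):
    """Reverse the time axis; a simple stand-in for a spatio-temporal transform."""
    return x[::-1]


def toy_encoder(video):
    """Hypothetical stand-in for a 3D CNN: per-frame spatial average pooling."""
    return video.mean(axis=(1, 2))     # (T, C) feature map


def consistency_loss(a, b):
    """Mean squared error between two feature maps (assumed distance)."""
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)
video = rng.random((8, 16, 16, 3))     # toy clip: 8 frames of 16x16 RGB

# Clean path: features of the original video.
clean_feat = toy_encoder(video)

# Transformation consistency: features of the transformed input should
# match the same transform applied to the clean features.
noise_feat = toy_encoder(temporal_flip(video))
loss_transform = consistency_loss(noise_feat, temporal_flip(clean_feat))

# Mixup consistency: features of the mixed clip should stay close to the
# clean features, so the static (spatial) content is ignored.
mixed = intra_video_mixup(video, lam=0.3, rng=rng)
loss_mixup = consistency_loss(toy_encoder(mixed), clean_feat)
```

Because the toy encoder pools each frame independently, it commutes with the temporal flip and `loss_transform` is zero up to rounding; a learned network would not have this property for free, which is why such a term can act as a training signal. In a real setup both losses would be added to the training objective as regularization terms.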


research
09/10/2019

Video Representation Learning by Dense Predictive Coding

The objective of this paper is self-supervised learning of spatio-tempor...
research
08/06/2020

Exploring Relations in Untrimmed Videos for Self-Supervised Learning

Existing video self-supervised learning methods mainly rely on trimmed v...
research
03/05/2020

Self-Supervised Spatio-Temporal Representation Learning Using Variable Playback Speed Prediction

We propose a self-supervised learning method by predicting the variable ...
research
09/12/2020

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Self-supervised learning has shown great potentials in improving the vid...
research
06/20/2020

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

In self-supervised spatio-temporal representation learning, the temporal...
research
04/10/2022

SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition

Learning an egocentric action recognition model from video data is chall...
research
05/06/2021

PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection

Reservoir Computing (RC) offers a viable option to deploy AI algorithms ...
