Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

06/20/2020
by   Yuan Yao, et al.

In self-supervised spatio-temporal representation learning, temporal resolution and long- and short-term characteristics have not yet been fully explored, which limits the representation capability of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representations in a simple yet effective way. PRP is rooted in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows the feature encoder and, by classifying fast-forward rates, favors low-temporal-resolution, long-term representations. The generative perception model acts as a feature decoder and, by introducing a motion-attention mechanism, focuses on high-temporal-resolution, short-term representations. PRP is applied to typical video target tasks, including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models by significant margins. Code is available at github.com/yuanyao366/PRP
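The dilated sampling idea in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `dilated_sample`, the candidate rates `(1, 2, 4, 8)`, and the clip length of 16 frames are assumptions chosen for illustration. The sketch picks a playback rate at random, keeps every `rate`-th frame to form a fast-forward clip, and returns the index of the chosen rate as the self-supervised classification label.

```python
import random


def dilated_sample(num_frames, clip_len=16, rates=(1, 2, 4, 8), rng=None):
    """Return (frame_indices, rate_label) for one fast-forward clip.

    Hypothetical helper: the rates and clip length are illustrative
    assumptions, not values confirmed by the abstract.
    """
    if rng is None:
        rng = random.Random()

    # Keep only the rates whose dilated clip fits inside the video.
    valid = [
        (label, rate)
        for label, rate in enumerate(rates)
        if (clip_len - 1) * rate + 1 <= num_frames
    ]
    if not valid:
        raise ValueError("video is too short for every candidate rate")

    # The chosen rate's index serves as the classification target.
    label, rate = rng.choice(valid)

    # Number of source frames spanned by the dilated clip.
    span = (clip_len - 1) * rate + 1
    start = rng.randint(0, num_frames - span)

    # Take every `rate`-th frame starting from `start`.
    indices = [start + k * rate for k in range(clip_len)]
    return indices, label
```

A classifier trained to predict `label` from the frames at `indices` receives the playback-rate supervision signal the abstract describes; the reconstruction branch with motion attention is not covered by this sketch.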


research
01/02/2020

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

We propose a novel self-supervised method, referred to as Video Cloze Pr...
research
08/25/2022

Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings

The existing source cell-phone recognition method lacks the long-term fe...
research
11/30/2022

Spatio-Temporal Crop Aggregation for Video Representation Learning

We propose Spatio-temporal Crop Aggregation for video representation LEa...
research
12/16/2021

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

Spatio-temporal representation learning is critical for video self-super...
research
05/06/2021

PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection

Reservoir Computing (RC) offers a viable option to deploy AI algorithms ...
research
08/05/2020

Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

Self-supervised learning has shown great potentials in improving the dee...
research
11/01/2021

LSTA-Net: Long short-term Spatio-Temporal Aggregation Network for Skeleton-based Action Recognition

Modelling various spatio-temporal dependencies is the key to recognising...
