Video 3D Sampling for Self-supervised Representation Learning

07/08/2021
by Wei Li, et al.
Most existing self-supervised video methods rely mainly on temporal signals, ignoring that the semantics of moving objects and environmental information are also critical for video-related tasks. In this paper, we propose a novel self-supervised method for video representation learning, referred to as Video 3D Sampling (V3S). To make full use of the spatial and temporal information in videos, we pre-process a video along three dimensions (width, height, time). As a result, we can use both spatial information (the size of objects) and temporal information (the direction and magnitude of motions) as our learning targets. In our implementation, we combine the sampling of the three dimensions and propose the scale and projection transformations in space and time, respectively. The experimental results show that, when applied to action recognition, video retrieval, and action similarity labeling, our approach improves on the state of the art by significant margins.
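
As a rough illustration of sampling a video along width, height, and time, the sketch below subsamples a clip with one stride per axis and returns the chosen strides as a pretext label. It is a minimal sketch under stated assumptions, not the V3S implementation: the function names `sample_3d` and `make_pretext_sample`, the stride candidates, and the use of stride indices as the learning target are all hypothetical, and the scale and projection transformations mentioned in the abstract are not reproduced here.

```python
import numpy as np


def sample_3d(video, strides, out_shape):
    """Subsample a (T, H, W, C) video with one stride per axis (time, height, width).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    st, sh, sw = strides
    t, h, w = out_shape
    # Extent needed on each axis so that striding yields exactly out_shape samples.
    need = ((t - 1) * st + 1, (h - 1) * sh + 1, (w - 1) * sw + 1)
    if any(n > s for n, s in zip(need, video.shape[:3])):
        raise ValueError("video is too small for the requested strides and out_shape")
    return video[: need[0] : st, : need[1] : sh, : need[2] : sw]


def make_pretext_sample(video, out_shape, rng, candidates=(1, 2)):
    """Pick a random stride per axis and return (clip, label).

    The label is the index of the chosen stride on each axis. Using it as a
    classification target is an assumption made for this sketch.
    """
    label = tuple(int(i) for i in rng.integers(0, len(candidates), size=3))
    strides = tuple(candidates[i] for i in label)
    return sample_3d(video, strides, out_shape), label


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((16, 32, 32, 3))  # synthetic stand-in for a decoded video
    clip, label = make_pretext_sample(video, (4, 8, 8), rng)
    print(clip.shape, label)
```

A larger temporal stride makes motion appear faster, while larger spatial strides make objects appear smaller at a fixed output size, which is one way the sampling rates could carry the spatial and temporal cues the abstract describes.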

research
08/06/2020

Exploring Relations in Untrimmed Videos for Self-Supervised Learning

Existing video self-supervised learning methods mainly rely on trimmed v...
research
12/11/2021

Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity

Recent self-supervised video representation learning methods have found ...
research
04/13/2020

SpeedNet: Learning the Speediness in Videos

We wish to automatically predict the "speediness" of moving objects in v...
research
08/05/2020

Self-supervised Temporal Discriminative Learning for Video Representation Learning

Temporal cues in videos provide important information for recognizing ac...
research
04/10/2022

SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition

Learning an egocentric action recognition model from video data is chall...
research
05/04/2022

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

Recognizing transformation types applied to a video clip (RecogTrans) is...
research
05/10/2023

Self-Supervised Video Representation Learning via Latent Time Navigation

Self-supervised video representation learning aimed at maximizing simila...
