Spatiotemporal Augmentation on Selective Frequencies for Video Representation Learning

04/08/2022
by Jinhyung Kim, et al.

Recent self-supervised video representation learning methods focus on maximizing the similarity between multiple augmented views of the same video, and they rely heavily on the quality of the generated views. In this paper, we propose frequency augmentation (FreqAug), a spatio-temporal data augmentation method in the frequency domain for video representation learning. FreqAug stochastically removes undesirable information from the video by filtering out specific frequency components, so that the learned representation captures essential features of the video for various downstream tasks. Specifically, FreqAug pushes the model to focus on dynamic rather than static features in the video by dropping spatial or temporal low-frequency components. In other words, learning invariance between the remaining frequency components results in a high-frequency-enhanced representation with less static bias. To verify the generality of the proposed method, we experiment with FreqAug on multiple self-supervised learning frameworks along with standard augmentations. Transferring the improved representation to five video action recognition and two temporal action localization downstream tasks shows consistent improvements over baselines.
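The idea of stochastically dropping spatial or temporal low-frequency components can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `high_pass` and `freq_aug`, the `(T, H, W, C)` clip layout, the hard radial cutoff, and the default values for `p`, `spatial_cutoff`, and `temporal_cutoff` are all assumptions chosen for illustration.

```python
import numpy as np


def high_pass(video, axes, cutoff):
    """Remove frequency components below `cutoff` (cycles per sample) along `axes`.

    Hypothetical helper: a hard radial cutoff is an assumption of this sketch.
    """
    spectrum = np.fft.fftn(video, axes=axes)
    # Squared distance from the zero-frequency bin, broadcast over the other axes.
    radius_sq = np.zeros([1] * video.ndim)
    for ax in axes:
        shape = [1] * video.ndim
        shape[ax] = video.shape[ax]
        radius_sq = radius_sq + np.fft.fftfreq(video.shape[ax]).reshape(shape) ** 2
    mask = radius_sq >= cutoff ** 2  # True for the bins that are kept
    # The mask is symmetric in frequency, so the result is real up to rounding.
    return np.fft.ifftn(spectrum * mask, axes=axes).real


def freq_aug(video, rng, p=0.5, spatial_cutoff=0.01, temporal_cutoff=0.1):
    """Apply spatial and temporal high-pass filters, each with probability `p`.

    `video` is assumed to have shape (T, H, W, C); all defaults are illustrative.
    """
    if rng.random() < p:
        video = high_pass(video, axes=(1, 2), cutoff=spatial_cutoff)
    if rng.random() < p:
        video = high_pass(video, axes=(0,), cutoff=temporal_cutoff)
    return video


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((8, 16, 16, 3))  # synthetic clip standing in for real video
    view = freq_aug(clip, rng)
    print(view.shape)
```

In this sketch, a clip that does not change over time has only a zero temporal frequency, so the temporal filter removes it entirely. That is one way to see why dropping low temporal frequencies would discourage reliance on static content.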

research
04/01/2021

Composable Augmentation Encoding for Video Representation Learning

We focus on contrastive methods for self-supervised video representation...
research
07/24/2021

Self-Conditioned Probabilistic Learning of Video Rescaling

Bicubic downscaling is a prevalent technique used to reduce the video st...
research
07/27/2020

Representation Learning with Video Deep InfoMax

Self-supervised learning has made unsupervised pretraining relevant agai...
research
10/09/2022

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

Masked autoencoders (MAEs) have emerged recently as art self-supervised ...
research
09/23/2021

Long Short View Feature Decomposition via Contrastive Video Representation Learning

Self-supervised video representation methods typically focus on the repr...
research
10/28/2019

Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking

Deep neural networks require collecting and annotating large amounts of ...
research
06/09/2022

Local Spatiotemporal Representation Learning for Longitudinally-consistent Neuroimage Analysis

Recent self-supervised advances in medical computer vision exploit globa...
