Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

09/12/2020
by   Jinpeng Wang, et al.
0

Self-supervised learning has shown great potentials in improving the video representation ability of deep neural networks by constructing surrogate supervision signals from the unlabeled data. However, some of the current methods tend to suffer from a background cheating problem, i.e., the prediction is highly dependent on the video background instead of the motion, making the model vulnerable to background changes. To alleviate the problem, we propose to remove the background impact by adding the background. That is, given a video, we randomly select a static frame and add it to every other frames to construct a distracting video sample. Then we force the model to pull the feature of the distracting video and the feature of the original video closer, so that the model is explicitly restricted to resist the background influence, focusing more on the motion changes. In addition, in order to prevent the static frame from disturbing the motion area too much, we restrict the feature being consistent with the temporally flipped feature of the reversed video, forcing the model to concentrate more on the motion. We term our method as Temporal-sensitive Background Erasing (TBE). Experiments on UCF101 and HMDB51 show that TBE brings about 6.4 method on the HMDB51 and UCF101 datasets respectively. And it is worth noting that the implementation of our method is so simple and neat and can be added as an additional regularization term to most of the SOTA methods without much efforts.

READ FULL TEXT

page 1

page 3

page 4

page 6

page 7

research
09/30/2021

Motion-aware Self-supervised Video Representation Learning via Foreground-background Merging

In light of the success of contrastive learning in the image domain, cur...
research
04/10/2022

Self-Supervised Video Representation Learning with Motion-Contrastive Perception

Visual-only self-supervised learning has achieved significant improvemen...
research
08/05/2020

Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

Self-supervised learning has shown great potentials in improving the dee...
research
10/12/2022

M^3Video: Masked Motion Modeling for Self-Supervised Video Representation Learning

We study self-supervised video representation learning that seeks to lea...
research
07/12/2022

Dual Contrastive Learning for Spatio-temporal Representation

Contrastive learning has shown promising potential in self-supervised sp...
research
04/18/2020

Motion Segmentation using Frequency Domain Transformer Networks

Self-supervised prediction is a powerful mechanism to learn representati...
research
03/09/2021

Self-Supervision by Prediction for Object Discovery in Videos

Despite their irresistible success, deep learning algorithms still heavi...

Please sign up or login with your details

Forgot password? Click here to reset