Self-Supervised Learning for Videos: A Survey

06/18/2022
by   Madeline C. Schiappa, et al.

The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, relying on human-generated annotations leads to models with biased learning, poor domain generalization, and poor robustness. Obtaining annotations is also expensive and requires great effort, which is especially challenging for videos. As an alternative, self-supervised learning offers a way to learn representations without annotations and has shown promise in both the image and video domains. Compared with the image domain, learning video representations is more challenging because of the temporal dimension, which brings in motion and other environmental dynamics. This also provides opportunities for video-specific ideas that can advance self-supervised learning in the video and multimodal domains. In this survey, we provide a review of existing approaches to self-supervised learning, focusing on the video domain. We group these methods into three categories based on their learning objectives: pre-text tasks, generative modeling, and contrastive learning. These approaches also differ in the modalities they use: video, video-audio, video-text, and video-audio-text. We further introduce the commonly used datasets and downstream evaluation tasks, offer insights into the limitations of existing works, and outline potential future directions in this area.

research
03/27/2022

How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?

Despite the recent success of video self-supervised learning, there is m...
research
05/16/2021

Self-supervised on Graphs: Contrastive, Generative, or Predictive

Deep learning on graphs has recently achieved remarkable success on a va...
research
11/29/2022

Survey on Self-Supervised Multimodal Representation Learning and Foundation Models

Deep learning has been the subject of growing interest in recent years. ...
research
03/02/2022

Audio Self-supervised Learning: A Survey

Inspired by the humans' cognitive ability to generalise knowledge and sk...
research
08/27/2022

Self-Supervised Face Presentation Attack Detection with Dynamic Grayscale Snippets

Face presentation attack detection (PAD) plays an important role in defe...
research
03/31/2023

Self-Supervised Multimodal Learning: A Survey

Multimodal learning, which aims to understand and analyze information fr...
research
01/30/2022

Self-Supervised Moving Vehicle Detection from Audio-Visual Cues

Robust detection of moving vehicles is a critical task for any autonomou...
