The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning

10/13/2021
by Haider Al-Tahan, et al.

Contrastive learning of auditory and visual perception has been extremely successful when investigated individually. However, major questions remain about how to integrate principles learned from both domains to attain effective audiovisual representations. In this paper, we present a contrastive framework to learn audiovisual representations from unlabeled videos. The type and strength of the augmentations used during self-supervised pre-training play a crucial role in how well contrastive frameworks work. Hence, we extensively investigate the composition of temporal augmentations suitable for learning audiovisual representations; we find that lossy spatio-temporal transformations that do not corrupt the temporal coherency of videos are the most effective. Furthermore, we show that the effectiveness of these transformations scales with higher temporal resolution and stronger transformation intensity. Compared to self-supervised models pre-trained with only sampling-based temporal augmentation, self-supervised models pre-trained with our temporal augmentations lead to an approximately 6.5% gain in linear classifier performance on the AVE dataset. Lastly, we show that despite their simplicity, our proposed transformations work well across self-supervised learning frameworks (SimSiam, MoCoV3, etc.) and on a benchmark audiovisual dataset (AVE).
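The abstract does not spell out the transformations themselves, so the following is only a hypothetical illustration of what a lossy transformation that keeps temporal coherency could look like, not the authors' implementation. It assumes a clip stored as a NumPy array of shape (T, H, W, C) and a made-up helper, `coherent_cutout`, that samples one mask position per clip and applies it to every frame. Information is lost inside the masked patch, but frame order and the content outside the patch are untouched.

```python
import numpy as np

def coherent_cutout(clip, size, rng):
    """Zero out one square patch at the same location in every frame.

    Hypothetical helper for illustration only. `clip` has shape (T, H, W, C).
    The patch position is sampled once per clip, so the frame order and all
    pixels outside the patch are left exactly as they were.
    """
    _, h, w, _ = clip.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    out = clip.copy()
    out[:, top:top + size, left:left + size, :] = 0
    return out

rng = np.random.default_rng(0)
# Synthetic clip with values in [1, 2), so exact zeros can only come from the mask.
clip = rng.random((8, 32, 32, 3)) + 1.0

# Two independently augmented views of the same clip, as a contrastive
# framework would consume them.
view_a = coherent_cutout(clip, size=8, rng=rng)
view_b = coherent_cutout(clip, size=8, rng=rng)
```

Sampling the parameters per clip rather than per frame is the design choice that matters here: a per-frame mask would flicker across time and introduce changes between frames that are not in the original video.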

research
10/19/2020

CLAR: Contrastive Learning of Auditory Representations

Learning rich visual representations using contrastive self-supervised l...
research
08/06/2021

Spatiotemporal Contrastive Learning of Facial Expressions in Videos

We propose a self-supervised contrastive learning approach for facial ex...
research
09/06/2023

Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference

Accurate acquisition of crowd flow at Points of Interest (POIs) is pivot...
research
08/07/2023

Deepfake Detection: A Comparative Analysis

This paper presents a comprehensive comparative analysis of supervised an...
research
02/08/2022

Self-supervised Contrastive Learning for Volcanic Unrest Detection

Ground deformation measured from Interferometric Synthetic Aperture Rada...
research
02/18/2020

Data Transformation Insights in Self-supervision with Clustering Tasks

Self-supervision is key to extending use of deep learning for label scar...
research
06/30/2020

Self-Supervised Learning of a Biologically-Inspired Visual Texture Model

We develop a model for representing visual texture in a low-dimensional ...
