MarginNCE: Robust Sound Localization with a Negative Margin

11/03/2022
by   Sooyoung Park, et al.

The goal of this work is to localize sound sources in visual scenes with a self-supervised approach. Contrastive learning for sound source localization leverages the natural correspondence between audio and visual signals: audio-visual pairs from the same source are treated as positives, while randomly selected pairs are treated as negatives. However, this approach introduces noisy correspondences; for example, a positive pair may contain audio and visual signals that are unrelated to each other, and negative pairs may contain samples that are semantically similar to the positive one. Our key contribution is to show that using a less strict decision boundary in contrastive learning can alleviate the effect of noisy correspondences in sound source localization. We propose a simple yet effective approach that slightly modifies the contrastive loss with a negative margin. Extensive experimental results show that our approach performs on par with or better than state-of-the-art methods. Furthermore, we demonstrate that introducing a negative margin into existing methods consistently improves their performance.
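The abstract does not give the loss itself, so the following is only a minimal sketch of how a margin might enter an InfoNCE-style contrastive loss. It is not the authors' formulation. The function name `margin_nce_loss`, the placement of the margin on the positive logit, and the default values for `margin` and `temperature` are all assumptions made for illustration. Under this convention a negative margin raises the positive logit before the softmax, so a positive pair needs less of a gap over its negatives, which is one way to read "a less strict decision boundary".

```python
import math


def margin_nce_loss(sim, margin=-0.2, temperature=0.07):
    """InfoNCE-style loss over a square similarity matrix, with a margin on positives.

    sim[i][j] is the similarity between audio clip i and image j. Diagonal
    entries are the positive pairs; off-diagonal entries act as negatives.
    The margin is subtracted from the positive similarity only, so a negative
    margin increases the positive logit. Margin placement and default values
    are illustrative assumptions, not taken from the paper.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        # Build the logits for row i, shifting only the positive entry.
        logits = [
            (sim[i][j] - margin if i == j else sim[i][j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over the row.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # Cross-entropy with the positive (diagonal) entry as the target.
        total += log_denominator - logits[i]
    return total / n


# Hypothetical similarities for two audio-visual pairs.
sim = [[0.8, 0.1],
       [0.2, 0.7]]
loss_negative_margin = margin_nce_loss(sim, margin=-0.2)
loss_no_margin = margin_nce_loss(sim, margin=0.0)
```

With the same similarities, the negative-margin loss is smaller than the zero-margin loss, so pairs that are already reasonably well separated contribute weaker gradients. Whether this matches the paper's exact mechanism would need to be checked against the full text.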

research
02/07/2022

Learning Sound Localization Better From Semantically Similar Samples

The objective of this work is to localize the sound sources in visual sc...
research
03/25/2022

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

Sound source localization in visual scenes aims to localize objects emit...
research
03/20/2023

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Self-supervised audio-visual source localization aims to locate sound-so...
research
04/26/2022

Sound Localization by Self-Supervised Time Delay Estimation

Sounds reach one microphone in a stereo pair sooner than the other, resu...
research
11/18/2022

Contrastive Positive Sample Propagation along the Audio-Visual Event Line

Visual and audio signals often coexist in natural environments, forming ...
research
03/17/2022

Localizing Visual Sounds the Easy Way

Unsupervised audio-visual source localization aims at localizing visible...
research
04/29/2022

On Negative Sampling for Audio-Visual Contrastive Learning from Movies

The abundance and ease of utilizing sound, along with the fact that audi...
