Learning Sound Localization Better From Semantically Similar Samples

02/07/2022
by Arda Senocak, et al.
The objective of this work is to localize sound sources in visual scenes. Existing audio-visual works employ contrastive learning by treating corresponding audio-visual pairs from the same source as positives and randomly mismatched pairs as negatives. However, these negative pairs may contain semantically matched audio-visual information. Such semantically correlated pairs, termed "hard positives", are therefore mistakenly grouped as negatives. Our key contribution is showing that hard positives can give response maps similar to those of the corresponding pairs. Our approach incorporates these hard positives by adding their response maps directly into the contrastive learning objective. We demonstrate the effectiveness of our approach on the VGG-SS and SoundNet-Flickr test sets, showing favorable performance compared with state-of-the-art methods.
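
To make the idea concrete, the following is a minimal NumPy sketch of a contrastive objective in which semantically similar pairs count as extra positives. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how hard positives are mined, how a response map is reduced to a score, or what temperature is used. Here the response map is assumed to be a cosine-similarity map, the score is its spatial maximum, hard positives are assumed to be nearest neighbours in audio-embedding space, and all function names (`response_map`, `pair_scores`, `hard_positive_mask`, `contrastive_loss`) are hypothetical.

```python
import numpy as np

def response_map(visual, audio):
    """Cosine similarity between one audio vector (C,) and every
    spatial location of a visual feature map (C, H, W)."""
    v = visual / (np.linalg.norm(visual, axis=0, keepdims=True) + 1e-8)
    a = audio / (np.linalg.norm(audio) + 1e-8)
    return np.einsum("chw,c->hw", v, a)

def pair_scores(visuals, audios):
    """scores[i, j] = spatial max of the response map of image i and audio j.
    (Max pooling is an assumption made for this sketch.)"""
    n = len(visuals)
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            scores[i, j] = response_map(visuals[i], audios[j]).max()
    return scores

def hard_positive_mask(audios, k=1):
    """True pairs (diagonal) plus, for each sample, its k nearest
    neighbours in audio-embedding space (assumed mining rule)."""
    a = audios / (np.linalg.norm(audios, axis=1, keepdims=True) + 1e-8)
    sim = a @ a.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches
    nearest = np.argsort(-sim, axis=1)[:, :k]
    mask = np.eye(len(audios))
    mask[np.arange(len(audios))[:, None], nearest] = 1.0
    return mask

def contrastive_loss(scores, positive_mask, temperature=0.07):
    """InfoNCE-style loss in which each row may have several positives."""
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    pos = (exp * positive_mask).sum(axis=1)
    return float(-np.log(pos / exp.sum(axis=1)).mean())

# Toy batch: 4 images with 8-channel 3x3 feature maps and 4 audio embeddings.
rng = np.random.default_rng(0)
visuals = rng.normal(size=(4, 8, 3, 3))
audios = rng.normal(size=(4, 8))
scores = pair_scores(visuals, audios)
baseline = contrastive_loss(scores, np.eye(4))              # diagonal positives only
with_hard = contrastive_loss(scores, hard_positive_mask(audios))
```

With the diagonal-only mask, every mismatched pair is pushed apart, including pairs that share semantics. Adding the hard-positive entries to the mask moves those pairs from the denominator-only side of the loss to the numerator, so they are no longer penalized as negatives.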

research
11/03/2022

MarginNCE: Robust Sound Localization with a Negative Margin

The goal of this work is to localize sound sources in visual scenes with...
research
03/25/2022

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

Sound source localization in visual scenes aims to localize objects emit...
research
03/20/2023

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Self-supervised audio-visual source localization aims to locate sound-so...
research
04/26/2022

Robust Audio-Visual Instance Discrimination via Active Contrastive Set Mining

The recent success of audio-visual representation learning can be largel...
research
08/19/2021

Batch Curation for Unsupervised Contrastive Representation Learning

The state-of-the-art unsupervised contrastive visual representation lear...
research
04/06/2021

Localizing Visual Sounds the Hard Way

The objective of this work is to localize sound sources that are visible...
research
07/13/2016

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

Recently, sound recognition has been used to identify sounds, such as ca...
