Localizing Visual Sounds the Easy Way

03/17/2022
by   Shentong Mo, et al.
0

Unsupervised audio-visual source localization aims at localizing visible sound sources in a video without relying on ground-truth localization for training. Previous works often seek high audio-visual similarities for likely positive (sounding) regions and low similarities for likely negative regions. However, accurately distinguishing between sounding and non-sounding regions is challenging without manual annotations. In this work, we propose a simple yet effective approach for Easy Visual Sound Localization, namely EZ-VSL, without relying on the construction of positive and/or negative regions during training. Instead, we align audio and visual spaces by seeking audio-visual representations that are aligned in, at least, one location of the associated image, while not matching other images, at any location. We also introduce a novel object guided localization scheme at inference time for improved precision. Our simple and effective framework achieves state-of-the-art performance on two popular benchmarks, Flickr SoundNet and VGG-Sound Source. In particular, we improve the CIoU of the Flickr SoundNet test set from 76.80 83.94 available at https://github.com/stoneMo/EZ-VSL.

READ FULL TEXT

page 13

page 14

research
03/25/2022

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

Sound source localization in visual scenes aims to localize objects emit...
research
08/30/2022

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Audio-visual source localization is a challenging task that aims to pred...
research
06/01/2021

Dual Normalization Multitasking for Audio-Visual Sounding Object Localization

Although several research works have been reported on audio-visual sound...
research
03/20/2023

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Self-supervised audio-visual source localization aims to locate sound-so...
research
04/06/2021

Localizing Visual Sounds the Hard Way

The objective of this work is to localize sound sources that are visible...
research
11/03/2022

MarginNCE: Robust Sound Localization with a Negative Margin

The goal of this work is to localize sound sources in visual scenes with...
research
02/03/2023

Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization

In this work, we aim at an important but less explored problem of a simp...

Please sign up or login with your details

Forgot password? Click here to reset