How to Listen? Rethinking Visual Sound Localization

04/11/2022
by Ho-Hsiang Wu, et al.

Localizing visual sounds consists of locating the objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and urban traffic. Previous works are usually evaluated on datasets that mostly contain a single dominant visible object, and the proposed models usually require dedicated localization modules during training or specialized sampling strategies. However, it remains unclear how these design choices affect the adaptability of these methods to more challenging scenarios. In this work, we analyze various model choices for visual sound localization and discuss how their components, namely the encoder architecture, the loss function, and the localization strategy, affect model performance. Furthermore, we study the interaction between these decisions, the model performance, and the data by examining several evaluation datasets that span different difficulties and characteristics, and we discuss the implications of these decisions for real-world applications. Our code and model weights are open-sourced and available for further applications.
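To make the components named above more concrete, the following is a minimal, hypothetical sketch of one common formulation of visual sound localization, not the authors' implementation. It assumes an audio encoder yields a single embedding per clip and an image encoder yields an H x W grid of embeddings; the localization map is the cosine similarity between the audio embedding and each grid location, and training uses an InfoNCE-style contrastive loss. All function names, the max-pooling choice, and the temperature value are illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # assumed H x W grid of D-dim visual features


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def localization_map(audio_emb: Vector, visual_map: FeatureMap) -> List[List[float]]:
    """Cosine similarity between the audio embedding and every spatial location."""
    a = l2_normalize(audio_emb)
    return [[sum(x * y for x, y in zip(a, l2_normalize(v))) for v in row]
            for row in visual_map]


def peak_location(sim_map: List[List[float]]) -> Tuple[int, int]:
    """(row, col) of the highest-scoring location, i.e. the predicted source."""
    cells = [(r, c) for r in range(len(sim_map)) for c in range(len(sim_map[0]))]
    return max(cells, key=lambda rc: sim_map[rc[0]][rc[1]])


def clip_score(sim_map: List[List[float]]) -> float:
    """Pool the map to one audio-image score (max pooling is an assumed choice)."""
    return max(max(row) for row in sim_map)


def info_nce_loss(audio_embs: List[Vector], visual_maps: List[FeatureMap],
                  temperature: float = 0.07) -> float:
    """Audio-to-image InfoNCE: clip i should score highest with frame i."""
    n = len(audio_embs)
    total = 0.0
    for i in range(n):
        logits = [clip_score(localization_map(audio_embs[i], visual_maps[j])) / temperature
                  for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / n


# Toy data: two 2-D audio embeddings and two 1 x 2 feature grids.
audio = [[1.0, 0.0], [0.0, 1.0]]
frame0 = [[[1.0, 0.0], [0.6, 0.8]]]  # left cell aligns with clip 0
frame1 = [[[0.0, 1.0], [0.6, 0.8]]]  # left cell aligns with clip 1

heatmap = localization_map(audio[0], frame0)
print(peak_location(heatmap))  # (0, 0): the left cell
print(info_nce_loss(audio, [frame0, frame1]) < info_nce_loss(audio, [frame1, frame0]))  # True
```

In this toy setup, correctly paired clips and frames yield a lower loss than swapped pairs, which is the signal a contrastive objective uses to shape the localization map. Other loss functions, pooling operators, or localization strategies can be substituted at the corresponding functions.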
