Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

10/12/2020
by Di Hu, et al.

Discriminatively localizing sounding objects in cocktail-party scenes, i.e., scenes with mixed sounds, is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework for self-supervised, class-aware sounding object localization. First, we learn robust object representations by aggregating candidate sound localization results from single-source scenes. Then, class-aware object localization maps are generated in cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are selected by matching the audio and visual object category distributions, with audiovisual consistency serving as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior at filtering out silent objects and localizing sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.
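
The matching step mentioned in the abstract, comparing an audio category distribution with a visual one, can be illustrated with a small sketch. This is not the authors' implementation: the global max pooling, the softmax normalization, the use of KL divergence as the consistency measure, and the names `visual_category_distribution` and `kl_divergence` are all assumptions made for illustration. See the linked repository for the actual method.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visual_category_distribution(class_maps):
    """Assumed pooling: take the peak response of each class-aware
    localization map (a 2-D list of floats), then normalize with softmax."""
    pooled = [max(max(row) for row in class_map) for class_map in class_maps]
    return softmax(pooled)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), used here as a hypothetical audiovisual consistency score.
    Lower values mean the two category distributions agree more closely."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Three hypothetical object classes, each with a 2x2 localization map.
    class_maps = [
        [[0.1, 2.0], [0.0, 0.3]],  # class 0: strong response
        [[0.1, 0.0], [0.1, 0.0]],  # class 1: weak response
        [[1.9, 0.2], [0.0, 0.1]],  # class 2: strong response
    ]
    visual = visual_category_distribution(class_maps)

    audio_matched = [0.5, 0.0, 0.5]     # audio suggests classes 0 and 2
    audio_mismatched = [0.0, 1.0, 0.0]  # audio suggests class 1 only

    print("visual distribution:", visual)
    print("KL, matched audio:   ", kl_divergence(audio_matched, visual))
    print("KL, mismatched audio:", kl_divergence(audio_mismatched, visual))
```

In this toy setup, the audio distribution that agrees with the strongly activated visual classes yields a lower divergence than the one that does not, which is the kind of consistency signal a self-supervised objective could exploit.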


research
12/22/2021

Class-aware Sounding Objects Localization via Audiovisual Correspondence

Audiovisual scenes are pervasive in our daily life. It is commonplace fo...
research
03/25/2022

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

Sound source localization in visual scenes aims to localize objects emit...
research
08/21/2023

STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning

Scale variation is a deep-rooted problem in object counting, which has n...
research
09/19/2022

NeRF-SOS: Any-View Self-supervised Object Segmentation from Complex Real-World Scenes

Neural volumetric representations have shown the potential that Multi-la...
research
09/08/2023

Unsupervised Object Localization with Representer Point Selection

We propose a novel unsupervised object localization method that allows u...
research
07/13/2021

Kit-Net: Self-Supervised Learning to Kit Novel 3D Objects into Novel 3D Cavities

In industrial part kitting, 3D objects are inserted into cavities for tr...
research
09/28/2022

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

Visual tasks vary a lot in their output formats and concerned contents, ...
