Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation

04/05/2021
by Yapeng Tian, et al.

Our daily life is rich in synchronized audio and visual events. Within these events, audio scenes are associated with the corresponding visual objects; meanwhile, sounding objects can indicate and help to separate their individual sounds in the audio track. Based on this observation, we propose a cyclic co-learning (CCoL) paradigm that jointly learns sounding object visual grounding and audio-visual sound separation in a unified framework. Concretely, we leverage grounded object-sound relations to improve the results of sound separation. Meanwhile, benefiting from discriminative information from separated sounds, we improve training example sampling for sounding object grounding, which builds a co-learning cycle for the two tasks and makes them mutually beneficial. Extensive experiments show that the proposed framework outperforms the compared recent approaches on both tasks, and that the two tasks benefit from each other under our cyclic co-learning.


research
04/05/2018

Learning to Separate Object Sounds by Watching Unlabeled Video

Perceiving a scene most fully requires all the senses. Yet modeling how ...
research
03/25/2021

Weakly-supervised Audio-visual Sound Source Detection and Separation

Learning how to localize and separate individual object sounds in the au...
research
09/22/2021

Audio-Visual Grounding Referring Expression for Robotic Manipulation

Referring expressions are commonly used when referring to a specific tar...
research
03/14/2020

Audio-Visual Spatial Alignment Requirements of Central and Peripheral Object Events

Immersive audio-visual perception relies on the spatial integration of b...
research
02/11/2021

Multichannel-based learning for audio object extraction

The current paradigm for creating and deploying immersive audio content ...
research
07/20/2020

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

Stereophonic audio is an indispensable ingredient to enhance human audit...
research
04/09/2018

The Sound of Pixels

We introduce PixelPlayer, a system that, by leveraging large amounts of ...
