Co-Separating Sounds of Visual Objects

04/16/2019
by Ruohan Gao, et al.

Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel. Current methods for visually-guided audio source separation sidestep the issue by training with artificially mixed video clips, but this puts unwieldy restrictions on training data collection and may even prevent learning the properties of "true" mixed sounds. We introduce a co-separation training paradigm that permits learning object-level sounds from unlabeled multi-source videos. Our novel training objective requires that the deep neural network's separated audio for similar-looking objects be consistently identifiable, while simultaneously reproducing accurate video-level audio tracks for each source training pair. Our approach disentangles sounds in realistic test videos, even in cases where an object was not observed individually during training. We obtain state-of-the-art results on visually-guided audio source separation and audio denoising for the MUSIC, AudioSet, and AV-Bench datasets. Our video results: http://vision.cs.utexas.edu/projects/coseparation/
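The two requirements in the training objective can be illustrated with a small NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: it assumes two training videos whose magnitude spectrograms are summed into one mixture, that a network (not shown) predicts one spectrogram mask per detected object, and that a classifier (not shown) scores each separated sound against the visual object categories. The function names `co_separation_loss` and `object_consistency_loss`, the L1 and cross-entropy choices, and all array shapes are hypothetical.

```python
import numpy as np


def co_separation_loss(masks_by_video, video_specs, eps=1e-8):
    """L1 gap between each video's summed object masks and its share of the mixture.

    masks_by_video: list (one entry per video) of arrays shaped (n_objects, F, T).
    video_specs:    list of magnitude spectrograms shaped (F, T), one per video.
    Assumption: the mixture is the plain sum of the per-video spectrograms.
    """
    mixture = sum(video_specs)
    loss = 0.0
    for masks, spec in zip(masks_by_video, video_specs):
        target = spec / (mixture + eps)      # ratio mask for this video
        predicted = np.sum(masks, axis=0)    # masks of all objects in this video
        loss += np.mean(np.abs(predicted - target))
    return loss


def object_consistency_loss(class_logits, labels):
    """Cross-entropy between classifier scores of separated sounds and object labels.

    class_logits: array shaped (n_objects, n_classes), hypothetical classifier output.
    labels:       integer array shaped (n_objects,), e.g. from an object detector.
    """
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])


# Toy example: video 1 contains two objects, video 2 contains one.
spec1 = np.full((4, 5), 2.0)
spec2 = np.full((4, 5), 6.0)
masks1 = np.full((2, 4, 5), 0.125)   # two masks that sum to 0.25
masks2 = np.full((1, 4, 5), 0.75)

sep = co_separation_loss([masks1, masks2], [spec1, spec2])
con = object_consistency_loss(np.zeros((3, 3)), np.array([0, 1, 2]))
total = sep + con   # equal weighting is an arbitrary choice for this sketch
```

In this toy setup the masks already match each video's share of the mixture, so the separation term is close to zero, while the uninformative classifier scores leave the consistency term at its chance-level value.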

research
04/05/2018

Learning to Separate Object Sounds by Watching Unlabeled Video

Perceiving a scene most fully requires all the senses. Yet modeling how ...
research
03/25/2021

Weakly-supervised Audio-visual Sound Source Detection and Separation

Learning how to localize and separate individual object sounds in the au...
research
05/15/2021

Move2Hear: Active Audio-Visual Source Separation

We introduce the active audio-visual source separation problem, where an...
research
02/11/2021

Multichannel-based learning for audio object extraction

The current paradigm for creating and deploying immersive audio content ...
research
07/20/2020

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

Stereophonic audio is an indispensable ingredient to enhance human audit...
research
12/11/2018

2.5D Visual Sound

Binaural audio provides a listener with 3D sound sensation, allowing a r...
research
06/04/2020

Visually Guided Sound Source Separation using Cascaded Opponent Filter Network

The objective of this paper is to recover the original component signals...