Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention

06/17/2021
by Efthymios Tzinis, et al.

We introduce a state-of-the-art audio-visual on-screen sound separation system that learns to separate sounds and associate them with on-screen objects by watching in-the-wild videos. We identify limitations of previous work on audio-visual on-screen sound separation, including the simplicity and coarse resolution of its spatio-temporal attention and the poor convergence of its audio separation model. Our proposed model addresses these issues with cross-modal and self-attention modules that capture audio-visual dependencies at a finer temporal resolution, and with unsupervised pre-training of the audio separation model. These improvements allow the model to generalize to a much wider set of unseen videos. For evaluation and semi-supervised training, we collected human annotations of on-screen audio from a large database of in-the-wild videos (YFCC100M). Our results show marked improvements in on-screen separation performance, under more general conditions than previous methods.
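To make the idea of cross-modal and self-attention over time more concrete, here is a minimal NumPy sketch of generic scaled dot-product attention in which audio frames attend to video-frame features, followed by self-attention over the audio sequence. This is an illustrative assumption, not the authors' architecture: the function names, the frame counts (50 audio, 8 video) and the 16-dimensional embeddings are hypothetical, and there are no learned projections, multiple heads, or separation masks.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (hypothetical helper).

    queries: (T_q, d), keys: (T_k, d), values: (T_k, d_v).
    Returns (T_q, d_v) outputs and (T_q, T_k) attention weights.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # similarity of every query to every key
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ values, weights


def self_attention(x):
    # Self-attention: a sequence attends over its own time steps.
    return cross_attention(x, x, x)


rng = np.random.default_rng(0)
# Hypothetical embeddings: 50 audio frames and 8 video frames, 16 dims each.
audio = rng.standard_normal((50, 16))
video = rng.standard_normal((8, 16))

# Cross-modal step: each audio frame gathers context from the video frames.
audio_ctx, w_av = cross_attention(audio, video, video)
# Temporal step: audio frames exchange information with each other.
audio_refined, _ = self_attention(audio_ctx)
print(audio_ctx.shape, w_av.shape, audio_refined.shape)
```

Because every audio frame gets its own distribution over video frames, the alignment is resolved per audio frame rather than pooled over a whole clip, which is the kind of finer temporal resolution the abstract refers to.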

