DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

07/31/2023
by   Chao Huang, et al.

We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they are limited in their ability to capture the complex data distribution required for high-quality separation of sounds from diverse categories. In contrast, DAVIS leverages a generative diffusion model and a Separation U-Net to synthesize separated magnitudes starting from Gaussian noise, conditioned on both the audio mixture and the visual footage. With its generative objective, DAVIS is better suited to high-quality sound separation across diverse categories. We compare DAVIS to existing state-of-the-art discriminative audio-visual separation methods on the domain-specific MUSIC dataset and the open-domain AVE dataset, and the results show that DAVIS outperforms other methods in separation quality, demonstrating the advantages of our framework for the audio-visual source separation task.
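
To make the sampling idea concrete, the following is a minimal sketch of conditional reverse diffusion: a magnitude spectrogram is drawn from Gaussian noise and refined step by step, conditioned on the mixture magnitude and a visual feature vector. It assumes standard DDPM ancestral sampling with a linear noise schedule, neither of which is specified in the abstract. The function `toy_noise_predictor` is a hypothetical placeholder for a trained Separation U-Net, and all names, shapes, and hyperparameters are illustrative rather than taken from the DAVIS implementation.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def toy_noise_predictor(x_t, t, mixture_mag, visual_feat):
    """Hypothetical stand-in for a trained Separation U-Net.

    A real network would predict the noise contained in x_t from the step
    index t, the mixture magnitude, and the visual features. This placeholder
    only returns a deterministic array of the right shape so the loop runs.
    """
    scale = np.tanh(visual_feat.mean())
    return x_t - scale * mixture_mag


def sample_separated_magnitude(mixture_mag, visual_feat, num_steps=50, seed=0):
    """Run DDPM ancestral sampling conditioned on mixture and visual inputs."""
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise with the same shape as the mixture.
    x = rng.standard_normal(mixture_mag.shape)

    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_noise_predictor(x, t, mixture_mag, visual_feat)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise at every step except the last one.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


if __name__ == "__main__":
    # Illustrative inputs: an 8x16 "mixture magnitude" and a 4-dim visual feature.
    mixture = np.abs(np.random.default_rng(1).standard_normal((8, 16)))
    visual = np.ones(4)
    separated = sample_separated_magnitude(mixture, visual, num_steps=20)
    print(separated.shape)
```

With a trained noise predictor in place of the placeholder, the same loop would return an estimate of the separated source magnitude. The output here has no acoustic meaning and only illustrates the control flow.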


