Diffusion-based Generative Speech Source Separation

10/31/2022
by Robin Scheibler, et al.

We propose DiffSep, a new single-channel source separation method based on score matching of a stochastic differential equation (SDE). We craft a tailored continuous-time diffusion-mixing process that starts from the separated sources and converges to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse-time SDE, which progressively separates the sources starting from their mixture. We propose a modified training strategy to handle model mismatch and source permutation ambiguity. Experiments on the WSJ0 2mix dataset demonstrate the potential of the method. Furthermore, the method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.
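
To make the idea of a diffusion-mixing process concrete, the following is a minimal sketch of a forward process whose mean moves from the separated sources toward their average (the mixture up to a 1/K scale) while Gaussian noise is added. The abstract does not give the exact SDE, so the exponential decay rate `gamma`, the noise scale `sigma`, the square-root-of-time noise schedule, and the function names `mixing_mean` and `sample_forward` are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def mixing_mean(sources, t, gamma=2.0):
    """Mean of a hypothetical diffusion-mixing process at time t.

    sources: array of shape (K, T), one row per separated source.
    At t = 0 the mean equals the sources; as t grows, every row
    converges to the average of the sources (the scaled mixture).
    """
    avg = sources.mean(axis=0, keepdims=True)  # shape (1, T)
    decay = np.exp(-gamma * t)                 # assumed exponential mixing rate
    return avg + decay * (sources - avg)


def sample_forward(sources, t, gamma=2.0, sigma=0.05, rng=None):
    """Draw one noisy sample around the mixing mean (assumed noise schedule)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(sources.shape)
    return mixing_mean(sources, t, gamma) + sigma * np.sqrt(t) * noise


rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 8))   # two toy "sources", 8 samples each
start = mixing_mean(sources, 0.0)       # equals the separated sources
end = mixing_mean(sources, 10.0)        # every row is close to the average
noisy = sample_forward(sources, 0.5, rng=rng)
```

In this sketch the sum of the rows of `mixing_mean` is the same at every `t`, so the mixture itself is preserved while the individual sources blend together. A separation method in the spirit of the abstract would run such a process in reverse, starting from the mixture and using a learned score function to recover the individual sources.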
