Diffusion-based Signal Refiner for Speech Separation

05/10/2023
by Masato Hirano, et al.

We have developed a diffusion-based speech refiner that improves the reference-free perceptual quality of audio predicted by preceding single-channel speech separation models. Although modern deep neural network-based speech separation models have shown high performance on reference-based metrics, they often produce perceptually unnatural artifacts. Recent advances in diffusion models motivated us to tackle this problem by restoring the degraded parts of the initial separations with a generative approach. Using the denoising diffusion restoration model (DDRM) as a basis, we propose a shared DDRM-based refiner that generates samples conditioned on the global information of preceding outputs from arbitrary speech separation models. We experimentally show that our refiner provides a clearer harmonic structure of speech and improves the reference-free metric of perceptual quality for arbitrary preceding model architectures. Furthermore, we tune the variance of the measurement noise based on the preceding outputs, which results in higher scores in both reference-free and reference-based metrics. The separation quality can be further improved by blending the discriminative and generative outputs.
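The blending step mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not state how the two outputs are combined, so the sample-wise convex combination below is only an assumption, and the names `blend_outputs` and `alpha` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def blend_outputs(
    discriminative: Sequence[float],
    generative: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Blend two time-aligned waveforms sample by sample.

    Assumption: a simple convex combination. `alpha` is a hypothetical
    weight; alpha=1.0 keeps only the discriminative separator output,
    alpha=0.0 keeps only the generative refiner output.
    """
    if len(discriminative) != len(generative):
        raise ValueError("both signals must have the same number of samples")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Weighted sum of corresponding samples from the two signals.
    return [alpha * d + (1.0 - alpha) * g
            for d, g in zip(discriminative, generative)]
```

In such a scheme, `alpha` would act as a trade-off knob between the separator's output and the refiner's output; any suitable value would have to be chosen empirically.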

