Employing Real Training Data for Deep Noise Suppression

09/05/2023
by Ziyi Xu, et al.

Most deep noise suppression (DNS) models are trained with reference-based losses that require access to clean speech. However, an additive microphone model is sometimes insufficient for real-world applications. Accordingly, ways to use real training data in supervised learning for DNS models promise to reduce a potential training/inference mismatch. Employing real data for DNS training requires either generative approaches or a reference-free loss that does not need the corresponding clean speech. In this work, we propose to employ an end-to-end non-intrusive deep neural network (DNN), named PESQ-DNN, to estimate perceptual evaluation of speech quality (PESQ) scores of enhanced real data. It provides a reference-free perceptual loss that allows real data to be used during DNS training by maximizing the estimated PESQ scores. Furthermore, we use an epoch-wise alternating training protocol, in which the DNS model is updated on real data, followed by an update of PESQ-DNN on synthetic data. The DNS model trained with the PESQ-DNN employing real data outperforms all reference methods employing only synthetic training data. On synthetic test data, our proposed method exceeds the Interspeech 2021 DNS Challenge baseline by a significant 0.32 PESQ points. On both synthetic and real test data, the proposed method beats the baseline by 0.05 DNSMOS points, although PESQ-DNN optimizes for a different perceptual metric.
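The epoch-wise alternating protocol described in the abstract can be sketched as a plain control loop. This is only an illustration of the ordering of updates, not the authors' implementation: the function name `alternating_training`, the callbacks `dns_step` and `pesq_dnn_step`, and the log labels are hypothetical. The sketch also assumes that each model is held fixed while the other is updated, which the abstract does not state explicitly.

```python
from typing import Callable, List, Sequence, Tuple


def alternating_training(
    num_cycles: int,
    real_batches: Sequence[object],
    synthetic_batches: Sequence[object],
    dns_step: Callable[[object], None],
    pesq_dnn_step: Callable[[object], None],
) -> List[Tuple[int, str]]:
    """Run an epoch-wise alternating schedule (hypothetical sketch).

    dns_step:      assumed to update only the DNS model, using the
                   PESQ-DNN output as a reference-free loss.
    pesq_dnn_step: assumed to update only PESQ-DNN, using PESQ labels
                   that are computable because synthetic data comes
                   with clean speech.
    """
    log: List[Tuple[int, str]] = []
    for cycle in range(num_cycles):
        # Epoch 1 of the cycle: DNS model is updated on real data.
        for batch in real_batches:
            dns_step(batch)
        log.append((cycle, "dns_on_real"))

        # Epoch 2 of the cycle: PESQ-DNN is updated on synthetic data.
        for batch in synthetic_batches:
            pesq_dnn_step(batch)
        log.append((cycle, "pesq_dnn_on_synthetic"))
    return log


if __name__ == "__main__":
    # Placeholder callbacks that only record the order of updates.
    order: List[str] = []
    schedule = alternating_training(
        num_cycles=2,
        real_batches=["real_0", "real_1"],
        synthetic_batches=["synth_0"],
        dns_step=lambda b: order.append(f"DNS <- {b}"),
        pesq_dnn_step=lambda b: order.append(f"PESQ-DNN <- {b}"),
    )
    print(schedule)
    print(order)
```

In a real setup the two callbacks would wrap optimizer steps for the respective networks. Updating PESQ-DNN after each DNS epoch presumably keeps its score estimates accurate for the outputs of the current DNS model.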

research
11/06/2021

Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet

Speech enhancement employing deep neural networks (DNNs) for denoising a...
research
05/04/2022

Does a PESQNet (Loss) Require a Clean Reference Input? The Original PESQ Does, But ACR Listening Tests Don't

Perceptual evaluation of speech quality (PESQ) requires a clean speech r...
research
01/26/2021

Overestimation learning with guarantees

We describe a complete method that learns a neural network which is guar...
research
04/25/2020

StRDAN: Synthetic-to-Real Domain Adaptation Network for Vehicle Re-Identification

Vehicle re-identification aims to obtain the same vehicles from vehicle ...
research
06/21/2023

HumanDiffusion: diffusion model using perceptual gradients

We propose HumanDiffusion, a diffusion model trained from humans' percep...
research
12/10/2020

Data-Efficient Framework for Real-world Multiple Sound Source 2D Localization

Deep neural networks have recently led to promising results for the task...
research
03/27/2020

Voice activity detection in the wild via weakly supervised sound event detection

Traditional supervised voice activity detection (VAD) methods work well ...
