End-to-End Multi-Task Denoising for joint SDR and PESQ Optimization

01/26/2019
by Jaeyoung Kim, et al.

Supervised learning based on deep neural networks has recently achieved substantial improvements in speech enhancement. Denoising networks learn a mapping from noisy speech either directly to clean speech or to a spectral mask, which is the ratio between the clean and noisy spectra. In either case, the network is optimized by minimizing the mean square error (MSE) between predefined labels and the network output, which is either a spectrum or a time-domain signal. However, existing schemes suffer from one of two critical issues: spectra mismatch and metric mismatch. Spectra mismatch is the well-known problem that a spectrum modified after the short-time Fourier transform (STFT) cannot, in general, be fully recovered after the inverse STFT (ISTFT). Metric mismatch means that minimizing a conventional MSE loss is sub-optimal for maximizing the target metrics, signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ). This paper presents a new end-to-end denoising framework with the goal of joint SDR and PESQ optimization. First, the network is optimized on the time-domain signals after the ISTFT to avoid spectra mismatch. Second, two loss functions that correlate better with the SDR and PESQ metrics are proposed to reduce metric mismatch. The experimental results show that the proposed denoising scheme significantly improves both SDR and PESQ over existing methods.
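To make the metric mismatch concrete, the following is a minimal sketch, not the paper's loss function. It assumes the plain energy-ratio definition SDR = 10 log10(||s||^2 / ||s - s_hat||^2), and the helper names `sdr_db` and `mse` as well as the synthetic sine-wave signals are hypothetical, chosen only for illustration. It shows that two estimates with the same MSE can differ widely in SDR when the clean signals have different energy.

```python
import numpy as np


def sdr_db(clean, estimate, eps=1e-12):
    """Energy-ratio SDR in dB (assumed definition, not the paper's exact loss)."""
    clean = np.asarray(clean, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(clean ** 2)
    distortion_energy = np.sum((clean - estimate) ** 2)
    # eps guards against division by zero and log of zero.
    return 10.0 * np.log10((signal_energy + eps) / (distortion_energy + eps))


def mse(clean, estimate):
    """Mean square error between a clean signal and its estimate."""
    clean = np.asarray(clean, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return np.mean((clean - estimate) ** 2)


# Synthetic stand-ins for speech: one loud and one quiet "utterance".
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0          # one second at an assumed 16 kHz rate
loud = np.sin(2 * np.pi * 220.0 * t)
quiet = 0.1 * loud                      # same waveform, 20 dB lower in level

# The same residual error is added to both signals.
error = 0.05 * rng.standard_normal(t.shape)
loud_est = loud + error
quiet_est = quiet + error

print("MSE loud :", mse(loud, loud_est))
print("MSE quiet:", mse(quiet, quiet_est))
print("SDR loud  (dB):", sdr_db(loud, loud_est))
print("SDR quiet (dB):", sdr_db(quiet, quiet_est))
```

Both estimates have essentially identical MSE, yet their SDR values differ by about 20 dB, because SDR normalizes the error energy by the energy of the clean signal while MSE does not. This is one simple way in which an MSE objective can diverge from an SDR target; PESQ involves a perceptual model and is not covered by this sketch.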

research
10/23/2019

End-to-End Multi-Task Denoising for the Joint Optimization of Perceptual Speech Metrics

Although supervised learning based on a deep neural network has recently...
research
09/03/2019

On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement

Many deep learning-based speech enhancement algorithms are designed to m...
research
08/17/2023

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Phase information has a significant impact on speech perceptual quality ...
research
01/29/2018

On Psychoacoustically Weighted Cost Functions Towards Resource-Efficient Deep Neural Networks for Speech Denoising

We present a psychoacoustically enhanced cost function to balance networ...
research
09/12/2017

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

Speech enhancement model is used to map a noisy speech to a clean speech...
research
05/13/2023

APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra

This paper presents a novel neural vocoder named APNet which reconstruct...
research
06/01/2020

Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net

In this work, we tackle a denoising and dereverberation problem with a s...
