Components Loss for Neural Networks in Mask-Based Speech Enhancement

by Ziyi Xu, et al.

Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel components loss (CL) for the training of neural networks for mask-based speech enhancement. During training, the proposed CL offers separate control over preservation of the speech component quality, suppression of the residual noise component, and preservation of a natural-sounding residual noise component. We illustrate the potential of the proposed CL by evaluating a standard convolutional neural network (CNN) for mask-based speech enhancement. The new CL obtains better and more balanced performance than the baseline losses in almost all employed instrumental quality metrics; the baselines comprise the conventional mean squared error (MSE) loss and auditory-related loss functions, such as the perceptual evaluation of speech quality (PESQ) loss and the recently proposed perceptual weighting filter loss. In particular, applying the CL offers better speech component quality, better overall perceptual quality of the enhanced speech, and more natural-sounding residual noise. On average, for seen noise types, the PESQ score on the enhanced speech is at least 0.1 points higher, while the SNR improvement is also higher by more than 0.5 dB. The improvement is stronger for unseen noise types, where the PESQ score on the enhanced speech is about 0.2 points higher and the output SNR is ahead by more than 0.5 dB. The proposed CL is easy to implement and code is provided at
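To make the idea of separate control concrete, the following is a minimal sketch of a two-term loss in the spirit of the abstract. It is not the authors' implementation. It assumes the mask is applied separately to the clean speech and noise magnitudes in each time-frequency bin, and that a hypothetical weight `alpha` trades speech preservation against residual noise suppression. The third aspect named in the abstract, natural-sounding residual noise, is left out because the abstract does not define it. The function name and all parameter names are illustrative.

```python
def components_loss_sketch(mask, speech, noise, alpha=0.5):
    """Illustrative two-term loss over flat lists of time-frequency bins.

    mask   : estimated mask values, one per bin
    speech : clean speech magnitudes (assumed available during training)
    noise  : noise magnitudes (assumed available during training)
    alpha  : hypothetical trade-off weight in [0, 1]
    """
    # Apply the same mask to each component separately.
    filtered_speech = [m * s for m, s in zip(mask, speech)]
    filtered_noise = [m * n for m, n in zip(mask, noise)]
    k = len(mask)

    # Term 1: distortion of the speech component caused by the mask.
    speech_term = sum((fs - s) ** 2 for fs, s in zip(filtered_speech, speech)) / k
    # Term 2: power of the residual noise that survives the mask.
    noise_term = sum(fn ** 2 for fn in filtered_noise) / k

    return alpha * speech_term + (1.0 - alpha) * noise_term


if __name__ == "__main__":
    speech = [1.0, 2.0, 3.0, 4.0]
    noise = [2.0, 2.0, 2.0, 2.0]
    # An all-pass mask leaves speech intact but keeps all the noise.
    print(components_loss_sketch([1.0] * 4, speech, noise))
    # An all-zero mask removes the noise but destroys the speech.
    print(components_loss_sketch([0.0] * 4, speech, noise))
```

In this sketch, moving `alpha` toward 1 penalizes speech distortion more heavily, and moving it toward 0 favors stronger noise suppression.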




