On the Equivalence between Objective Intelligibility and Mean-Squared Error for Deep Neural Network based Speech Enhancement

by Morten Kolbæk, et al.

Although speech enhancement algorithms based on deep neural networks (DNNs) have shown impressive results, it is unclear whether they are anywhere near optimal with respect to human auditory perception, e.g. speech intelligibility. The reason is that the vast majority of DNN-based speech enhancement algorithms rely on the mean squared error (MSE) criterion applied to short-time spectral amplitudes (STSA). State-of-the-art speech intelligibility estimators, on the other hand, rely on the linear correlation of speech temporal envelopes. This raises the question of whether a DNN training criterion based on envelope linear correlation (ELC) can lead to better intelligibility performance than the STSA-MSE criterion. In this paper we derive that, under certain general conditions, the STSA-MSE and ELC criteria are practically equivalent, and we provide empirical data to support our theoretical results. The important implication of our findings is that the standard STSA minimum-MSE estimator is optimal if the objective is to perform optimally with respect to a state-of-the-art speech intelligibility estimator.
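To make the two criteria concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes that STSA-MSE is the mean squared difference between a clean and an enhanced amplitude vector, and that ELC is the sample Pearson correlation between two envelope vectors; the function names `stsa_mse` and `envelope_linear_correlation`, the `eps` guard, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def stsa_mse(clean_amp, enhanced_amp):
    """Mean squared error between two amplitude vectors (assumed STSA-MSE form)."""
    clean_amp = np.asarray(clean_amp, dtype=float)
    enhanced_amp = np.asarray(enhanced_amp, dtype=float)
    return float(np.mean((clean_amp - enhanced_amp) ** 2))


def envelope_linear_correlation(clean_env, enhanced_env, eps=1e-12):
    """Sample Pearson correlation between two envelope vectors (assumed ELC form)."""
    a = np.asarray(clean_env, dtype=float)
    b = np.asarray(enhanced_env, dtype=float)
    a = a - a.mean()  # remove the mean of each envelope
    b = b - b.mean()
    # eps is an illustrative guard against division by zero for constant envelopes
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


# Synthetic, hypothetical data: a "clean" amplitude vector and a noisy estimate of it.
rng = np.random.default_rng(0)
clean = np.abs(rng.normal(size=30))
enhanced = np.abs(clean + 0.1 * rng.normal(size=30))

mse_value = stsa_mse(clean, enhanced)
elc_value = envelope_linear_correlation(clean, enhanced)

# A rescaled and shifted copy is perfectly correlated with the original
# but still has a non-zero MSE.
rescaled = 2.0 * clean + 3.0
mse_rescaled = stsa_mse(clean, rescaled)
elc_rescaled = envelope_linear_correlation(clean, rescaled)
```

The last lines show one way the two measures differ in general: correlation is unchanged by a positive scaling and an offset, whereas MSE is not. The sketch does not reproduce the conditions under which the paper shows the two criteria to be practically equivalent.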




