On the Equivalence between Objective Intelligibility and Mean-Squared Error for Deep Neural Network based Speech Enhancement

06/21/2018
by Morten Kolbæk, et al.

Although speech enhancement algorithms based on deep neural networks (DNNs) have shown impressive results, it is unclear whether they are anywhere near optimal with respect to aspects of human auditory perception, e.g. speech intelligibility. The reason is that the vast majority of DNN-based speech enhancement algorithms rely on the mean squared error (MSE) criterion of short-time spectral amplitudes (STSA). State-of-the-art speech intelligibility estimators, on the other hand, rely on linear correlation of speech temporal envelopes. This raises the question of whether a DNN training criterion based on envelope linear correlation (ELC) can improve the intelligibility performance of DNN-based speech enhancement algorithms compared to algorithms based on the STSA-MSE criterion. In this paper we show that, under certain general conditions, the STSA-MSE and ELC criteria are practically equivalent, and we provide empirical data to support our theoretical results. The important implication of our findings is that the standard STSA minimum-MSE estimator is optimal if the objective is to perform optimally with respect to a state-of-the-art speech intelligibility estimator.
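To make the two criteria named in the abstract concrete, the following is a minimal sketch of how they could be computed for a single frequency band over a short segment of frames. It is an illustration under stated assumptions, not the paper's implementation: the function names `stsa_mse` and `elc`, the use of the sample Pearson correlation for ELC, and the toy envelope values are all assumptions introduced here.

```python
import numpy as np


def stsa_mse(clean_env, est_env):
    """Mean squared error between two amplitude envelopes (hypothetical helper)."""
    a = np.asarray(clean_env, dtype=float)
    b = np.asarray(est_env, dtype=float)
    return float(np.mean((a - b) ** 2))


def elc(clean_env, est_env):
    """Sample linear correlation between two envelopes (hypothetical helper).

    Assumption: ELC is taken to be the Pearson correlation coefficient of the
    clean and estimated envelopes over the segment.
    """
    a = np.asarray(clean_env, dtype=float)
    b = np.asarray(est_env, dtype=float)
    a_c = a - a.mean()  # remove the segment mean
    b_c = b - b.mean()
    denom = np.linalg.norm(a_c) * np.linalg.norm(b_c)
    if denom == 0.0:
        # A constant envelope has no defined correlation; return 0 by convention.
        return 0.0
    return float(np.dot(a_c, b_c) / denom)


# Toy envelopes (made-up values, for illustration only).
clean = np.array([0.2, 0.5, 0.9, 0.4])
estimate = 0.5 * clean + 0.1  # scaled and shifted copy of the clean envelope

print("STSA-MSE:", stsa_mse(clean, estimate))
print("ELC:", elc(clean, estimate))
```

In this toy case the estimate is a scaled and shifted copy of the clean envelope, so the correlation is at its maximum while the MSE is non-zero. The sketch only shows how the two quantities are defined; it does not demonstrate the equivalence result, which the abstract says holds under certain general conditions.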

research
05/23/2019

A Perceptual Weighting Filter Loss for DNN Training in Speech Enhancement

Single-channel speech enhancement with deep neural networks (DNNs) has s...
research
05/06/2019

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

Utilizing a human-perception-related objective function to train a speec...
research
09/24/2017

A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement

Despite noise suppression being a mature area in signal processing, it r...
research
03/21/2019

Data-driven design of perfect reconstruction filterbank for DNN-based sound source enhancement

We propose a data-driven design method of perfect-reconstruction filterb...
research
05/26/2023

ElectrodeNet – A Deep Learning Based Sound Coding Strategy for Cochlear Implants

ElectrodeNet, a deep learning based sound coding strategy for the cochle...
research
07/24/2023

An objective evaluation of Hearing Aids and DNN-based speech enhancement in complex acoustic scenes

We investigate the objective performance of five high-end commercially a...
research
02/02/2021

Leveraging IoT and Weather Conditions to Estimate the Riders Waiting for the Bus Transit on Campus

The communication technology revolution in this era has increased the us...
