Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

12/09/2022
by   Huajian Fang, et al.
0

Single-channel deep speech enhancement approaches often estimate a single multiplicative mask to extract clean speech without a measure of its accuracy. Instead, in this work, we propose to quantify the uncertainty associated with clean speech estimates in neural network-based speech enhancement. Predictive uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former accounts for the inherent uncertainty in data and the latter corresponds to the model uncertainty. Aiming for robust clean speech estimation and efficient predictive uncertainty quantification, we propose to integrate statistical complex Gaussian mixture models (CGMMs) into a deep speech enhancement framework. More specifically, we model the dependency between input and output stochastically by means of a conditional probability density and train a neural network to map the noisy input to the full posterior distribution of clean speech, modeled as a mixture of multiple complex Gaussian components. Experimental results on different datasets show that the proposed algorithm effectively captures predictive uncertainty and that combining powerful statistical models and deep learning also delivers a superior speech enhancement performance.

READ FULL TEXT
research
05/15/2023

Integrating Uncertainty into Neural Network-based Speech Enhancement

Supervised masking approaches in the time-frequency domain aim to employ...
research
03/04/2022

Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement

Speech enhancement in the time-frequency domain is often performed by es...
research
10/31/2017

Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors

A F0 and voicing status estimation algorithm for speech analysis/synthes...
research
05/04/2023

Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization

Predicting human scanpaths when exploring panoramic videos is a challeng...
research
11/02/2021

Elucidating Noisy Data via Uncertainty-Aware Robust Learning

Robust learning methods aim to learn a clean target distribution from no...
research
02/11/2021

Speech enhancement with mixture-of-deep-experts with clean clustering pre-training

In this study we present a mixture of deep experts (MoDE) neural-network...
research
02/15/2018

Deep Learning Based Speech Beamforming

Multi-channel speech enhancement with ad-hoc sensors has been a challeng...

Please sign up or login with your details

Forgot password? Click here to reset