Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

02/14/2020
by Masaki Kawanaka, et al.

Improving the subjective sound quality of enhanced signals is one of the most important goals in speech enhancement. To evaluate subjective quality, several perceptually motivated objective sound quality assessment (OSQA) methods have been proposed, such as PESQ (perceptual evaluation of speech quality). However, such measures usually cannot be used directly to train a deep neural network (DNN) because popular OSQAs are non-differentiable with respect to the DNN parameters. A previous study therefore proposed approximating the OSQA score with an auxiliary DNN so that its gradient can be used to train the primary DNN. One problem with this approach is that the approximation error of the score makes training unstable. To overcome this problem, we propose using stabilization techniques borrowed from reinforcement learning. Experiments that aimed to increase the PESQ score as an example show that the proposed method (i) can stably train a DNN to increase PESQ, (ii) achieves a state-of-the-art PESQ score on a public dataset, and (iii) yields better sound quality than conventional methods in a subjective evaluation.
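
The surrogate-gradient idea described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `black_box_score` is a hypothetical stand-in for PESQ, a single scalar `theta` stands in for the primary DNN's parameters, and a small local linear model stands in for the auxiliary DNN. The abstract does not name the specific reinforcement-learning stabilization techniques, so none are included.

```python
def black_box_score(theta):
    """Hypothetical stand-in for an OSQA such as PESQ (higher is better).

    Rounding makes the score piecewise constant, so its true gradient is
    zero almost everywhere and cannot drive training directly.
    """
    return round(-(theta - 2.0) ** 2, 1)


def fit_local_surrogate(theta, offsets=(-0.5, -0.25, 0.25, 0.5)):
    """Fit a linear surrogate s(t) ~ a + b * (t - theta) by least squares.

    This tiny linear model plays the role of the auxiliary DNN (assumption).
    Returns (a, b), where b is the surrogate's gradient at theta.
    """
    scores = [black_box_score(theta + d) for d in offsets]
    a = sum(scores) / len(scores)
    # The offsets sum to zero, so the least-squares slope reduces to this ratio.
    b = sum(d * s for d, s in zip(offsets, scores)) / sum(d * d for d in offsets)
    return a, b


def train(theta=0.0, lr=0.1, steps=50):
    """Gradient ascent on the surrogate instead of the black-box score."""
    for _ in range(steps):
        _, grad = fit_local_surrogate(theta)
        theta += lr * grad
    return theta


theta_final = train()
print(black_box_score(0.0), black_box_score(theta_final))
```

Because the surrogate only approximates the score, its gradient carries approximation error; in this toy case the error is small and bounded, whereas the abstract identifies such error as the source of training instability in the DNN setting.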


research
10/22/2018

DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score

We propose a training method for deep neural network (DNN)-based source ...
research
04/08/2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

The discrepancy between the cost function used for training a speech enh...
research
07/22/2021

Controlling the Perceived Sound Quality for Dialogue Enhancement with Deep Learning

Speech enhancement attenuates interfering sounds in speech signals but m...
research
07/01/2022

Improving Speech Enhancement through Fine-Grained Speech Characteristics

While deep learning based speech enhancement systems have made rapid pro...
research
02/02/2018

Monaural Speech Enhancement using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure

In this paper we propose a Deep Neural Network (DNN) based Speech Enhanc...
research
02/03/2020

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

We propose a tensor-to-vector regression approach to multi-channel speec...
research
03/21/2019

Data-driven design of perfect reconstruction filterbank for DNN-based sound source enhancement

We propose a data-driven design method of perfect-reconstruction filterb...
