On Psychoacoustically Weighted Cost Functions Towards Resource-Efficient Deep Neural Networks for Speech Denoising

01/29/2018
by Kai Zhen, et al.

We present a psychoacoustically enhanced cost function to balance network complexity and perceptual performance of deep neural networks for speech denoising. During training, we augment the ordinary mean-squared error with perceptual weights, so that the most audible frequency bins contribute more to the cost while errors in inaudible bins are ignored. To generate the weights, we employ psychoacoustic models to compute the global masking threshold from the clean speech spectra. We then evaluate the speech denoising performance of our perceptually guided neural network using both objective and perceptual sound quality metrics, testing on network structures ranging from shallow and narrow to deep and wide. The experimental results show that our method is a valid approach for infusing perceptual significance into deep neural network operations. In particular, the more perceptually sensible performance gains observed for simple network topologies suggest that the proposed method can lead to resource-efficient speech denoising implementations on small devices without degrading the perceived signal fidelity.
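
The idea of a perceptually weighted mean-squared error can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the `perceptual_weights` function below is a hypothetical stand-in (a signal-to-mask ratio in dB, clipped at zero), and the masking threshold `mask_db` is assumed to come from some psychoacoustic model that is not implemented here. Only the general shape of the cost, per-bin weights multiplying the squared error, is meant to reflect the description above.

```python
import numpy as np


def perceptual_weights(clean_db, mask_db):
    """Hypothetical per-bin weights (an assumption, not the paper's formula).

    clean_db: clean-speech power spectrum in dB, one value per frequency bin.
    mask_db:  global masking threshold in dB for the same bins, assumed to be
              produced by a psychoacoustic model outside this sketch.
    Bins whose clean energy lies below the threshold get weight 0, so their
    error is ignored; audible bins are weighted by how far they exceed it.
    """
    signal_to_mask = np.asarray(clean_db, dtype=float) - np.asarray(mask_db, dtype=float)
    return np.maximum(signal_to_mask, 0.0)


def weighted_mse(target, estimate, weights):
    """Mean over bins of weight * squared error."""
    target = np.asarray(target, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean(weights * (target - estimate) ** 2))


# Toy two-bin example: bin 0 is audible, bin 1 lies below the masking threshold.
weights = perceptual_weights(clean_db=[60.0, 20.0], mask_db=[40.0, 30.0])
loss_a = weighted_mse(target=[1.0, 1.0], estimate=[0.5, 0.0], weights=weights)
loss_b = weighted_mse(target=[1.0, 1.0], estimate=[0.5, 9.0], weights=weights)
# loss_a equals loss_b: the error in the masked bin does not affect the cost.
```

In a training loop the same weights would multiply the per-bin squared error of the network output before averaging; an unweighted MSE corresponds to setting every weight to 1.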


