PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

02/16/2023
by   Muqiao Yang, et al.

Despite rapid advances in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality by using domain knowledge of acoustic-phonetics. We identify temporal acoustic parameters, such as spectral tilt, spectral flux, and shimmer, that are non-differentiable, and we develop a neural network estimator that can accurately predict their time-series values across an utterance. We also model phoneme-specific weights for each feature, as the acoustic parameters are known to behave differently in different phonemes. This criterion can be added as an auxiliary loss to any model that produces speech, optimizing speech outputs to match the values of clean speech in these features. Experimentally, we show that it improves speech enhancement workflows in both the time domain and the time-frequency domain, as measured by standard evaluation metrics. We also provide an analysis of phoneme-dependent improvement on acoustic parameters, demonstrating the additional interpretability that our method provides. This analysis can suggest which features are currently the bottleneck for improvement.
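To make the idea of a phoneme-weighted auxiliary loss concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the use of mean absolute error, the hard per-frame phoneme labels, and the `aux_scale` value are all assumptions made for illustration. In the paper's setting the parameter tracks would come from a differentiable neural estimator so that gradients can flow to the enhancement model; here they are ordinary numbers.

```python
def phoneme_weighted_param_loss(enh, clean, phonemes, weights):
    """Mean phoneme-weighted absolute error between two acoustic-parameter tracks.

    enh, clean: per-frame acoustic parameter values, shape [T][P]
                (assumed to come from some parameter estimator)
    phonemes:   per-frame phoneme labels, length T
                (assumed output of a hypothetical alignment step)
    weights:    dict mapping phoneme label -> list of P per-parameter weights
                (illustrative values, not learned here)
    """
    if not (len(enh) == len(clean) == len(phonemes)):
        raise ValueError("all inputs must cover the same number of frames")
    total, count = 0.0, 0
    for enh_frame, clean_frame, label in zip(enh, clean, phonemes):
        frame_weights = weights[label]  # one weight per acoustic parameter
        for e, c, w in zip(enh_frame, clean_frame, frame_weights):
            total += w * abs(e - c)
            count += 1
    return total / count if count else 0.0


def total_loss(base_loss, aux_loss, aux_scale=0.1):
    """Add the auxiliary term to a model's own loss (aux_scale is an assumed value)."""
    return base_loss + aux_scale * aux_loss


# Toy example: 2 frames, 2 acoustic parameters, 2 phoneme classes.
enh = [[1.0, 2.0], [3.0, 4.0]]
clean = [[1.5, 2.0], [3.0, 2.0]]
phonemes = ["AA", "S"]
weights = {"AA": [2.0, 1.0], "S": [0.5, 0.25]}

aux = phoneme_weighted_param_loss(enh, clean, phonemes, weights)
print(aux)  # 0.375
```

In this toy case the same absolute error costs more or less depending on the phoneme in which it occurs, which is the intuition behind phoneme-specific weighting. Inspecting the per-phoneme, per-parameter terms before averaging is one way to get the kind of breakdown the abstract describes as interpretability.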
