TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

02/16/2023
by   Yunyang Zeng, et al.

Speech enhancement models have progressed greatly in recent years, but the perceptual quality of their speech outputs remains limited. We propose an objective for perceptual quality based on temporal acoustic parameters. These are fundamental speech features that play an essential role in various applications, including speaker recognition and paralinguistic analysis. We provide a differentiable estimator for four categories of low-level acoustic descriptors: frequency-related parameters, energy- or amplitude-related parameters, spectral balance parameters, and temporal features. Unlike prior work that considers aggregated acoustic parameters or only a few categories of acoustic parameters, our temporal acoustic parameter (TAP) loss enables auxiliary optimization and improvement of many fine-grained speech characteristics in enhancement workflows. We show that adding TAPLoss as an auxiliary objective in speech enhancement produces speech with improved perceptual quality and intelligibility. We use data from the Deep Noise Suppression 2020 Challenge to demonstrate that both time-domain models and time-frequency domain models can benefit from our method.
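
To make the idea of an auxiliary objective concrete, here is a minimal sketch of how a frame-level parameter loss could be added to a base enhancement loss. It is an illustration only, not the paper's method: frame_log_energy is a hypothetical stand-in for the differentiable estimator, the mean absolute error distance and the 0.1 weight are assumptions, and the pure-Python arithmetic is not differentiable (in training it would be written in an autodiff framework).

```python
import math

def frame_log_energy(signal, frame_len=4, hop=2):
    """Hypothetical stand-in for a temporal acoustic parameter estimator:
    returns one log-energy value per analysis frame."""
    params = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        params.append(math.log(energy + 1e-8))  # small floor avoids log(0)
    return params

def tap_style_loss(enhanced, clean, estimator=frame_log_energy):
    """Mean absolute error between frame-level parameters of the enhanced
    and clean signals (the distance measure is an assumption)."""
    p_enh = estimator(enhanced)
    p_cln = estimator(clean)
    return sum(abs(a - b) for a, b in zip(p_enh, p_cln)) / len(p_cln)

def total_loss(enhanced, clean, weight=0.1):
    """Base waveform loss plus the weighted auxiliary term.
    The weight value is illustrative, not taken from the paper."""
    base = sum(abs(a - b) for a, b in zip(enhanced, clean)) / len(clean)
    return base + weight * tap_style_loss(enhanced, clean)

# Usage: an "enhanced" signal that is the clean signal at half amplitude.
clean = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
enhanced = [0.5 * x for x in clean]
print(tap_style_loss(enhanced, clean))
print(total_loss(enhanced, clean))
```

Because the parameters are kept per frame rather than averaged over the utterance, the auxiliary term penalizes mismatches at each point in time, which is the distinction the abstract draws against aggregated acoustic parameters.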

research
02/16/2023

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Despite rapid advancement in recent years, current speech enhancement mo...
research
07/01/2022

Improving Speech Enhancement through Fine-Grained Speech Characteristics

While deep learning based speech enhancement systems have made rapid pro...
research
02/10/2022

Single-channel speech enhancement by using psychoacoustical model inspired fusion framework

When the parameters of Bayesian Short-time Spectral Amplitude (STSA) est...
research
01/31/2021

High Fidelity Speech Regeneration with Application to Speech Enhancement

Speech enhancement has seen great improvement in recent years mainly thr...
research
01/29/2021

Acoustic Structure Inverse Design and Optimization Using Deep Learning

From ancient to modern times, acoustic structures have been used to cont...
research
01/14/2023

Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope

The syllable is a perceptually salient unit in speech. Since both the sy...
research
03/14/2023

TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge

This paper introduces the Unbeatable Team's submission to the ICASSP 202...
