Naaloss: Rethinking the objective of speech enhancement

08/24/2023
by   Kuan-Hsun Ho, et al.
0

Reducing noise interference is crucial for automatic speech recognition (ASR) in a real-world scenario. However, most single-channel speech enhancement (SE) generates "processing artifacts" that negatively affect ASR performance. Hence, in this study, we suggest a Noise- and Artifacts-aware loss function, NAaLoss, to ameliorate the influence of artifacts from a novel perspective. NAaLoss considers the loss of estimation, de-artifact, and noise ignorance, enabling the learned SE to individually model speech, artifacts, and noise. We examine two SE models (simple/advanced) learned with NAaLoss under various input scenarios (clean/noisy) using two configurations of the ASR system (with/without noise robustness). Experiments reveal that NAaLoss significantly improves the ASR performance of most setups while preserving the quality of SE toward perception and intelligibility. Furthermore, we visualize artifacts through waveforms and spectrograms, and explain their impact on ASR.

READ FULL TEXT
research
11/01/2021

SNRi Target Training for Joint Speech Enhancement and Recognition

This study aims to improve the performance of automatic speech recogniti...
research
01/18/2022

How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

It is challenging to improve automatic speech recognition (ASR) performa...
research
05/26/2021

Training Speech Enhancement Systems with Noisy Speech Datasets

Recently, deep neural network (DNN)-based speech enhancement (SE) system...
research
11/23/2021

Effect of noise suppression losses on speech distortion and ASR performance

Deep learning based speech enhancement has made rapid development toward...
research
07/04/2021

TENET: A Time-reversal Enhancement Network for Noise-robust ASR

Due to the unprecedented breakthroughs brought about by deep learning, s...
research
08/27/2021

Task-aware Warping Factors in Mask-based Speech Enhancement

This paper proposes the use of two task-aware warping factors in mask-ba...
research
01/11/2022

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition

The combination of a deep neural network (DNN) -based speech enhancement...

Please sign up or login with your details

Forgot password? Click here to reset