How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

01/18/2022
by Kazuma Iwamoto, et al.

It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources. We propose manually scaling the error components to analyze their impact on ASR. We experimentally identify the artifact component as the main cause of performance degradation, and we find that mitigating the artifact can greatly improve ASR performance. Furthermore, we demonstrate that the simple observation adding (OA) technique (i.e., adding a scaled version of the observed signal to the enhanced speech) can monotonically increase the signal-to-artifact ratio under a mild condition. Accordingly, we experimentally confirm that OA improves ASR performance for both simulated and real recordings. The findings of this paper provide a better understanding of the influence of SE errors on ASR and open the door to future research on novel approaches for designing effective single-channel SE front-ends for ASR.
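The abstract gives only the definitions of the decomposition and of OA, not the exact projection used. The sketch below illustrates the two ideas under stated assumptions:

- It uses a BSS-Eval-style orthogonal projection onto the raw speech and noise signals only, with no distortion filter taps.
- It works on single-channel, time-domain NumPy arrays.
- The helper names `project`, `opd`, `sar_db` and `observation_adding` are hypothetical.
- The SAR is defined here as in-span energy over artifact energy, which is an assumption rather than the paper's stated formula.

The sketch shows that the artifact is the residual left after projecting onto span{s, n}. It also shows that adding w·y, where the observation y = s + n lies in that span, leaves the artifact unchanged and changes only the in-span energy.

```python
import numpy as np


def project(x, basis):
    """Orthogonal projection of x onto the column span of `basis` (hypothetical helper)."""
    # Least-squares coefficients c minimizing ||basis @ c - x||^2.
    coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return basis @ coeffs


def opd(enhanced, speech, noise):
    """Sketch of an orthogonal projection-based decomposition.

    Assumption: BSS-Eval-style projections onto the raw speech and noise
    signals only (no filter taps). Returns (target, e_noise, e_artif) with
    enhanced == target + e_noise + e_artif.
    """
    p_s = project(enhanced, speech[:, None])
    p_sn = project(enhanced, np.stack([speech, noise], axis=1))
    target = p_s               # part explained by the speech source alone
    e_noise = p_sn - p_s       # extra part explained once noise is allowed
    e_artif = enhanced - p_sn  # residual: no linear combination of s and n
    return target, e_noise, e_artif


def sar_db(enhanced, speech, noise):
    """Signal-to-artifact ratio in dB (assumed definition): in-span energy over artifact energy."""
    target, e_noise, e_artif = opd(enhanced, speech, noise)
    in_span = target + e_noise
    return 10.0 * np.log10(np.sum(in_span**2) / np.sum(e_artif**2))


def observation_adding(enhanced, observed, weight):
    """OA: add a scaled copy of the observed mixture to the enhanced signal."""
    return enhanced + weight * observed


# Toy signals (hypothetical, random): 1 s at 16 kHz.
rng = np.random.default_rng(0)
n_samples = 16000
speech = rng.standard_normal(n_samples)
noise = rng.standard_normal(n_samples)
observed = speech + noise
# Stand-in "enhanced" output: mostly speech, some residual noise, and an
# independent random term whose out-of-span part acts as the SE artifact.
enhanced = 0.9 * speech + 0.1 * noise + 0.2 * rng.standard_normal(n_samples)

for w in (0.0, 0.1, 0.5, 1.0):
    sar = sar_db(observation_adding(enhanced, observed, w), speech, noise)
    print(f"OA weight {w:.1f}: SAR = {sar:.2f} dB")
```

Projection is linear and y already lies in span{s, n}, so the artifact of ŝ + w·y equals the artifact of ŝ for every w. Only the in-span energy ||P ŝ + w y||² changes. That energy is non-decreasing for w ≥ 0 whenever ⟨P ŝ, y⟩ ≥ 0, so in this sketch the SAR rises with the OA weight. This condition follows from the sketch's own assumptions. The paper's "mild condition" may be stated differently.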
