Predicting word error rate for reverberant speech

11/01/2019
by Hannes Gamper, et al.

Reverberation negatively impacts the performance of automatic speech recognition (ASR). Prior work on quantifying the effect of reverberation has shown that clarity (C50), a parameter that can be estimated from the acoustic impulse response, is correlated with ASR performance. In this paper we propose predicting ASR performance in terms of the word error rate (WER) directly from acoustic parameters via a polynomial, sigmoidal, or neural network fit, as well as blindly from reverberant speech samples using a convolutional neural network (CNN). We carry out experiments on two state-of-the-art ASR models and a large set of acoustic impulse responses (AIRs). The results confirm C50 and C80 to be highly correlated with WER, allowing WER to be predicted with the proposed fitting approaches. The proposed non-intrusive CNN model outperforms C50-based WER prediction, indicating that WER can be estimated blindly, i.e., directly from the reverberant speech samples without knowledge of the acoustic parameters.
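As a rough illustration of the quantities named in the abstract, the sketch below computes clarity as the early-to-late energy ratio of an impulse response (a 50 ms split gives C50, an 80 ms split gives C80) and then maps C50 to WER through a sigmoid. The clarity formula is the standard room-acoustics definition. Everything else is an assumption made for illustration: the function names, the toy exponentially decaying impulse response, the particular sigmoid parameterization, and all parameter values are hypothetical and are not taken from the paper.

```python
import math


def clarity_db(air, sample_rate, split_ms=50.0):
    """Early-to-late energy ratio in dB (split_ms=50 -> C50, 80 -> C80).

    Assumes air[0] is the direct-path arrival, i.e. any leading
    propagation delay has already been removed.
    """
    split = int(round(sample_rate * split_ms / 1000.0))
    early = sum(x * x for x in air[:split])
    late = sum(x * x for x in air[split:])
    if late == 0.0:
        return math.inf
    if early == 0.0:
        return -math.inf
    return 10.0 * math.log10(early / late)


def sigmoid_wer(c50_db, wer_low, wer_high, midpoint_db, slope):
    """Hypothetical sigmoid mapping from C50 to WER.

    WER falls from wer_high (very reverberant) to wer_low (very clear)
    as C50 increases. The paper's exact functional form may differ.
    """
    return wer_low + (wer_high - wer_low) / (
        1.0 + math.exp(slope * (c50_db - midpoint_db))
    )


def synthetic_air(rt60_s, sample_rate, length_s=1.0):
    """Toy impulse-response envelope, not a measured AIR.

    The amplitude decays exponentially and is 60 dB down at rt60_s.
    """
    decay = 3.0 * math.log(10.0) / rt60_s
    n = int(sample_rate * length_s)
    return [math.exp(-decay * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    fs = 16000
    for rt60 in (0.3, 0.6, 1.0):
        c50 = clarity_db(synthetic_air(rt60, fs), fs)
        # Illustrative parameter values only; real ones would be fitted to data.
        wer = sigmoid_wer(c50, wer_low=0.05, wer_high=0.60,
                          midpoint_db=5.0, slope=0.4)
        print(f"RT60={rt60:.1f} s  C50={c50:.1f} dB  predicted WER={wer:.2f}")
```

In practice the four sigmoid parameters would be fitted to measured (C50, WER) pairs for a given ASR model, for example with a least-squares routine, rather than set by hand as done here.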


research
03/24/2022

Automatic Speech recognition for Speech Assessment of Preschool Children

The acoustic and linguistic features of preschool speech are investigate...
research
07/16/2022

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

We present an approach to reduce the performance disparity between geogr...
research
08/08/2020

Word Error Rate Estimation Without ASR Output: e-WER2

Measuring the performance of automatic speech recognition (ASR) systems ...
research
11/08/2022

Towards Improved Room Impulse Response Estimation for Speech Recognition

We propose to characterize and improve the performance of blind room imp...
research
05/21/2020

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

In this paper we present state-of-the-art (SOTA) performance on the Libr...
research
04/26/2022

Mask scalar prediction for improving robust automatic speech recognition

Using neural network based acoustic frontends for improving robustness o...
research
02/11/2020

CGCNN: Complex Gabor Convolutional Neural Network on raw speech

Convolutional Neural Networks (CNN) have been used in Automatic Speech R...
