Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System

10/03/2019
by   Kai Fan, et al.

The performance of automatic speech recognition (ASR) systems is usually evaluated by word error rate (WER), which requires manually transcribed data that are expensive to obtain in real scenarios. In addition, the empirical distribution of WER for most ASR systems tends to put significant mass near zero, making it difficult to model with a single continuous distribution. To address these two issues of ASR quality estimation (QE), we propose a novel neural zero-inflated model to predict the WER of an ASR result without transcripts. We design a neural zero-inflated beta regression on top of a bidirectional transformer language model conditioned on speech features (speech-BERT). We also adopt the pre-training strategy of token-level masked language modeling for speech-BERT, and then fine-tune it with our zero-inflated layer for the mixture of discrete and continuous outputs. Experimental results show that our approach achieves better WER prediction performance, measured by Pearson correlation and MAE, than most existing quality estimation algorithms for ASR or machine translation.
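To make the idea of a zero-inflated beta output concrete, the following is a minimal sketch of the generic zero-inflated beta likelihood, not the authors' implementation. It assumes WER has been clipped into [0, 1), that a point mass `pi_zero` covers the case WER = 0, and that a Beta(`alpha`, `beta`) density covers the remaining values. All function and parameter names are hypothetical; in the paper these parameters would be predicted by a layer on top of speech-BERT, whereas here they are plain numbers.

```python
import math


def beta_log_pdf(y, alpha, beta):
    """Log density of Beta(alpha, beta) at a point y in the open interval (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(y) + (beta - 1.0) * math.log(1.0 - y)


def zero_inflated_beta_nll(y, pi_zero, alpha, beta):
    """Negative log-likelihood of one WER value under a zero-inflated beta model.

    y       : observed WER, assumed to lie in [0, 1) (an assumption of this sketch)
    pi_zero : probability that the WER is exactly zero, in (0, 1)
    alpha, beta : positive shape parameters of the continuous component
    """
    if not 0.0 <= y < 1.0:
        raise ValueError("this sketch assumes 0 <= y < 1")
    if y == 0.0:
        # Discrete component: all mass of the zero event.
        return -math.log(pi_zero)
    # Continuous component, weighted by the probability of a non-zero WER.
    return -(math.log(1.0 - pi_zero) + beta_log_pdf(y, alpha, beta))


def expected_wer(pi_zero, alpha, beta):
    """Mean of the mixture: the zero component contributes nothing."""
    return (1.0 - pi_zero) * alpha / (alpha + beta)
```

With `alpha = beta = 1` the continuous part is uniform, so the loss for any non-zero WER reduces to `-log(1 - pi_zero)`, which is a convenient sanity check. The mixture mean from `expected_wer` is one plausible point prediction to compare against the true WER with Pearson correlation or MAE.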

research
02/22/2022

Korean Tokenization for Beam Search Rescoring in Speech Recognition

The performance of automatic speech recognition (ASR) models can be grea...
research
06/02/2021

Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights

Automatic speech recognition (ASR) in Sanskrit is interesting, owing to ...
research
10/05/2021

ASR Rescoring and Confidence Estimation with ELECTRA

In automatic speech recognition (ASR) rescoring, the hypothesis with the...
research
01/14/2021

WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm

Automatic Speech Recognition (ASR) systems are evaluated using Word Erro...
research
07/16/2022

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

We present an approach to reduce the performance disparity between geogr...
research
12/22/2022

Alignment Entropy Regularization

Existing training criteria in automatic speech recognition (ASR) permit t...
research
09/19/2019

A Random Gossip BMUF Process for Neural Language Modeling

LSTM language model is an essential component of industrial ASR systems....
