Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems lack joint optimization during decoding over the acoustic, lexical, and language models; for instance, the ASR will often prune words on acoustic evidence using short-term context before rescoring with long-term context. In this work, we model the automated speech transcription process as a noisy transformation channel and propose an error correction system that learns from the aggregate errors of all the independent modules constituting the ASR. The proposed system can exploit long-term context using a neural network language model, can better choose between existing ASR output possibilities, and can re-introduce previously pruned and unseen (out-of-vocabulary) phrases. It provides significant corrections under poorly performing ASR conditions without degrading accurate transcriptions. The system can thus be optimized independently and used to post-process the output of even a highly optimized ASR. We show that the system consistently provides improvements over the baseline ASR. We also show that it performs better when used on out-of-domain and mismatched test data and under high-error ASR conditions. Finally, we present an extensive analysis of the types of errors corrected by our system.
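
The noisy-channel view of transcription can be sketched in its textbook form. The symbols here are assumptions introduced for illustration ($n$ for a noisy ASR output phrase, $c$ for a candidate clean phrase); the abstract does not state the authors' exact objective, so this is the generic formulation rather than their specific model.

```latex
\hat{c} \;=\; \arg\max_{c} P(c \mid n)
        \;=\; \arg\max_{c} \frac{P(n \mid c)\, P(c)}{P(n)}
        \;=\; \arg\max_{c} P(n \mid c)\, P(c)
```

Under this reading, $P(n \mid c)$ plays the role of a channel model describing how the ASR corrupts clean phrases, and $P(c)$ is a language model over clean text. The denominator $P(n)$ drops out because it does not depend on $c$.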
