Hallucination of speech recognition errors with sequence to sequence learning

03/23/2021
by Prashant Serai, et al.

Automatic Speech Recognition (ASR) is an imperfect process, so its output text shows certain mismatches with plain written text or transcriptions. When plain text data is used to train systems for spoken language understanding or ASR, a proven strategy to reduce this mismatch and prevent degradation is to hallucinate what the ASR output would be given a gold transcription. Prior work in this domain has focused on modeling errors at the phonetic level, using a lexicon to convert the phones to words, usually accompanied by an FST language model. We present novel end-to-end models that directly predict hallucinated ASR word sequence outputs, conditioned on an input word sequence as well as a corresponding phoneme sequence. These models improve on prior published results for recall of errors from an in-domain ASR system's transcription of unseen data, as well as from an out-of-domain ASR system's transcriptions of audio from an unrelated task. We additionally explore an in-between scenario in which limited characterization data from the test ASR system is obtainable. To verify the extrinsic validity of the method, we also use our hallucinated ASR errors to augment training for a spoken question classifier, finding that they enable robustness to real ASR errors in a downstream task when scarce or even zero task-specific audio is available at train time.
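
The augmentation idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the sequence-to-sequence hallucination model is replaced here by a hypothetical hand-written confusion table (`CONFUSIONS`), and the names `hallucinate` and `augment` are assumptions made for illustration. The toy version only substitutes words, whereas real ASR errors also include insertions and deletions.

```python
import random

# Hypothetical confusion table: word -> plausible ASR misrecognitions.
# Illustrative stand-in only; the paper learns such errors with a
# sequence-to-sequence model rather than a fixed lookup.
CONFUSIONS = {
    "their": ["there"],
    "two": ["to", "too"],
    "weather": ["whether"],
}


def hallucinate(words, error_rate, rng):
    """Return a copy of `words` with some words swapped for confusable ones."""
    out = []
    for word in words:
        alternatives = CONFUSIONS.get(word)
        if alternatives and rng.random() < error_rate:
            out.append(rng.choice(alternatives))
        else:
            out.append(word)
    return out


def augment(dataset, error_rate=0.5, copies=2, seed=0):
    """Keep each (text, label) pair and add noisy copies with the same label."""
    rng = random.Random(seed)
    augmented = []
    for text, label in dataset:
        augmented.append((text, label))
        for _ in range(copies):
            noisy = " ".join(hallucinate(text.split(), error_rate, rng))
            augmented.append((noisy, label))
    return augmented


if __name__ == "__main__":
    data = [("what is the weather in two days", "weather_query")]
    for text, label in augment(data):
        print(label, "|", text)
```

The classifier then trains on both the clean text and the noisy copies, so that it sees error patterns at train time even when no task-specific audio is available.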


