Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

02/22/2023
by Xiaoqiang Wang, et al.

We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models using contextual information such as names and places. Although CSC achieves reasonable improvement on the biasing problem, two drawbacks limit further accuracy gains. First, because a text-only hypothesis carries limited information, or because the ASR model performs poorly on rare domains, the CSC model may fail to correct phrases with similar pronunciations, and may fail on anti-context cases, where none of the biasing phrases is present in the utterance. Second, there is a discrepancy between CSC training and inference: the bias list is randomly selected in training, whereas in inference the ground-truth phrase may be more similar to the other phrases in the list. To address these limitations, in this paper we propose an improved non-autoregressive (NAR) spelling correction model for contextual biasing in E2E neural transducer-based ASR systems, improving the previous CSC model from two perspectives. First, we incorporate acoustic information through an external attention, alongside the text hypotheses, into CSC to better distinguish the target phrase from dissimilar or irrelevant phrases. Second, we design a semantic-aware data augmentation scheme for the training phase to reduce the mismatch between training and inference and further boost biasing accuracy. Experiments show that the improved method outperforms the baseline ASR+Biasing system with a recall gain of as much as 20.3 and achieves stable improvement over the previous CSC method across different bias list name coverage ratios.
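
The training/inference mismatch described in the abstract can be illustrated with a small sketch of bias list construction. This is not the paper's augmentation scheme, which the abstract does not specify. It assumes, purely for illustration, that a training bias list mixes a few phrases that resemble the ground-truth phrase with randomly drawn ones. The function names, the parameters, and the use of character-level `difflib.SequenceMatcher` as a stand-in for a semantic similarity measure are all hypothetical.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1]. This is character overlap, NOT a semantic
    measure; a real system would plug in its own similarity function."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_bias_list(target, phrase_pool, list_size=5, num_similar=2,
                    include_target=True, rng=None):
    """Build one training bias list (hypothetical helper).

    Random-only sampling rarely yields confusable phrases, so this sketch
    reserves `num_similar` slots for the pool phrases closest to `target`
    and fills the remaining slots at random.
    Set include_target=False to simulate an anti-context example, where
    the ground-truth phrase is absent from the list.
    """
    rng = rng or random.Random()
    candidates = [p for p in phrase_pool if p != target]

    # Hard negatives: pool phrases ranked by similarity to the target.
    ranked = sorted(candidates, key=lambda p: similarity(target, p),
                    reverse=True)
    similar = ranked[:num_similar]

    # Fill the remaining slots with randomly chosen, less similar phrases.
    rest = ranked[num_similar:]
    reserved = len(similar) + (1 if include_target else 0)
    num_random = max(0, min(len(rest), list_size - reserved))
    randoms = rng.sample(rest, num_random)

    bias_list = ([target] if include_target else []) + similar + randoms
    rng.shuffle(bias_list)  # avoid leaking the target's position
    return bias_list


if __name__ == "__main__":
    pool = ["Jon Smyth", "Joan Smith", "Berlin", "Tokyo",
            "Alice Cooper", "Madrid"]
    print(build_bias_list("John Smith", pool, rng=random.Random(0)))
```

With `num_similar=0` the function reduces to random-only selection, which is the training setup the abstract identifies as mismatched with inference.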
