Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion

by   Cal Peyser, et al.

Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional ASR models, E2E systems lack an explicit pronounciation model that can be specifically trained with proper noun pronounciations and a language model that can be trained on a large text-only corpus. Past work has addressed this issue by incorporating additional training data or additional models. In this paper, we instead build on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition. Unlike past work on this problem, this method requires no new data during training or external models during inference. We see improvements ranging from 2 relevant benchmarks.


page 1

page 2

page 3

page 4


Cycle-consistency training for end-to-end speech recognition

This paper presents a method to train end-to-end automatic speech recogn...

Training Neural Speech Recognition Systems with Synthetic Speech Augmentation

Building an accurate automatic speech recognition (ASR) system requires ...

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

End-to-end (E2E) automatic speech recognition (ASR) systems lack the dis...

Contextual Speech Recognition with Difficult Negative Training Examples

Improving the representation of contextual information is key to unlocki...

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

"Transcription bottlenecks", created by a shortage of effective human tr...

USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder

Improving end-to-end speech recognition by incorporating external text d...

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

As end-to-end automatic speech recognition (ASR) models reach promising ...