Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion

11/30/2020
by   Vijay Ravi, et al.

End-to-end automatic speech recognition (ASR) systems, such as the recurrent neural network transducer (RNN-T), have become popular, but rare words remain a challenge. In this paper, we propose a simple yet effective method called unigram shallow fusion (USF) to improve rare-word recognition for RNN-T. In USF, we extract rare words from the RNN-T training data based on unigram count and apply a fixed reward whenever one of these words is encountered during decoding. We show that this simple method can improve performance on rare words by 3.7 without degradation on a general test set, and that the improvement from USF is additive to any additional language-model-based rescoring. We then show that the same USF does not work on a conventional hybrid system. Finally, we reason that USF works by fixing errors in the probability estimates of words that arise from the Viterbi search used during decoding with subword-based RNN-T.
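The two steps described in the abstract (selecting rare words by unigram count, then adding a fixed reward when such a word appears) can be illustrated with a small sketch. This is not the authors' implementation. For simplicity it assumes the reward is applied to a finished n-best list rather than inside the RNN-T beam search, and the names `extract_rare_words` and `apply_usf`, the count threshold `max_count`, the `reward` value, and the toy data are all hypothetical.

```python
from collections import Counter


def extract_rare_words(transcripts, max_count=1):
    """Return the words whose unigram count in the transcripts is <= max_count.

    max_count is a hypothetical threshold; the abstract does not state one.
    """
    counts = Counter(w for line in transcripts for w in line.lower().split())
    return {w for w, n in counts.items() if n <= max_count}


def apply_usf(hypotheses, rare_words, reward=1.0):
    """Add a fixed reward to a hypothesis score for every rare word it contains.

    hypotheses: list of (text, log_score) pairs, where a higher score is better.
    Returns the hypotheses re-sorted by their rewarded score, best first.
    Assumption: rewards are applied to complete hypotheses (n-best rescoring),
    a simplification of applying them during decoding.
    """
    rescored = []
    for text, score in hypotheses:
        n_rare = sum(1 for w in text.lower().split() if w in rare_words)
        rescored.append((text, score + reward * n_rare))
    return sorted(rescored, key=lambda h: h[1], reverse=True)


# Toy training transcripts: "coltrane" is the only word seen exactly once.
train = [
    "play some music",
    "play some music by coltrane",
    "play music by the band",
    "play the band",
]
rare = extract_rare_words(train, max_count=1)

# Toy n-best list in which the rare-word hypothesis starts out second.
nbest = [
    ("play music by cold train", -4.0),
    ("play music by coltrane", -4.5),
]
best = apply_usf(nbest, rare, reward=1.0)
print(best[0][0])
```

In this toy case the fixed reward lifts the hypothesis containing the rare word above its competitor. In practice the threshold and reward would presumably be tuned on held-out data so that general test-set accuracy does not degrade.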


