Deep Shallow Fusion for RNN-T Personalization

11/16/2020
by   Duc Le, et al.
0

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks. However, these models are more challenging to personalize compared to traditional hybrid systems due to the lack of external language models and difficulties in recognizing rare long-tail words, specifically entity names. In this work, we present novel techniques to improve RNN-T's ability to model rare WordPieces, infuse extra information into the encoder, enable the use of alternative graphemic pronunciations, and perform deep fusion with personalized language models for more robust biasing. We show that these combined techniques result in 15.4 baseline which uses shallow fusion and text-to-speech augmentation. Our work helps push the boundary of RNN-T personalization and close the gap with hybrid systems on use cases where biasing and entity recognition are crucial.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/30/2020

Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion

End-to-end automatic speech recognition (ASR) systems, such as recurrent...
research
04/05/2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

How to leverage dynamic contextual information in end-to-end speech reco...
research
03/30/2023

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

End-to-End (E2E) automatic speech recognition (ASR) systems used in voic...
research
02/15/2021

Personalization Strategies for End-to-End Speech Recognition Systems

The recognition of personalized content, such as contact names, remains ...
research
04/15/2022

Improving Rare Word Recognition with LM-aware MWER Training

Language models (LMs) significantly improve the recognition accuracy of ...
research
04/20/2022

Detecting Unintended Memorization in Language-Model-Fused ASR

End-to-end (E2E) models are often being accompanied by language models (...
research
03/31/2022

An Empirical Study of Language Model Integration for Transducer based Speech Recognition

Utilizing text-only data with an external language model (LM) in end-to-...

Please sign up or login with your details

Forgot password? Click here to reset