A Likelihood Ratio based Domain Adaptation Method for E2E Models

01/10/2022
by Chhavi Choudhury, et al.

End-to-end (E2E) automatic speech recognition (ASR) models such as the Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for streaming ASR applications like voice assistants. While E2E models are very effective at learning representations of the data they are trained on, their accuracy on unseen domains remains a challenging problem. Additionally, these models require paired audio and text training data, are computationally expensive, and are difficult to adapt to the fast-evolving nature of conversational speech. In this work, we explore a contextual biasing approach using a likelihood ratio that leverages text data sources to adapt the RNN-T model to new domains and entities. We show that this method is effective in improving the recognition of rare words and results in a relative word error rate (WER) improvement of 10% on out-of-domain datasets without any degradation on a general dataset. We also show that complementing the contextual biasing adaptation with adaptation of a second-pass rescoring model gives additive WER improvements.
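The abstract does not spell out how the likelihood ratio is computed. As a rough illustration only, the sketch below assumes the ratio is a per-word log ratio between a language model trained on in-domain text and one trained on background text, added with a tunable weight to the E2E scores of an n-best list. The unigram models, add-one smoothing, toy corpora, and the names `unigram_logprobs`, `likelihood_ratio_bonus`, `rescore`, and `weight` are all hypothetical choices for this example, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def unigram_logprobs(corpus: List[str], vocab: Set[str]) -> Dict[str, float]:
    """Hypothetical helper: add-one smoothed unigram log-probabilities over a shared vocabulary."""
    counts = Counter(w for sentence in corpus for w in sentence.split())
    total = sum(counts.values()) + len(vocab)
    return {w: math.log((counts[w] + 1) / total) for w in vocab}


def likelihood_ratio_bonus(hyp: str, lp_domain: Dict[str, float],
                           lp_background: Dict[str, float]) -> float:
    """Sum over words of log p_domain(w) - log p_background(w) (assumed formulation)."""
    return sum(lp_domain[w] - lp_background[w] for w in hyp.split())


def rescore(nbest: List[Tuple[str, float]], lp_domain: Dict[str, float],
            lp_background: Dict[str, float], weight: float) -> List[Tuple[str, float]]:
    """Add the weighted log likelihood ratio to each E2E score and re-sort, best first."""
    rescored = [(hyp, score + weight * likelihood_ratio_bonus(hyp, lp_domain, lp_background))
                for hyp, score in nbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy data (invented for illustration): the domain text contains a rare entity.
domain_text = ["play kendrick lamar", "play kendrick lamar songs"]
background_text = ["play candid llama", "play some music", "candid camera"]
nbest = [("play candid llama", -1.0), ("play kendrick lamar", -1.2)]  # (hypothesis, E2E log score)

vocab = {w for s in domain_text + background_text + [h for h, _ in nbest] for w in s.split()}
lp_dom = unigram_logprobs(domain_text, vocab)
lp_bg = unigram_logprobs(background_text, vocab)

reranked = rescore(nbest, lp_dom, lp_bg, weight=0.5)
print(reranked[0][0])  # → play kendrick lamar
```

Words that are more likely under the domain text than under the background text receive a positive bonus, so the rare in-domain entity overtakes the acoustically similar general-domain hypothesis. Setting `weight` to 0 recovers the original E2E ranking, which is one way to check that biasing does not change behavior when no domain evidence is wanted.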

research
11/30/2020

Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion

End-to-end automatic speech recognition (ASR) systems, such as recurrent...
research
02/26/2020

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

This article describes a density ratio approach to integrating external ...
research
07/30/2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

Because of its streaming nature, recurrent neural network transducer (RN...
research
11/07/2020

Naturalization of Text by the Insertion of Pauses and Filler Words

In this article, we introduce a set of methods to naturalize text based ...
research
06/04/2020

Contextual RNN-T For Open Domain ASR

End-to-end (E2E) systems for automatic speech recognition (ASR), such as...
research
01/05/2021

Domain-aware Neural Language Models for Speech Recognition

As voice assistants become more ubiquitous, they are increasingly expect...
research
06/24/2019

Streaming Adaptation of Deep Forecasting Models using Adaptive Recurrent Units

We present ARU, an Adaptive Recurrent Unit for streaming adaptation of d...
