An Empirical Study of Language Model Integration for Transducer based Speech Recognition

03/31/2022
by   Huahuan Zheng, et al.

Utilizing text-only data with an external language model (LM) in end-to-end RNN-Transducer (RNN-T) speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal LM estimation (ILME) has been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that the implicitly learned internal LM (ILM) prior should first be subtracted from the RNN-T posterior before the external LM is integrated. While recent studies suggest that RNN-T only learns some low-order language model information, the DR method uses a well-trained LM as its ILM estimate. We hypothesize that this setting is inappropriate and may deteriorate the performance of the DR method, and propose a low-order density ratio method (LODR) that trains a low-order weak LM as the ILM estimate for DR. Extensive empirical experiments are conducted in both in-domain and cross-domain scenarios on the English LibriSpeech and Tedlium-2 datasets and the Chinese WenetSpeech and AISHELL-1 datasets. It is shown that LODR consistently outperforms SF in all tasks, while performing generally close to ILME and better than DR in most tests.
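
The difference between SF and the subtract-the-prior family (DR, ILME, LODR) can be illustrated with a minimal sketch of hypothesis scoring in the log domain. This is an illustration of the general form implied by the description above, not the paper's exact formula or implementation: the function names, the weights, and the toy log-probabilities are all hypothetical. Under this reading, DR, ILME, and LODR differ only in where the ILM estimate `log_p_ilm` comes from (for LODR, a low-order weak LM).

```python
def shallow_fusion_score(log_p_rnnt, log_p_ext_lm, lm_weight):
    """SF: add the weighted external LM log-probability to the RNN-T score."""
    return log_p_rnnt + lm_weight * log_p_ext_lm


def prior_subtracted_score(log_p_rnnt, log_p_ext_lm, log_p_ilm,
                           lm_weight, ilm_weight):
    """DR-style fusion: additionally subtract a weighted ILM estimate.

    log_p_ilm is whatever estimate of the internal LM is available;
    the choice of estimator is what separates DR, ILME, and LODR.
    """
    return (log_p_rnnt
            + lm_weight * log_p_ext_lm
            - ilm_weight * log_p_ilm)


# Toy hypotheses with made-up log-probabilities (assumed values, not real data).
hyps = {
    "A": {"rnnt": -4.0, "ext": -2.0, "ilm": -1.0},
    "B": {"rnnt": -5.0, "ext": -2.0, "ilm": -8.0},
}
LM_WEIGHT, ILM_WEIGHT = 0.5, 0.25  # illustrative weights, normally tuned

sf_scores = {k: shallow_fusion_score(h["rnnt"], h["ext"], LM_WEIGHT)
             for k, h in hyps.items()}
dr_scores = {k: prior_subtracted_score(h["rnnt"], h["ext"], h["ilm"],
                                       LM_WEIGHT, ILM_WEIGHT)
             for k, h in hyps.items()}

best_sf = max(sf_scores, key=sf_scores.get)
best_dr = max(dr_scores, key=dr_scores.get)
print(best_sf, best_dr)
```

In this toy case SF prefers hypothesis A, while subtracting the prior prefers B, because B was penalized mainly by the internal prior rather than by the acoustics or the external LM.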


