Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition

07/09/2022
by Yizhou Peng, et al.

Internal Language Model Estimation (ILME) based language model (LM) fusion has been shown to significantly improve recognition results over conventional shallow fusion in both intra-domain and cross-domain speech recognition tasks. In this paper, we apply our ILME method to cross-domain code-switching speech recognition (CSSR). Specifically, we investigate several questions. First, we examine how effective ILME-based LM fusion is for both intra-domain and cross-domain CSSR tasks. We verify this with and without merging two code-switching domains. Second, and more importantly, we train an end-to-end (E2E) speech recognition model on the merger of two monolingual data sets and observe the efficacy of the proposed ILME-based LM fusion for CSSR. Experimental results on SEAME, a code-switching data set from Southeast Asia, and on another code-switching data set from Mainland China demonstrate the effectiveness of the proposed ILME-based LM fusion method.
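
As a rough illustration of the contrast drawn above: shallow fusion is commonly formulated as adding a weighted external-LM log-probability to the E2E model score, while ILME-based fusion additionally subtracts a weighted estimate of the E2E model's internal LM score. The sketch below assumes that per-hypothesis log-probabilities are already available. The function names, dictionary keys, weights, and numbers are hypothetical and are not taken from the paper.

```python
def shallow_fusion_score(log_p_e2e, log_p_ext, ext_weight):
    """Conventional shallow fusion: E2E score plus weighted external LM score."""
    return log_p_e2e + ext_weight * log_p_ext


def ilme_fusion_score(log_p_e2e, log_p_ext, log_p_ilm, ext_weight, ilm_weight):
    """ILME-based fusion: also subtract the weighted internal LM estimate."""
    return log_p_e2e + ext_weight * log_p_ext - ilm_weight * log_p_ilm


def pick_best(hypotheses, ext_weight, ilm_weight):
    """Return the hypothesis with the highest ILME-fused score.

    Each hypothesis is a dict holding hypothetical log-probabilities under
    the keys "e2e", "ext" (external LM) and "ilm" (internal LM estimate).
    """
    return max(
        hypotheses,
        key=lambda h: ilme_fusion_score(
            h["e2e"], h["ext"], h["ilm"], ext_weight, ilm_weight
        ),
    )


# Made-up scores for two competing hypotheses (illustration only).
hypotheses = [
    {"text": "hyp A", "e2e": -4.0, "ext": -6.0, "ilm": -2.0},
    {"text": "hyp B", "e2e": -4.5, "ext": -3.0, "ilm": -5.0},
]
best = pick_best(hypotheses, ext_weight=0.5, ilm_weight=0.5)
```

In practice the two weights would be tuned on a development set, and the scores would typically be combined at each beam-search step rather than once per finished hypothesis.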


