Pronunciation Generation for Foreign Language Words in Intra-Sentential Code-Switching Speech Recognition

10/26/2022
by Wei Wang, et al.

Code-switching refers to the phenomenon of switching languages within a sentence or discourse. However, limited code-switching data, differing phoneme sets across languages, and high rebuilding costs make it challenging to build a specialized acoustic model for code-switching speech recognition. In this paper, we use limited code-switching data as driving material and explore a shortcut for quickly adding intra-sentential code-switching recognition capability to an already commissioned native-language acoustic model. We propose a data-driven method to build a seed lexicon, which is used to train a grapheme-to-phoneme model that predicts mapped pronunciations for foreign-language words in code-switching sentences. The core of the data-driven technique consists of a phonetic decoding method and different selection methods. To address the imbalance in word-level driving material, we introduce an internal assistance strategy: the grapheme-to-phoneme model learns good pronunciation rules from words with sufficient material and uses them to help words with scarce material. Our experiments show that the Mixed Error Rate on intra-sentential Chinese-English code-switching recognition drops from 29.15%, obtained with the pure Chinese recognizer, to 12.13% when foreign-language word pronunciations are added through our data-driven approach, and reaches the best result of 11.14% when the different selection methods are combined with the internal assistance strategy.
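The abstract reports its results as Mixed Error Rate without defining it. As a rough illustration only, the sketch below follows a convention often used for Chinese-English code-switching, where each Chinese character and each English word counts as one token and the error rate is the token-level edit distance divided by the number of reference tokens. The tokenization rule and the function names (`tokenize`, `edit_distance`, `mixed_error_rate`) are assumptions made for this example and are not taken from the paper.

```python
import re

# Assumed tokenization: one token per CJK character, one token per run of
# Latin letters (apostrophes allowed). Digits and punctuation are ignored.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text):
    """Split a mixed Chinese-English string into characters and lowercased words."""
    return [tok.lower() for tok in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(refs, hyps):
    """Total token errors divided by total reference tokens over a test set."""
    total = sum(len(tokenize(r)) for r in refs)
    if total == 0:
        raise ValueError("reference set contains no tokens")
    errors = sum(edit_distance(tokenize(r), tokenize(h))
                 for r, h in zip(refs, hyps))
    return errors / total


if __name__ == "__main__":
    refs = ["我想买一个 iphone"]
    hyps = ["我想买一个 爱疯"]
    print(f"MER = {mixed_error_rate(refs, hyps):.2%}")
```

In the usage example, a recognizer without an English pronunciation for "iphone" is imagined to output two Chinese characters instead, which counts as one substitution plus one insertion against six reference tokens.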

research
10/30/2018

Towards End-to-end Automatic Code-Switching Speech Recognition

Speech recognition in mixed language has difficulties to adapt end-to-en...
research
10/12/2022

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

Code-switching automatic speech recognition becomes one of the most chal...
research
07/09/2022

Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition

Internal Language Model Estimation (ILME) based language model (LM) fusi...
research
06/29/2022

Language-specific Characteristic Assistance for Code-switching Speech Recognition

Dual-encoder structure successfully utilizes two language-specific encod...
research
08/21/2022

The Development of a Labelled te reo Māori-English Bilingual Database for Language Technology

Te reo Māori (referred to as Māori), New Zealand's indigenous language, ...
research
09/18/2019

Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences

Training code-switched language models is difficult due to lack of data ...
research
12/14/2016

Grammatical Constraints on Intra-sentential Code-Switching: From Theories to Working Models

We make one of the first attempts to build working models for intra-sent...
