Code-Switching Text Generation and Injection in Mandarin-English ASR

03/20/2023
by   Haibin Yu, et al.
0

Code-switching speech refers to a means of expression by mixing two or more languages within a single utterance. Automatic Speech Recognition (ASR) with End-to-End (E2E) modeling for such speech can be a challenging task due to the lack of data. In this study, we investigate text generation and injection for improving the performance of an industry commonly-used streaming model, Transformer-Transducer (T-T), in Mandarin-English code-switching speech recognition. We first propose a strategy to generate code-switching text data and then investigate injecting generated text into T-T model explicitly by Text-To-Speech (TTS) conversion or implicitly by tying speech and text latent spaces. Experimental results on the T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to inject generated code-switching text significantly boost the performance of T-T models, i.e., 16 reduction averaged on three evaluation sets, and the approach of tying speech and text latent spaces is superior to that of TTS conversion on the evaluation set which contains more homogeneous data with the training set.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/21/2022

Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation

Code-switching describes the practice of using more than one language in...
research
11/04/2020

Data Augmentation for End-to-end Code-switching Speech Recognition

Training a code-switching end-to-end automatic speech recognition (ASR) ...
research
07/04/2021

Arabic Code-Switching Speech Recognition using Monolingual Data

Code-switching in automatic speech recognition (ASR) is an important cha...
research
10/20/2022

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

Current end-to-end code-switching Text-to-Speech (TTS) can already gener...
research
08/14/2023

Using Text Injection to Improve Recognition of Personal Identifiers in Speech

Accurate recognition of specific categories, such as persons' names, dat...
research
08/23/2021

CGEMs: A Metric Model for Automatic Code Generation using GPT-3

Today, AI technology is showing its strengths in almost every industry a...
research
07/24/2023

Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

Call Centers have huge amount of audio data which can be used for achiev...

Please sign up or login with your details

Forgot password? Click here to reset