Code-Switching Text Augmentation for Multilingual Speech Processing

01/07/2022
by Amir Hussein, et al.

The pervasiveness of intra-utterance code-switching (CS) in spoken content has forced ASR systems to handle mixed-language input. Yet designing a CS-ASR system poses many challenges, mainly due to data scarcity, grammatical structure complexity, and language mismatch combined with an unbalanced language usage distribution. Recent ASR studies have shown the predominance of E2E-ASR trained on multilingual data for handling CS phenomena with little CS data. However, the dependency on CS data remains. In this work, we propose a methodology to augment monolingual data by artificially generating spoken CS text to improve different speech modules. We base our approach on Equivalence Constraint theory, exploiting aligned translation pairs to generate grammatically valid CS content. Our empirical results show a relative gain of 29-34% in perplexity and around 2% in WER. Finally, the human evaluation suggests that 83.8% of the generated data is acceptable to humans.
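To make the idea of generating CS text from aligned translation pairs concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `switch_prob` parameter, the toy English-Spanish sentence pair, and the hand-written word alignment are all assumptions for illustration. The Equivalence Constraint is approximated very crudely by allowing a switch only at source positions whose alignment link crosses no other link, that is, where the word order of the two languages agrees locally.

```python
import random


def switchable_positions(alignment):
    """Return source indices whose alignment link crosses no other link.

    `alignment` is a list of (source_index, target_index) pairs.
    This is a rough stand-in for the Equivalence Constraint, not the
    paper's actual procedure.
    """
    ok = []
    for i, j in alignment:
        # Two links (i, j) and (k, l) cross when their order differs
        # between the source side and the target side.
        crosses = any((i - k) * (j - l) < 0 for k, l in alignment)
        if not crosses:
            ok.append(i)
    return ok


def generate_cs(src_tokens, tgt_tokens, alignment, switch_prob=0.5, seed=0):
    """Replace source tokens with aligned target tokens at allowed positions."""
    rng = random.Random(seed)
    src_to_tgt = dict(alignment)
    allowed = set(switchable_positions(alignment))
    out = []
    for i, tok in enumerate(src_tokens):
        if i in allowed and rng.random() < switch_prob:
            out.append(tgt_tokens[src_to_tgt[i]])
        else:
            out.append(tok)
    return out


# Hypothetical toy pair: adjective-noun order differs between the languages,
# so the links for "red" and "car" cross and only "the" may be switched.
src = ["the", "red", "car"]
tgt = ["el", "coche", "rojo"]
alignment = [(0, 0), (1, 2), (2, 1)]

print(generate_cs(src, tgt, alignment, switch_prob=1.0))
```

In a realistic setting the alignment would come from an automatic word aligner run over a parallel corpus, and the switching decisions would likely operate over phrases or constituents rather than single words.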

