Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

05/31/2023
by   Shuyue Stella Li, et al.

Code-switching, also called code-mixing, is the linguistic phenomenon in which multilingual speakers, in casual settings, mix words from different languages within a single utterance. Because it arises spontaneously, code-switched data is extremely low-resource, which makes it a challenging problem for language and speech processing tasks. In such contexts, Code-Switching Language Identification (CSLID) becomes a difficult but necessary task if we want to make maximal use of existing monolingual tools for other tasks. In this work, we propose two novel approaches to improving language identification accuracy on an English-Mandarin child-directed speech dataset: a stacked Residual CNN+GRU model, and a multitask pre-training approach that uses Automatic Speech Recognition (ASR) as an auxiliary task for CSLID. Given the low-resource nature of code-switching, we also create silver data from monolingual corpora in both languages and apply up-sampling as data augmentation. We focus on English-Mandarin code-switched data, but our method works on any language pair. Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus and outperforms the previous baseline by 55.3%.
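The reported metric, balanced accuracy, is the mean of the per-class recalls, which matters when one language dominates the data. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `balanced_accuracy` and the labels `"en"` and `"zh"` are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)  # correctly predicted items per true class
    total = defaultdict(int)    # number of items per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (assumption: "en" and "zh" as classes).
y_true = ["en"] * 8 + ["zh"] * 2
y_pred = ["en"] * 8 + ["en", "zh"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 9 of 10 segments are correct
print(balanced_accuracy(y_true, y_pred))  # recall is 1.0 for "en", 0.5 for "zh"
```

In this toy case plain accuracy looks high because the majority class is almost always right, while balanced accuracy exposes the weaker recall on the minority class.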


