Code-Switched Text Synthesis in Unseen Language Pairs

05/26/2023
by I-Hung Hsu, et al.

Existing efforts on text synthesis for code-switching mostly require training on code-switched texts in the target language pairs, which limits the deployment of these models in cases where code-switched data is lacking. In this work, we study the problem of synthesizing code-switched texts for language pairs absent from the training data. We introduce GLOSS, a model built on top of a pre-trained multilingual machine translation model (PMMTM) with an additional code-switching module. This module, either an adapter or extra prefixes, learns code-switching patterns from code-switched data during training, while the primary component of GLOSS, i.e., the PMMTM, is frozen. Adjusting only the code-switching module prevents our model from overfitting to the limited training data available for code-switching. Hence, GLOSS is able to generalize and synthesize code-switched texts across a broader spectrum of language pairs. Additionally, we develop a self-training algorithm on target language pairs to further enhance the reliability of GLOSS. Automatic evaluations on four language pairs show that GLOSS achieves relative improvements of at least 55% in BLEU and METEOR scores compared to strong baselines. Human evaluations on two language pairs further validate the success of GLOSS.
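
The idea of freezing the backbone and training only an added module can be illustrated with a deliberately tiny sketch. This is not the GLOSS implementation: `ToyModel`, `train_adapter_only`, the scalar weights, and the toy data are all hypothetical stand-ins, with one frozen number playing the role of the PMMTM and one trainable number playing the role of the code-switching module.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    # Hypothetical stand-in for the frozen PMMTM parameters.
    base_weight: float
    # Hypothetical stand-in for the trainable code-switching module.
    adapter_weight: float = 0.0

    def forward(self, x: float) -> float:
        # Residual form: frozen base output plus a learned correction.
        return self.base_weight * x + self.adapter_weight * x


def train_adapter_only(model: ToyModel, data, lr: float = 0.05, epochs: int = 200) -> ToyModel:
    """Plain SGD on squared error that updates the adapter weight only."""
    for _ in range(epochs):
        for x, y in data:
            error = model.forward(x) - y
            # Gradient of 0.5 * error**2 with respect to adapter_weight.
            grad_adapter = error * x
            model.adapter_weight -= lr * grad_adapter
            # base_weight is deliberately never updated (it is frozen).
    return model


# Toy data drawn from y = 1.5 * x; the base alone models y = 1.0 * x.
data = [(1.0, 1.5), (2.0, 3.0), (-1.0, -1.5)]
model = train_adapter_only(ToyModel(base_weight=1.0), data)
print(model.base_weight, round(model.adapter_weight, 4))
```

In this sketch the adapter absorbs the gap between what the frozen base predicts and what the small training set requires, while the base weight stays exactly as it was initialized. That mirrors the abstract's claim at toy scale only.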

research
12/12/2021

Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks

This paper presents our latest effort on improving Code-switching langua...
research
05/11/2021

Can You Traducir This? Machine Translation for Code-Switched Input

Code-Switching (CSW) is a common phenomenon that occurs in multilingual ...
research
11/04/2020

Data Augmentation for End-to-end Code-switching Speech Recognition

Training a code-switching end-to-end automatic speech recognition (ASR) ...
research
10/11/2022

Checks and Strategies for Enabling Code-Switched Machine Translation

Code-switching is a common phenomenon among multilingual speakers, where...
research
08/06/2020

Phonological Features for 0-shot Multilingual Speech Synthesis

Code-switching—the intra-utterance use of multiple languages—is prevalen...
research
05/18/2021

Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing

We describe models focused at the understudied problem of translating be...
research
11/01/2021

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

Code-switching (CS), a ubiquitous phenomenon due to the ease of communic...