Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation

11/06/2018
by Ching-Ting Chang, et al.

Code-switching is the alternation between languages within speech or text. It is partially speaker-dependent and domain-related, so fully explaining the phenomenon with linguistic rules is challenging. Compared with monolingual tasks, code-switching tasks suffer from insufficient data. To mitigate this issue without expensive human annotation, we propose an unsupervised method for code-switching data augmentation. Using a generative adversarial network, we generate intra-sentential code-switching sentences from monolingual sentences. We applied the proposed method to two corpora, and the results show that the generated code-switching sentences improve the performance of code-switching language models.
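The abstract does not say how a generated sentence is assembled from a monolingual one. As a minimal sketch under stated assumptions, suppose the generator's output is a per-token binary switch decision, and each switched token is replaced by its translation from a bilingual lexicon. The function `apply_switch_points`, the 0/1 mask format, and the toy Mandarin-English lexicon are hypothetical illustrations, not the authors' implementation. The GAN that would produce the mask, and the discriminator that judges the result, are not shown.

```python
from typing import Dict, List, Sequence


def apply_switch_points(
    tokens: Sequence[str],
    switch_mask: Sequence[int],
    lexicon: Dict[str, str],
) -> List[str]:
    """Build an intra-sentential code-switched sentence (hypothetical helper).

    ASSUMPTION: switch_mask stands in for a generator's per-token decisions
    (1 = switch this token to the other language, 0 = keep it).
    Tokens marked for switching but missing from the lexicon are kept as-is.
    """
    if len(tokens) != len(switch_mask):
        raise ValueError("tokens and switch_mask must have the same length")
    return [
        lexicon.get(tok, tok) if switch else tok
        for tok, switch in zip(tokens, switch_mask)
    ]


if __name__ == "__main__":
    # Toy, hypothetical lexicon and mask, for illustration only.
    toy_lexicon = {"明天": "tomorrow", "開會": "meeting"}
    sentence = ["我", "明天", "要", "開會"]
    mask = [0, 1, 0, 1]
    print(apply_switch_points(sentence, mask, toy_lexicon))
    # prints ['我', 'tomorrow', '要', 'meeting']
```

Separating the switch decision (the mask) from the word substitution keeps the sketch's assumed generator output small and discrete. Under this split, the hard part is learning *where* a speaker would plausibly switch, and the lexicon lookup is a fixed post-processing step.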


