A Deep Generative Model for Code-Switched Text

06/21/2019
by Bidisha Samanta, et al.

Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies. Accurate language models for code-switched text are critical for NLP tasks. State-of-the-art data-intensive neural language models are difficult to train well from scarce language-labeled code-switched text. A potential solution is to use deep generative models to synthesize large volumes of realistic code-switched text. Although generative adversarial networks and variational autoencoders can synthesize plausible monolingual text from a continuous latent space, they cannot adequately address code-switched text, owing to its informal style and the complex interplay between the constituent languages. We introduce VACS, a novel variational autoencoder architecture specifically tailored to code-switching phenomena. VACS encodes to and decodes from a two-level hierarchical representation, which models syntactic contextual signals in the lower layer and language-switching signals in the upper layer. Sampling representations from the prior and decoding them produces well-formed, diverse code-switched sentences. Extensive experiments show that combining synthetic code-switched text with natural monolingual data yields a significant (33.06%) reduction in perplexity.
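To make the two-level design concrete, below is a minimal sketch of a hierarchical VAE of this kind in PyTorch. It is not the authors' released implementation: the use of GRU encoders and decoders, the layer sizes, and the way the two latent variables are combined in the decoder are illustrative assumptions; only the overall structure (a lower-level latent for sentence context and an upper-level latent for switching behavior, trained with a reconstruction term plus a KL term per level) follows the description above.

# Illustrative sketch of a two-level hierarchical VAE for code-switched text.
# All hyperparameters and layer choices are assumptions, not the VACS paper's.
import torch
import torch.nn as nn


class HierarchicalCSVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256,
                 z_syntax_dim=64, z_switch_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)

        # Lower-level encoder: summarizes syntactic / contextual structure
        # of the token sequence.
        self.enc_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.mu_syntax = nn.Linear(hid_dim, z_syntax_dim)
        self.logvar_syntax = nn.Linear(hid_dim, z_syntax_dim)

        # Upper-level encoder: conditions on the lower-level latent to
        # capture the sentence's language-switching behavior.
        self.mu_switch = nn.Linear(z_syntax_dim, z_switch_dim)
        self.logvar_switch = nn.Linear(z_syntax_dim, z_switch_dim)

        # Decoder: generates tokens conditioned on both latent variables.
        self.z_to_hidden = nn.Linear(z_syntax_dim + z_switch_dim, hid_dim)
        self.dec_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    @staticmethod
    def reparameterize(mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        emb = self.embed(tokens)
        _, h = self.enc_rnn(emb)          # h: (1, batch, hid_dim)
        h = h.squeeze(0)

        mu_s, logvar_s = self.mu_syntax(h), self.logvar_syntax(h)
        z_syntax = self.reparameterize(mu_s, logvar_s)

        mu_w, logvar_w = self.mu_switch(z_syntax), self.logvar_switch(z_syntax)
        z_switch = self.reparameterize(mu_w, logvar_w)

        # Teacher-forced decoding: predict token t+1 from token t.
        h0 = torch.tanh(self.z_to_hidden(torch.cat([z_syntax, z_switch], dim=-1)))
        dec_out, _ = self.dec_rnn(emb[:, :-1, :], h0.unsqueeze(0))
        logits = self.out(dec_out)        # (batch, seq_len-1, vocab_size)

        # Negative ELBO: reconstruction loss plus one KL term per latent level.
        recon = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1), reduction="mean")
        kl = lambda mu, lv: -0.5 * torch.mean(1 + lv - mu.pow(2) - lv.exp())
        return recon + kl(mu_s, logvar_s) + kl(mu_w, logvar_w)

At generation time one would sample z_switch and z_syntax from their priors and run the decoder autoregressively; the synthetic sentences can then be mixed with natural monolingual data to train a downstream language model, which is the use case evaluated in the paper.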

Related research

Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation (11/06/2018)
Code-switching is about dealing with alternative languages in speech or ...

Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text (06/13/2019)
Multilingual writers and speakers often alternate between two languages ...

Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks (12/12/2021)
This paper presents our latest effort on improving Code-switching langua...

Training a code-switching language model with monolingual data (11/14/2019)
A lack of code-switching data complicates the training of code-switching...

From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text (07/14/2021)
Generating code-switched text is a problem of growing interest, especial...

Call Larisa Ivanovna: Code-Switching Fools Multilingual NLU Models (09/29/2021)
Practical needs of developing task-oriented dialogue assistants require ...

Generating Narrative Text in a Switching Dynamical System (04/08/2020)
Early work on narrative modeling used explicit plans and goals to genera...
