Training a code-switching language model with monolingual data

11/14/2019
by Shun-Po Chuang, et al.

A lack of code-switching data complicates the training of code-switching (CS) language models. We propose an approach to train such CS language models on monolingual data only. By constraining and normalizing the output projection matrix in RNN-based language models, we bring the embeddings of different languages closer to each other. Numerical and visualization results show that the proposed approaches substantially improve the performance of CS language models trained on monolingual data. The proposed approaches are comparable to, or even better than, training CS language models with artificially generated CS data. We additionally use unsupervised bilingual word translation to analyze whether semantically equivalent words in different languages are mapped together.
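As a rough illustration of what normalizing an output projection matrix can look like, the sketch below rescales every row of a toy projection matrix to unit L2 norm before computing the softmax over the vocabulary. The abstract does not spell out the exact formulation, so the row-wise L2 normalization, the function names `normalize_rows` and `next_word_probs`, and the six-word bilingual vocabulary are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def normalize_rows(W, eps=1e-12):
    """Rescale each row (one output word vector) to unit L2 norm.

    Assumption: row-wise L2 normalization; the paper may define it differently.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.maximum(norms, eps)

def next_word_probs(h, W):
    """Softmax over the vocabulary using the normalized projection matrix."""
    logits = normalize_rows(W) @ h
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Hypothetical toy setup: 6 words (rows 0-2 for one language, rows 3-5 for
# the other) and an RNN hidden state of size 4.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
W[3:] *= 10.0  # exaggerate a scale mismatch between the two languages
h = rng.normal(size=4)

W_unit = normalize_rows(W)
probs = next_word_probs(h, W)
```

After normalization, every word vector has the same length, so the logits depend only on the direction of each row relative to the hidden state, and neither language dominates the softmax purely because its rows are larger.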