Code-switched Language Models Using Dual RNNs and Same-Source Pretraining

09/06/2018
by Saurabh Garg, et al.

This work focuses on building language models (LMs) for code-switched text. We propose two techniques that significantly improve these LMs: 1) a novel recurrent neural network unit with dual components, each attending to one of the languages in the code-switched text; 2) pretraining the LM on synthetic text sampled from a generative model estimated on the training data. We demonstrate the effectiveness of both techniques on a Mandarin-English task, reporting significant reductions in perplexity.
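As a rough illustration of the dual-unit idea (a sketch under our own assumptions, not the authors' implementation), the snippet below pairs two GRU cells and lets a per-token language ID decide which one updates the shared hidden state. All names here (DualGRUCell, lang_ids) are hypothetical.

```python
# Minimal sketch of a "dual" recurrent unit: two GRU cells, one per language,
# selected token-by-token by a language-ID mask. Illustrative only; not the
# paper's exact formulation.
import torch
import torch.nn as nn


class DualGRUCell(nn.Module):
    """One GRU cell per language; the language ID of each token picks
    which cell updates the shared hidden state."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cell_l1 = nn.GRUCell(input_size, hidden_size)  # e.g. Mandarin
        self.cell_l2 = nn.GRUCell(input_size, hidden_size)  # e.g. English

    def forward(self, x_t: torch.Tensor, h: torch.Tensor,
                lang_ids: torch.Tensor) -> torch.Tensor:
        # x_t: (batch, input_size); h: (batch, hidden_size)
        # lang_ids: (batch,) with 0 for language 1, 1 for language 2
        h1 = self.cell_l1(x_t, h)
        h2 = self.cell_l2(x_t, h)
        mask = lang_ids.unsqueeze(1).float()  # (batch, 1)
        return (1.0 - mask) * h1 + mask * h2


# Example usage with toy dimensions.
cell = DualGRUCell(input_size=128, hidden_size=256)
x = torch.randn(4, 128)
h = torch.zeros(4, 256)
lang = torch.tensor([0, 1, 1, 0])
h_next = cell(x, h, lang)  # (4, 256)
```

The same-source pretraining step is orthogonal to the cell design: synthetic code-switched text sampled from a generative model fit on the training data would serve as an initial training corpus before fine-tuning on the actual training data.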
