Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English

09/24/2019
by   Injy Hamed, et al.

Code-switching (CS) is a widespread phenomenon among bilingual and multilingual societies. The lack of CS resources hinders the performance of many NLP tasks. In this work, we explore the potential use of bilingual word embeddings for CS language modeling (LM) in the low-resource Egyptian Arabic-English language pair. We evaluate different state-of-the-art bilingual word embedding approaches that require cross-lingual resources at different levels, and we propose an innovative but simple approach that jointly learns bilingual word representations without any parallel data, relying only on monolingual data and a small amount of CS data. While all representations improve CS LM, ours performs best and improves perplexity by 33.5% relative over the baseline.
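
Perplexity is the metric the abstract uses to report the gain. The following is a minimal sketch, assuming per-token natural-log probabilities from some language model, of how perplexity and a relative improvement between two models are commonly computed. The function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities.

    Defined as exp of the negative mean log-probability.
    """
    if not log_probs:
        raise ValueError("log_probs must not be empty")
    return math.exp(-sum(log_probs) / len(log_probs))


def relative_reduction(baseline_ppl, new_ppl):
    """Relative perplexity reduction over a baseline, in percent."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical example: a model that assigns probability 0.25 to every token
# has a perplexity of about 4 (it is as uncertain as a uniform 4-way choice).
uniform_log_probs = [math.log(0.25)] * 4
uniform_ppl = perplexity(uniform_log_probs)

# Hypothetical baseline and improved perplexities, for illustration only.
example_gain = relative_reduction(200.0, 150.0)
```

Lower perplexity means the model is less surprised by the test text, so a positive relative reduction indicates an improvement over the baseline.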
