Leveraging Pretrained Word Embeddings for Part-of-Speech Tagging of Code Switching Data

05/31/2019
by Fahad AlGhamdi, et al.

Linguistic Code Switching (CS) is a phenomenon that occurs when multilingual speakers alternate between two or more languages or dialects within a single conversation. Processing CS data, especially intra-sentential CS, is challenging for state-of-the-art NLP technologies, since these are monolingual and geared toward processing one language at a time. In this paper, we address the problem of Part-of-Speech (POS) tagging in the context of linguistic code switching. We explore multiple neural network architectures to measure the impact of different pre-trained embedding methods on POS tagging of CS data. We investigate the landscape in four CS language pairs: Spanish-English, Hindi-English, Modern Standard Arabic-Egyptian Arabic dialect (MSA-EGY), and Modern Standard Arabic-Levantine Arabic dialect (MSA-LEV). Our results show that multilingual embeddings (e.g., MSA-EGY and MSA-LEV) help closely related languages (EGY/LEV) but add noise for distant languages (SPA/HIN). Finally, we show that our proposed models outperform state-of-the-art CS taggers for the MSA-EGY language pair.
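To make "leveraging pretrained word embeddings" concrete, here is a minimal sketch of one common way a neural tagger's embedding table is initialized from pretrained vectors, and why vocabulary coverage matters for code-switched input: a monolingual table leaves the other language's tokens out of vocabulary. This is an illustrative assumption, not the authors' pipeline. The function names `build_embedding_matrix` and `coverage`, the toy two-dimensional vectors, and the naive union of two monolingual tables are all hypothetical; a genuinely multilingual embedding space would be trained or aligned jointly rather than merged like this.

```python
import random
from typing import Dict, List, Sequence

Vectors = Dict[str, List[float]]


def lookup(word: str, pretrained: Vectors):
    """Find a pretrained vector, falling back to the lowercased form."""
    vec = pretrained.get(word)
    if vec is None:
        vec = pretrained.get(word.lower())
    return vec


def build_embedding_matrix(vocab: Sequence[str], pretrained: Vectors,
                           dim: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: one row per vocabulary entry. Covered words copy
    their pretrained vector; out-of-vocabulary (OOV) words get a small random
    vector that a tagger would then learn during training."""
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        vec = lookup(word, pretrained)
        if vec is None:
            # OOV: uniform random initialization in [-0.1, 0.1].
            vec = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        elif len(vec) != dim:
            raise ValueError(f"vector for {word!r} has dimension {len(vec)}, expected {dim}")
        matrix.append(list(vec))
    return matrix


def coverage(tokens: Sequence[str], pretrained: Vectors) -> float:
    """Fraction of tokens that have a pretrained vector."""
    found = sum(1 for t in tokens if lookup(t, pretrained) is not None)
    return found / len(tokens)


# Toy, made-up vectors (hypothetical) for a Spanish-English CS sentence.
english: Vectors = {"i": [0.1, 0.2], "want": [0.3, 0.1], "to": [0.0, 0.5], "go": [0.2, 0.2]}
spanish: Vectors = {"a": [0.4, 0.0], "la": [0.1, 0.3], "playa": [0.5, 0.5]}
sentence = ["I", "want", "to", "go", "a", "la", "playa"]

print(coverage(sentence, english))                 # English-only table
print(coverage(sentence, {**english, **spanish}))  # naive union of both tables
matrix = build_embedding_matrix(sentence, english, dim=2)
print(len(matrix), matrix[0])
```

With the English-only table, the three Spanish tokens fall back to random vectors; the naive union covers every token. Note that `{**english, **spanish}` lets the second table silently overwrite any surface form shared by both languages, which is one reason such a union is a weak stand-in for a properly aligned multilingual space.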


