Part of speech tagging for code switched data

09/28/2019
by Fahad AlGhamdi, et al.

We address the problem of Part of Speech (POS) tagging in the context of linguistic code switching (CS). CS is the phenomenon where a speaker switches between two languages, or variants of the same language, within or across utterances; these are known as intra-sentential and inter-sentential CS, respectively. Processing CS data is especially challenging in the intra-sentential case, because state-of-the-art monolingual NLP technology is geared toward processing one language at a time. In this paper we explore multiple strategies for applying state-of-the-art POS taggers to CS data. We investigate the landscape in two CS language pairs: Spanish-English and Modern Standard Arabic-Arabic dialects. We compare the use of two POS taggers against a unified tagger trained on CS data. Our results show that applying a machine learning framework using two state-of-the-art POS taggers achieves better performance than all other approaches that we investigate.


