Universal Dependency Parsing for Hindi-English Code-switching

04/16/2018
by Irshad Ahmad Bhat, et al.

Code-switching is the phenomenon of mixing the grammatical structures of two or more languages under varied social constraints. Code-switching data differ so radically from the benchmark corpora used in the NLP community that applying standard technologies to them degrades performance sharply. Unlike standard corpora, these data often need additional processing, such as language identification, normalization and/or back-transliteration, before they can be handled efficiently. In this paper, we investigate these indispensable processes and other problems associated with the syntactic parsing of code-switching data, and propose methods to mitigate their effects. In particular, we study dependency parsing of code-switching data from Hindi-English multilingual speakers on Twitter. We present a treebank of Hindi-English code-switching tweets under the Universal Dependencies scheme and propose a neural stacking model for parsing that efficiently leverages part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks. We also present normalization and back-transliteration models with a decoding process tailored for code-switching data. Results show that our neural stacking parser is 1.5 LAS points better than the augmented parsing model, and that our decoding process for normalization and back-transliteration improves results by 3.8 points.
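
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how the labeled attachment score (LAS) is conventionally computed: the percentage of tokens whose predicted head and dependency label both match the gold annotation. It is not the authors' evaluation script; the function name `attachment_scores` and the four-token example are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 denotes the artificial root, as in CoNLL-U files.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over the given tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # UAS counts tokens with the correct head, ignoring the label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens with both the correct head and the correct label.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Hypothetical four-token sentence (not taken from the treebank).
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obl")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")  # UAS = 75.0, LAS = 50.0
```

Under this definition, a gain of 1.5 LAS points means that 1.5 percent more of the evaluated tokens receive both the correct head and the correct label.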

