PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages

11/12/2021
by Mohsin Ali, et al.

NLP applications for code-mixed (CM), or mix-lingual, text have gained significant momentum recently, mainly because language mixing is prevalent in social media communication in multilingual regions such as India, Mexico, Europe, and parts of the USA. Word embeddings are a basic building block of every NLP system today, yet word embeddings for CM languages remain largely unexplored. The major bottleneck for CM word embeddings is switching points, the positions where the language switches. These positions lack context, and statistical systems fail to model this phenomenon because of the high variance in the observed examples. In this paper we present our initial observations on applying switching-point-based positional encoding techniques to CM language, specifically Hinglish (Hindi-English). Results are only marginally better than the state of the art (SOTA), but they suggest that positional encoding could be an effective way to train position-sensitive language models for CM text.
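The abstract does not spell out how PESTO computes its encodings. Purely as an illustration of what "switching point" and "switching-point-based position" can mean, the sketch below assumes per-token language tags are already available (for example, from a separate language-identification step). It locates switching points, derives a hypothetical position index that resets to 0 at every switch, and feeds that index to the standard sinusoidal encoding of Vaswani et al. (2017). The function names and the reset-at-switch scheme are illustrative assumptions, not the authors' method.

```python
import math
from typing import List


def switching_points(lang_tags: List[str]) -> List[int]:
    """Return every index i where the language tag differs from the previous token's tag."""
    return [i for i in range(1, len(lang_tags)) if lang_tags[i] != lang_tags[i - 1]]


def switch_relative_positions(lang_tags: List[str]) -> List[int]:
    """Hypothetical scheme: count each token's offset from the start of its
    monolingual run, so the counter resets to 0 at every switching point."""
    positions: List[int] = []
    pos = 0
    for i, tag in enumerate(lang_tags):
        pos = 0 if i == 0 or tag != lang_tags[i - 1] else pos + 1
        positions.append(pos)
    return positions


def sinusoidal_encoding(pos: int, d_model: int) -> List[float]:
    """Standard sine/cosine positional encoding (Vaswani et al., 2017) for one position:
    even dimensions use sin, odd dimensions use cos, sharing a frequency per pair."""
    enc: List[float] = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


# Hinglish example: "main office ja raha hoon but traffic bahut hai"
tokens = ["main", "office", "ja", "raha", "hoon", "but", "traffic", "bahut", "hai"]
tags = ["hi", "en", "hi", "hi", "hi", "en", "en", "hi", "hi"]

print(switching_points(tags))           # [1, 2, 5, 7]
print(switch_relative_positions(tags))  # [0, 0, 0, 1, 2, 0, 1, 0, 1]
vectors = [sinusoidal_encoding(p, 8) for p in switch_relative_positions(tags)]
```

Resetting the index at each switch makes the first token after a switch look "position 0" to the model, which is one simple way to expose switching points in the encoding; a real system might instead combine this with absolute positions or learn the encoding.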


research
02/22/2021

Co-occurrences using Fasttext embeddings for word similarity tasks in Urdu

Urdu is a widely spoken language in South Asia. Though immoderate litera...
research
09/24/2019

Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English

Code-switching (CS) is a widespread phenomenon among bilingual and multi...
research
11/23/2020

Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text

Natural language processing (NLP) techniques have become mainstream in t...
research
03/15/2017

Is this word borrowed? An automatic approach to quantify the likeliness of borrowing in social media

Code-mixing or code-switching are the effortless phenomena of natural sw...
research
10/04/2017

Syntactic and Semantic Features For Code-Switching Factored Language Models

This paper presents our latest investigations on different features for ...
research
06/11/2018

Automatic Target Recovery for Hindi-English Code Mixed Puns

In order for our computer systems to be more human-like, with a higher e...
research
02/06/2022

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection

The presence of sarcasm in conversational systems and social media like ...
