All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media

07/25/2017
by Jasabanta Patro, et al.

In this paper, we present a set of computational methods that estimate how likely a word is to be borrowed, based on signals from social media. Measured by the Spearman correlation coefficient, our methods predict borrowing likeliness more than twice as well (nearly 0.62) as the best-performing baseline reported in the literature (nearly 0.26). Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88 percent of cases the annotators felt that the foreign language tag should be replaced by a native language tag, indicating substantial scope for improving automatic language identification systems.


