SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text

02/01/2017
by   Deepak Gupta, et al.

Use of social media has grown dramatically during the last few years. Users tend to write informally when communicating through social media. The language of communication is often mixed in nature: people transcribe their regional language together with English, and this practice is extremely popular. Natural language processing (NLP) aims to infer information from such text, and Part-of-Speech (PoS) tagging plays an important role in getting the prosody of the written text. For the task of PoS tagging on Code-Mixed Indian Social Media Text, we develop a supervised system based on a Conditional Random Field classifier. To tackle the problem effectively, we focus on extracting rich linguistic features. We participate in three language pairs, i.e., English-Hindi, English-Bengali and English-Telugu, on three social media platforms: Twitter, Facebook and WhatsApp. The proposed system is able to assign coarse as well as fine-grained PoS tag labels to a given code-mixed sentence. Experiments show that our system is quite generic and achieves encouraging performance on all three language pairs in all the domains.
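
To make the idea of feature-based tagging concrete, the following is a minimal sketch of per-token feature extraction for a code-mixed sentence. The feature set (affixes, capitalization, hashtag, mention and URL flags, neighbouring words) is an assumption chosen for illustration; the abstract does not list the features the authors actually used. The function names `token_features` and `sentence_features` are hypothetical.

```python
import re
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i.

    The features here are illustrative assumptions, not the paper's set.
    """
    tok = tokens[i]
    lower = tok.lower()
    return {
        "lower": lower,
        # Short affixes often hint at morphology in transliterated words.
        "prefix2": lower[:2],
        "suffix2": lower[-2:],
        "suffix3": lower[-3:],
        # Orthographic cues.
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        # Social media specific cues.
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "is_url": bool(re.match(r"https?://", lower)),
        # Script cue: False when the token contains non-ASCII characters.
        "is_ascii": tok.isascii(),
        # Context window of one token on each side.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical English-Hindi code-mixed example.
    sentence = ["@rahul", "yeh", "movie", "bahut", "achhi", "thi", "#Bollywood"]
    for token, feats in zip(sentence, sentence_features(sentence)):
        print(token, feats)
```

Per-token feature dictionaries of this shape, paired with a gold tag sequence for each sentence, are a common input format for Conditional Random Field toolkits, so a sketch like this would sit in front of the classifier during both training and tagging.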


