Part-of-Speech Tagging for Code-mixed Indian Social Media Text at ICON 2015

01/06/2016
by   Kamal Sarkar, et al.

This paper discusses the experiments we carried out at Jadavpur University as part of our participation in the ICON 2015 task: POS Tagging for Code-mixed Indian Social Media Text. The tool we developed for the task is based on a trigram Hidden Markov Model that uses information from a dictionary, along with some other word-level features, to improve the observation probabilities of both known and unknown tokens. We submitted runs for the Bengali-English, Hindi-English and Tamil-English language pairs. Our system was trained and tested on the datasets released for the ICON 2015 shared task. In constrained mode, our system obtains an average overall accuracy (averaged over all three language pairs) of 75.60 [...] for IIITH and 75.79 [...]. In unconstrained mode, our system obtains an average overall accuracy of 70.65, which is also close to the system (72.85 average overall accuracy) [...].
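
To illustrate the kind of model the abstract describes, here is a minimal sketch of a trigram HMM tagger with Viterbi decoding. It is not the authors' implementation. The add-one smoothing of transitions, the `lexicon` fallback for unknown words, the function names, and the toy Bengali-English data and tag labels are all assumptions made for illustration; the paper's actual dictionary and word-level features are not given in the abstract.

```python
import math
from collections import Counter

START, STOP = "<s>", "</s>"
NEG_INF = float("-inf")

def train_trigram_hmm(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, and (tag, word) emissions."""
    tri, ctx, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            tri[tuple(tags[i - 2:i + 1])] += 1
            ctx[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sent:
            emit[(tag, word.lower())] += 1
            tag_count[tag] += 1
    return tri, ctx, emit, tag_count

def viterbi_tag(words, model, lexicon=None):
    """Return the highest-scoring tag sequence under the trigram HMM."""
    tri, ctx, emit, tag_count = model
    tags = sorted(tag_count)
    vocab = {w for _, w in emit}
    lexicon = lexicon or {}
    n_out = len(tags) + 1  # possible next symbols: every tag plus STOP

    def log_q(u, v, t):
        # Add-one smoothed transition score (an assumption of this sketch).
        return math.log((tri[(u, v, t)] + 1) / (ctx[(u, v)] + n_out))

    def log_e(t, w):
        w = w.lower()
        if w in vocab:  # known token: relative-frequency emission
            c = emit[(t, w)]
            return math.log(c / tag_count[t]) if c else NEG_INF
        # Unknown token: restrict to dictionary tags if any, else uniform.
        allowed = set(lexicon.get(w, ())) & set(tags)
        if allowed:
            return -math.log(len(allowed)) if t in allowed else NEG_INF
        return -math.log(len(tags))

    if not words:
        return []
    pi = {(START, START): 0.0}  # state = (previous tag, current tag)
    back = []
    for w in words:
        new_pi, bp = {}, {}
        for (u, v), score in pi.items():
            for t in tags:
                e = log_e(t, w)
                if e == NEG_INF:
                    continue
                s = score + log_q(u, v, t) + e
                if s > new_pi.get((v, t), NEG_INF):
                    new_pi[(v, t)] = s
                    bp[(v, t)] = u
        pi = new_pi
        back.append(bp)

    # Pick the best final state, including the transition to STOP.
    u, v = max(pi, key=lambda s: pi[s] + log_q(s[0], s[1], STOP))
    n = len(words)
    seq = [None] * n
    seq[n - 1] = v
    if n > 1:
        seq[n - 2] = u
    for i in range(n - 1, 1, -1):  # follow back-pointers
        seq[i - 2] = back[i][(seq[i - 1], seq[i])]
    return seq

# Toy code-mixed data (hypothetical words and tag labels).
toy_data = [
    [("ami", "PRON"), ("khub", "ADV"), ("happy", "ADJ")],
    [("tumi", "PRON"), ("khub", "ADV"), ("sad", "ADJ")],
]
toy_model = train_trigram_hmm(toy_data)
print(viterbi_tag(["tumi", "khub", "glad"], toy_model, {"glad": {"ADJ"}}))
```

In this sketch the dictionary only narrows the candidate tags for unseen tokens; a fuller system would presumably also fold in word-level cues such as suffixes or script, as the abstract suggests.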

research
12/12/2015

A Hidden Markov Model Based System for Entity Extraction from Social Media English Text at FIRE 2015

This paper presents the experiments carried out by us at Jadavpur Univer...
research
12/23/2016

A CRF Based POS Tagger for Code-mixed Indian Social Media Text

In this work, we describe a conditional random fields (CRF) based system...
research
10/31/2016

Experiments with POS Tagging Code-mixed Indian Social Media Text

This paper presents Centre for Development of Advanced Computing Mumbai'...
research
04/11/2016

Shallow Parsing Pipeline for Hindi-English Code-Mixed Social Media Text

In this study, the problem of shallow parsing of Hindi-English code-mixe...
research
12/31/2016

A POS Tagger for Code Mixed Indian Social Media Text - ICON-2016 NLP Tools Contest Entry from Surukam

Building Part-of-Speech (POS) taggers for code-mixed Indian languages is...
research
07/29/2020

Development of POS tagger for English-Bengali Code-Mixed data

Code-mixed texts are widespread nowadays due to the advent of social med...
research
04/19/2018

Stylistic Variation in Social Media Part-of-Speech Tagging

Social media features substantial stylistic variation, raising new chall...
