Shallow Parsing Pipeline for Hindi-English Code-Mixed Social Media Text

04/11/2016
by Arnav Sharma, et al.

This study addresses shallow parsing of Hindi-English code-mixed social media text (CSMT). We annotated the data and developed a language identifier, a normalizer, a part-of-speech tagger, and a shallow parser. To the best of our knowledge, this is the first attempt at shallow parsing of CSMT. The pipeline has been made available to the research community to enable better text analysis of Hindi-English CSMT. It is accessible at http://bit.ly/csmt-parser-api.
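
To make the shape of such a pipeline concrete, here is a minimal sketch of how the four components named in the abstract might be chained. It assumes the stages run in the order in which the abstract lists them. The `Token` fields, the function names, the toy lexicons, and the chunk labels are all hypothetical and stand in for the trained models; none of it reflects the authors' implementation or the API behind the URL above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Token:
    text: str        # raw token as typed
    lang: str = ""   # language tag ("hi" or "en")
    norm: str = ""   # normalized form
    pos: str = ""    # part-of-speech tag
    chunk: str = ""  # BIO-style chunk label


# Toy lookup tables (assumptions for illustration only).
HINDI_WORDS = {"bhut", "bahut", "hai"}
NORMAL_FORMS: Dict[str, str] = {"gr8": "great", "bhut": "bahut"}
POS_TAGS: Dict[str, str] = {
    "movie": "NOUN", "bahut": "ADV", "great": "ADJ", "hai": "VERB",
}
CHUNK_OF_POS: Dict[str, str] = {
    "NOUN": "NP", "ADV": "JJP", "ADJ": "JJP", "VERB": "VP",
}

Stage = Callable[[List[Token]], List[Token]]


def identify_language(tokens: List[Token]) -> List[Token]:
    # Word-level language tag from a toy lexicon.
    for t in tokens:
        t.lang = "hi" if t.text.lower() in HINDI_WORDS else "en"
    return tokens


def normalize(tokens: List[Token]) -> List[Token]:
    # Map noisy spellings to a canonical form; default to lowercase.
    for t in tokens:
        key = t.text.lower()
        t.norm = NORMAL_FORMS.get(key, key)
    return tokens


def tag_pos(tokens: List[Token]) -> List[Token]:
    # Look up a tag for the normalized form; "X" marks unknown words.
    for t in tokens:
        t.pos = POS_TAGS.get(t.norm, "X")
    return tokens


def shallow_parse(tokens: List[Token]) -> List[Token]:
    # Group adjacent tokens of the same chunk type using BIO labels.
    prev: Optional[str] = None
    for t in tokens:
        kind = CHUNK_OF_POS.get(t.pos)
        if kind is None:
            t.chunk = "O"
        else:
            t.chunk = ("I-" if kind == prev else "B-") + kind
        prev = kind
    return tokens


STAGES: List[Stage] = [identify_language, normalize, tag_pos, shallow_parse]


def run_pipeline(sentence: str) -> List[Token]:
    # Each stage reads annotations added by the stages before it.
    tokens = [Token(text=w) for w in sentence.split()]
    for stage in STAGES:
        tokens = stage(tokens)
    return tokens


if __name__ == "__main__":
    for tok in run_pipeline("movie bhut gr8 hai"):
        print(tok)
```

Each stage only adds annotations to the shared token list, so a later stage can depend on earlier output (here the tagger reads the normalized form rather than the raw spelling) and any single stage could be swapped for a trained model without touching the others.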


