"Hinglish" Language – Modeling a Messy Code-Mixed Language

12/30/2019
by Vivek Kumar Gupta, et al.

With a sharp rise in both the fluency and the number of users of "Hinglish" in India, a linguistically diverse country, it has become increasingly important to analyze social content written in this language on platforms such as Twitter, Reddit, and Facebook. This project uses deep learning techniques to tackle a classification problem: categorizing social content written in Hindi-English into Abusive, Hate-Inducing, and Not Offensive categories. We use bidirectional sequence models together with easy text augmentation techniques (synonym replacement, random insertion, random swap, and random deletion) to produce a state-of-the-art classifier that outperforms previous work on this dataset.
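The four augmentation operations named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the `synonyms` lookup table, and the toy Hinglish tokens are all assumptions made for the example, and a real pipeline would need a proper synonym source for code-mixed text.

```python
import random


def synonym_replacement(tokens, synonyms, n, rng):
    """Replace up to n tokens that have an entry in the synonym table."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if synonyms.get(t)]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insertion(tokens, synonyms, n, rng):
    """Insert a synonym of a randomly chosen token at a random position, n times."""
    out = list(tokens)
    for _ in range(n):
        candidates = [t for t in out if synonyms.get(t)]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        word = rng.choice(candidates)
        out.insert(rng.randint(0, len(out)), rng.choice(synonyms[word]))
    return out


def random_swap(tokens, n_swaps, rng):
    """Swap the tokens at two distinct random positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p; never return an empty list."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


# Toy, hypothetical synonym table; not taken from the paper.
SYNONYMS = {"movie": ["film"], "acchi": ["badhiya"]}

if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "yeh movie bahut acchi hai".split()
    print(synonym_replacement(sentence, SYNONYMS, 1, rng))
    print(random_insertion(sentence, SYNONYMS, 1, rng))
    print(random_swap(sentence, 1, rng))
    print(random_deletion(sentence, 0.2, rng))
```

Each function returns a new token list and leaves its input unchanged, so several augmented variants can be generated from one labeled sentence while keeping the original label.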


