Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts

08/24/2021
by   Charangan Vasantharajan, et al.
0

Offensive Language detection in social media platforms has been an active field of research over the past years. In non-native English spoken countries, social media users mostly use a code-mixed form of text in their posts/comments. This poses several challenges in the offensive content identification tasks, and considering the low resources available for Tamil, the task becomes much harder. The current study presents extensive experiments using multiple deep learning, and transfer learning models to detect offensive content on YouTube. We propose a novel and flexible approach of selective translation and transliteration techniques to reap better results from fine-tuning and ensembling multilingual transformer networks like BERT, Distil- BERT, and XLM-RoBERTa. The experimental results showed that ULMFiT is the best model for this task. The best performing models were ULMFiT and mBERTBiLSTM for this Tamil code-mix dataset instead of more popular transfer learning models such as Distil- BERT and XLM-RoBERTa and hybrid deep learning models. The proposed model ULMFiT and mBERTBiLSTM yielded good results and are promising for effective offensive speech identification in low-resourced languages.

READ FULL TEXT
research
11/01/2020

WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments

This paper describes the WLV-RIT entry to the Hate Speech and Offensive ...
research
11/15/2021

IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages

Sentiment analysis of social media posts and comments for various market...
research
12/17/2020

Hate Speech detection in the Bengali language: A dataset and its baseline evaluation

Social media sites such as YouTube and Facebook have become an integral ...
research
09/23/2018

Mind Your Language: Abuse and Offense Detection for Code-Switched Languages

In multilingual societies like the Indian subcontinent, use of code-swit...
research
10/18/2021

Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches

In the recent past, social media platforms have helped people in connect...
research
03/22/2023

Interpretable Bangla Sarcasm Detection using BERT and Explainable AI

A positive phrase or a sentence with an underlying negative motive is us...

Please sign up or login with your details

Forgot password? Click here to reset