Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models

07/13/2023
by Arman Sakif Chowdhury, et al.

With the rise of social media and online news sources, fake news has become a significant issue globally. However, the detection of fake news in low-resource languages like Bengali has received limited attention in research. In this paper, we propose a methodology consisting of four distinct approaches to classify fake news articles in Bengali using summarization and augmentation techniques with five pre-trained language models. Our approach includes translating English news articles and using augmentation techniques to offset the shortage of fake news articles. We also summarize the news articles to work around the token length limitation of BERT-based models. Through extensive experimentation and rigorous evaluation, we show the effectiveness of summarization and augmentation for Bengali fake news detection. We evaluated our models using three separate test datasets. The BanglaBERT Base model, when combined with augmentation techniques, achieved an accuracy of 96%. The BanglaBERT model trained with summarized, augmented news articles achieved 97% accuracy. Lastly, the mBERT Base model achieved an accuracy of 86% on the test dataset that was reserved for evaluating generalization performance. The datasets and implementations are available at https://github.com/arman-sakif/Bengali-Fake-News-Detection
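To illustrate the token length limitation mentioned above, here is a minimal sketch, not the authors' implementation. It assumes the usual 512-token input limit of BERT Base checkpoints and uses a hypothetical whitespace word count as a stand-in for a real subword tokenizer, which would normally produce more tokens than words. The names `MAX_TOKENS`, `count_tokens`, `needs_summarization`, and `truncate` are illustrative only.

```python
# Assumption: BERT-style encoders accept at most 512 tokens per input.
MAX_TOKENS = 512


def count_tokens(text):
    """Hypothetical stand-in for a subword tokenizer: counts whitespace-separated words."""
    return len(text.split())


def needs_summarization(article, max_tokens=MAX_TOKENS):
    """Return True when the article would not fit in a single model input."""
    return count_tokens(article) > max_tokens


def truncate(article, max_tokens=MAX_TOKENS):
    """Naive alternative to summarization: keep only the first max_tokens words."""
    return " ".join(article.split()[:max_tokens])


# Placeholder articles built from a repeated dummy word.
short_article = "word " * 100
long_article = "word " * 800

print(needs_summarization(short_article))    # False
print(needs_summarization(long_article))     # True
print(count_tokens(truncate(long_article)))  # 512
```

Truncation discards everything past the limit, whereas a summary can in principle retain information from the whole article. That is one plausible motivation for summarizing long articles before classification.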


