Clickbait Headline Detection in Indonesian News Sites using Multilingual Bidirectional Encoder Representations from Transformers (M-BERT)

02/02/2021
by Muhammad N. Fakhruzzaman, et al.

Click counts are tied to the revenue that online advertisers pay to news sites. This business model has pushed some news sites to resort to click-baiting, i.e., using hyperbolic and enticing words, and sometimes unfinished sentences, in a headline to purposefully tease readers. Some Indonesian online news sites have also adopted clickbait, which indirectly degrades the credibility of other established news sites. A neural network was trained to detect clickbait headlines; it uses the pre-trained language model M-BERT as an embedding layer, combined with a 100-node hidden layer and topped with a sigmoid classifier. With a total of 6632 headlines as a training dataset, the classifier performed well. Evaluated with 5-fold cross-validation, it achieved an accuracy of 0.914, an F1-score of 0.914, a precision of 0.916, and a ROC-AUC of 0.92. The use of multilingual BERT in an Indonesian text classification task was tested and can be enhanced further. Future possibilities, societal impact, and limitations of clickbait detection are discussed.
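
To make the evaluation protocol concrete, here is a minimal sketch of how a 5-fold split over 6632 headlines and the reported threshold-based metrics (accuracy, precision, F1) could be computed. It is an illustration under stated assumptions, not the authors' code: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the folds are contiguous and unshuffled, label 1 is assumed to mean "clickbait", and the paper may have averaged F1 and precision differently (e.g. per class). ROC-AUC is left out because it needs the classifier's sigmoid scores, not hard labels.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs over range(n_samples).

    Assumption: contiguous, unshuffled folds; the first (n_samples % k)
    folds receive one extra sample so every sample is tested exactly once.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Fold sizes for a dataset the size of the one described above.
    folds = k_fold_indices(6632, k=5)
    print([len(test) for _, test in folds])

    # Toy labels, purely illustrative (not data from the paper).
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a full pipeline, each fold would train the classifier on the `train` indices, predict on the `test` indices, and the per-fold metrics would then be averaged to give scores like those reported above.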


