BERT Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets

01/22/2021
by   Fouzi Harrag, et al.

During the last two decades, we have progressively turned to the Internet and social media to find news, hold conversations, and share opinions. Recently, OpenAI has developed a machine learning system called GPT-2 (Generative Pre-trained Transformer 2), which can produce deepfake texts. From brief writing prompts, it can generate blocks of text that look as if they were written by humans, facilitating the spread of false or auto-generated text. In line with this progress, and in order to counteract potential dangers, several methods have been proposed for detecting text written by these language models. In this paper, we propose a transfer-learning-based model that can detect whether an Arabic sentence was written by a human or automatically generated by a bot. Our dataset is based on tweets from a previous work, which we crawled and extended using the Twitter API. We used GPT2-Small-Arabic to generate fake Arabic sentences. For evaluation, we compared different recurrent neural network (RNN) baseline models based on word embeddings, namely LSTM, BI-LSTM, GRU and BI-GRU, with a transformer-based model. Our new transfer-learning model obtained an accuracy of up to 98%. To the best of our knowledge, this work is the first study in which ARABERT and GPT2 were combined to detect and classify Arabic auto-generated texts.

