RMDM: A Multilabel Fakenews Dataset for Vietnamese Evidence Verification

09/16/2023
by Hai Long Nguyen, et al.

In this study, we present a novel and challenging multilabel Vietnamese dataset (RMDM) designed to assess the performance of large language models (LLMs) in verifying electronic information related to legal contexts, focusing on fake news as potential input for electronic evidence. The RMDM dataset comprises four labels: real, mis, dis, and mal, representing real information, misinformation, disinformation, and mal-information, respectively. By including these diverse labels, RMDM captures the complexities of different fake news categories and offers insight into how well different language models handle the various types of information that could form part of electronic evidence. The dataset consists of 1,556 samples in total, with 389 samples for each label. Preliminary tests on the dataset using GPT-based and BERT-based models reveal variations in the models' performance across labels, indicating that the dataset effectively challenges the ability of various language models to verify the authenticity of such information. Our findings suggest that verifying electronic information related to legal contexts, including fake news, remains a difficult problem for language models and warrants further attention from the research community to advance toward more reliable AI models for potential legal applications.
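To make the dataset composition and the idea of per-label performance concrete, the following is a minimal sketch in Python. Only the four label names and the counts (389 per label, 1,556 in total) come from the abstract. The `per_label_recall` helper and the toy `gold` and `predicted` lists are hypothetical illustrations, not the authors' evaluation code or results.

```python
from collections import Counter

# Label names and counts as stated in the abstract.
LABELS = ("real", "mis", "dis", "mal")
SAMPLES_PER_LABEL = 389
TOTAL_SAMPLES = 1556


def per_label_recall(gold, predicted):
    """Return, for each label, the fraction of gold items predicted correctly.

    Hypothetical helper for illustration; the paper's actual metric
    is not specified in the abstract.
    """
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {
        label: correct[label] / totals[label]
        for label in LABELS
        if totals[label] > 0
    }


# Toy data (made up) showing how performance can vary across labels.
gold = ["real", "real", "mis", "mis", "dis", "dis", "mal", "mal"]
predicted = ["real", "real", "mis", "dis", "dis", "mal", "mal", "real"]

recall = per_label_recall(gold, predicted)
for label in LABELS:
    print(f"{label}: {recall[label]:.2f}")
```

Reporting a score per label, rather than a single aggregate accuracy, is one way to surface the kind of cross-label variation the abstract describes. Because the dataset is balanced, each label contributes equally to any macro-averaged figure.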

research
04/02/2023

Classifying COVID-19 Related Tweets for Fake News Detection and Sentiment Analysis with BERT-based Models

The present paper is about the participation of our team "techno" on CER...
research
10/11/2020

Connecting the Dots Between Fact Verification and Fake News Detection

Fact verification models have enjoyed a fast advancement in the last two...
research
07/13/2023

Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models

With the rise of social media and online news sources, fake news has bec...
research
04/19/2020

BanFakeNews: A Dataset for Detecting Fake News in Bangla

Observing the damages that can be done by the rapid propagation of fake ...
research
06/09/2023

Implementing BERT and fine-tuned RobertA to detect AI generated news by ChatGPT

The abundance of information on social media has increased the necessity...
research
05/01/2022

The use of Data Augmentation as a technique for improving neural network accuracy in detecting fake news about COVID-19

This paper aims to present how the application of Natural Language Proce...
research
07/28/2023

Med-HALT: Medical Domain Hallucination Test for Large Language Models

This research paper focuses on the challenges posed by hallucinations in...
