The Surprising Performance of Simple Baselines for Misinformation Detection

04/14/2021
by Kellin Pelrine, et al.

As social media becomes increasingly prominent in our day-to-day lives, it is increasingly important to detect informative content and prevent the spread of disinformation and unverified rumours. While many sophisticated and successful models have been proposed in the literature, they are often compared only with older NLP baselines such as SVMs, CNNs, and LSTMs. In this paper, we examine the performance of a broad set of modern transformer-based language models and show that, with basic fine-tuning, these models are competitive with and can even significantly outperform recently proposed state-of-the-art methods. We present our framework as a baseline for creating and evaluating new methods for misinformation detection. We further study a comprehensive set of benchmark datasets and discuss potential data leakage, along with the need for careful experimental design and a thorough understanding of the datasets to account for confounding variables. As an extreme example, we show that classifying based only on the first three digits of tweet IDs, which contain information on the date, gives state-of-the-art performance on Twitter16, a commonly used benchmark dataset for fake news detection. We provide a simple tool to detect this problem and suggest steps to mitigate it in future datasets.
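The tweet-ID leakage described above can be illustrated with a minimal sketch. This is not the authors' tool: the function names (`fit_prefix_classifier`, `predict`, `accuracy`) are hypothetical, and the IDs and labels are made-up toy values rather than Twitter16 data. The idea is to predict each label from the first three digits of the ID alone; if such a classifier scores well on held-out examples, the dataset likely leaks date information into the labels.

```python
from collections import Counter, defaultdict


def fit_prefix_classifier(ids, labels, n_digits=3):
    """Map each ID prefix to the majority label seen for it in training."""
    counts = defaultdict(Counter)
    for tweet_id, label in zip(ids, labels):
        counts[str(tweet_id)[:n_digits]][label] += 1
    table = {prefix: c.most_common(1)[0][0] for prefix, c in counts.items()}
    # Fallback for prefixes never seen in training: overall majority label.
    fallback = Counter(labels).most_common(1)[0][0]
    return table, fallback


def predict(table, fallback, ids, n_digits=3):
    """Predict a label for each ID using only its leading digits."""
    return [table.get(str(i)[:n_digits], fallback) for i in ids]


def accuracy(pred, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Toy, invented data (NOT from Twitter16): labels correlate with ID prefix.
train_ids = ["498000000000000001", "498000000000000002",
             "552000000000000003", "552000000000000004",
             "614000000000000005"]
train_labels = ["false", "false", "true", "true", "unverified"]
test_ids = ["498000000000000009", "552000000000000010",
            "614000000000000011"]
test_labels = ["false", "true", "unverified"]

table, fallback = fit_prefix_classifier(train_ids, train_labels)
pred = predict(table, fallback, test_ids)
leak_accuracy = accuracy(pred, test_labels)
print(f"ID-prefix-only accuracy: {leak_accuracy:.2f}")
```

On real data, one would compare this ID-only accuracy against the majority-class rate and against the text-based models; a score far above the majority-class rate suggests that collection date is a confounding variable.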
