Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision

09/14/2023
by João A. Leite, et al.

Credibility signals represent a wide range of heuristics that journalists and fact-checkers typically use to assess the veracity of online content. Automating credibility signal extraction is very challenging, however: it requires training high-accuracy signal-specific extractors, yet there are currently no sufficiently large datasets annotated with all credibility signals. This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each signal. We then aggregate these potentially noisy labels using weak supervision in order to predict content veracity. We demonstrate that our approach, which combines zero-shot LLM credibility signal labeling and weak supervision, outperforms state-of-the-art classifiers on two misinformation datasets without using any ground-truth labels for training. We also analyse the contribution of the individual credibility signals towards predicting content veracity, which provides valuable new insights into their role in misinformation detection.
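The aggregation step described above can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not say which weak supervision model is used, so a plain majority vote stands in for it here. The label constants, the function names, and the rule that a "yes" answer to a signal points towards misinformation are all assumptions made for illustration; in practice each signal may have its own polarity.

```python
from typing import List, Sequence

# Hypothetical label encoding (an assumption, not taken from the paper).
ABSTAIN = -1   # the LLM gave no usable answer for this signal
CREDIBLE = 0
MISINFO = 1


def signal_answers_to_votes(answers: Sequence[str]) -> List[int]:
    """Map yes/no LLM answers for each signal to weak labels.

    Assumes, for simplicity, that "yes" always points towards
    misinformation; any other answer is treated as an abstention.
    """
    mapping = {"yes": MISINFO, "no": CREDIBLE}
    return [mapping.get(a.strip().lower(), ABSTAIN) for a in answers]


def aggregate_weak_labels(votes: Sequence[int]) -> int:
    """Majority vote over non-abstaining weak labels.

    Returns ABSTAIN on a tie or when every signal abstained.
    """
    misinfo = sum(1 for v in votes if v == MISINFO)
    credible = sum(1 for v in votes if v == CREDIBLE)
    if misinfo == credible:
        return ABSTAIN
    return MISINFO if misinfo > credible else CREDIBLE


if __name__ == "__main__":
    # Four made-up signal answers for one article.
    votes = signal_answers_to_votes(["Yes", "no", "yes", "unsure"])
    print(aggregate_weak_labels(votes))
```

A learned label model would typically weight each signal by its estimated accuracy instead of counting all votes equally; majority vote is only the simplest baseline for combining noisy labels without ground truth.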
