Multilingual Previously Fact-Checked Claim Retrieval

05/13/2023
by Matúš Pikuliak, et al.

Fact-checkers are often hampered by the sheer volume of online content that needs to be fact-checked. NLP can help by retrieving existing fact-checks relevant to the content under investigation. This paper introduces MultiClaim, a new multilingual dataset for previously fact-checked claim retrieval. We collected 28k posts in 27 languages from social media, 206k fact-checks in 39 languages written by professional fact-checkers, and 31k connections between these two groups. This is the most extensive and most linguistically diverse dataset of its kind to date. We evaluated how different unsupervised methods fare on this dataset and its various dimensions. We show that evaluating on such a diverse dataset has its complexities and that proper care needs to be taken before interpreting the results. We also evaluated a supervised fine-tuning approach, which improves significantly upon the unsupervised method.
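
To make the retrieval task concrete, here is a minimal sketch of ranking fact-checks against a post. It is not the method evaluated in the paper: it assumes a plain bag-of-words cosine similarity as a stand-in for an unsupervised retriever, and the function names (`tokenize`, `cosine`, `rank_fact_checks`) and the toy texts are hypothetical, not drawn from MultiClaim.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-word characters.
    # Assumption: real multilingual text would need language-aware tokenization.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_fact_checks(post, fact_checks, k=3):
    # Return up to k (index, score) pairs, highest score first.
    post_vec = Counter(tokenize(post))
    scored = [
        (i, cosine(post_vec, Counter(tokenize(fc))))
        for i, fc in enumerate(fact_checks)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data for illustration only.
fact_checks = [
    "Drinking hot water does not cure the flu",
    "The photo of a shark on a flooded highway is digitally altered",
    "No evidence that 5G towers spread viruses",
]
post = "Shark swimming on a flooded highway, look at this photo!"
top = rank_fact_checks(post, fact_checks, k=2)
```

A purely lexical score like this cannot match a post to a fact-check written in another language, which is one reason a multilingual setting likely calls for translation or multilingual text embeddings instead.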


research
02/14/2022

Matching Tweets With Applicable Fact-Checks Across Languages

An important challenge for news fact-checking is the effective dissemina...
research
08/06/2023

Improving Domain-Specific Retrieval by NLI Fine-Tuning

The aim of this article is to investigate the fine-tuning potential of n...
research
09/10/2022

Harnessing Abstractive Summarization for Fact-Checked Claim Detection

Social media platforms have become new battlegrounds for anti-social ele...
research
09/27/2021

MFAQ: a Multilingual FAQ Dataset

In this paper, we present the first multilingual FAQ dataset publicly av...
research
12/16/2020

Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism

This article investigates multilingual evidence retrieval and fact verif...
