Claim Matching Beyond English to Scale Global Fact-Checking

06/01/2021
by Ashkan Kazemi, et al.

Manual fact-checking does not scale well to serve the needs of the internet. This issue is further compounded in non-English contexts. In this paper, we discuss claim matching as a possible solution to scale fact-checking. We define claim matching as the task of identifying pairs of textual messages containing claims that can be served with one fact-check. We construct a novel dataset of WhatsApp tipline and public group messages alongside fact-checked claims that are first annotated for containing "claim-like statements" and then matched with potentially similar items and annotated for claim matching. Our dataset contains content in high-resource (English, Hindi) and lower-resource (Bengali, Malayalam, Tamil) languages. We train our own embedding model using knowledge distillation and a high-quality "teacher" model in order to address the imbalance in embedding quality between the low- and high-resource languages in our dataset. We provide evaluations on the performance of our solution and compare with baselines and existing state-of-the-art multilingual embedding models, namely LASER and LaBSE. We demonstrate that our performance exceeds LASER and LaBSE in all settings. We release our annotated datasets, codebooks, and trained embedding model to allow for further research.
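As a rough illustration of the claim matching task defined above, the sketch below pairs messages whose embeddings have high cosine similarity. It is not the authors' model: `toy_embed` is a hypothetical stand-in that counts character trigrams, whereas the paper uses a distilled multilingual sentence embedding model, and the `threshold` value of 0.5 is an arbitrary assumption chosen for the example.

```python
import math
from collections import Counter
from itertools import combinations


def toy_embed(text, n=3):
    """Hypothetical stand-in for a sentence embedding: character n-gram counts.

    A real system would call a multilingual sentence encoder here instead.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_claims(messages, threshold=0.5):
    """Return (i, j, similarity) for every message pair at or above threshold.

    Pairs returned here are candidates that one fact-check might serve;
    the threshold is an assumption, not a value from the paper.
    """
    vectors = [toy_embed(m) for m in messages]
    matches = []
    for i, j in combinations(range(len(messages)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            matches.append((i, j, score))
    return matches


if __name__ == "__main__":
    tips = [
        "drinking hot water cures the virus",
        "Drinking hot water cures the virus!",
        "the election date has changed",
    ]
    for i, j, score in match_claims(tips):
        print(f"messages {i} and {j} may share a fact-check (similarity {score:.2f})")
```

Comparing every pair is quadratic in the number of messages, so a tipline at scale would more plausibly index the embeddings and retrieve nearest neighbours for each incoming message.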

research
09/20/2021

The Case for Claim Difficulty Assessment in Automatic Fact Checking

Fact-checking is the process (human, automated, or hybrid) by which clai...
research
04/20/2018

ClaimRank: Detecting Check-Worthy Claims in Arabic and English

We present ClaimRank, an online system for detecting check-worthy claims...
research
02/14/2022

Matching Tweets With Applicable Fact-Checks Across Languages

An important challenge for news fact-checking is the effective dissemina...
research
03/24/2022

Generating Scientific Claims for Zero-Shot Scientific Fact Checking

Automated scientific fact checking is difficult due to the complexity of...
research
09/19/2021

UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims

Identifying check-worthy claims is often the first step of automated fac...
research
01/26/2022

CsFEVER and CTKFacts: Czech Datasets for Fact Verification

In this paper, we present two Czech datasets for automated fact-checking...
research
09/22/2021

Scalable Fact-checking with Human-in-the-Loop

Researchers have been investigating automated solutions for fact-checkin...
