A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking

10/29/2019
by Andreas Hanselowski, et al.

Automated fact-checking based on machine learning is a promising approach to identify false information distributed on the web. In order to achieve satisfactory performance, machine learning methods require a large corpus with reliable annotations for the different tasks in the fact-checking process. Having analyzed existing fact-checking corpora, we found that none of them meets these criteria in full. They are either too small in size, do not provide detailed annotations, or are limited to a single domain. Motivated by this gap, we present a new substantially sized mixed-domain corpus with annotations of good quality for the core fact-checking tasks: document retrieval, evidence extraction, stance detection, and claim validation. To aid future corpus construction, we describe our methodology for corpus creation and annotation, and demonstrate that it results in substantial inter-annotator agreement. As baselines for future research, we perform experiments on our corpus with a number of model architectures that reach high performance in similar problem settings. Finally, to support the development of future models, we provide a detailed error analysis for each of the tasks. Our results show that the realistic, multi-domain setting defined by our data poses new challenges for the existing models, providing opportunities for considerable improvement by future systems.

research
04/21/2018

Integrating Stance Detection and Fact Checking in a Unified Corpus

A reasonable approach for fact checking a claim involves retrieving pote...
research
06/21/2021

Towards a corpus for credibility assessment in software practitioner blog articles

Blogs are a source of grey literature which are widely adopted by softwa...
research
01/26/2022

CsFEVER and CTKFacts: Czech Datasets for Fact Verification

In this paper, we present two Czech datasets for automated fact-checking...
research
04/26/2022

CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets

Over the course of the COVID-19 pandemic, large volumes of biomedical in...
research
07/25/2022

Graph Querying for Semantic Annotations

This paper presents how the online tool GREW-MATCH can be used to make q...
research
01/27/2023

Reading and Reasoning over Chart Images for Evidence-based Automated Fact-Checking

Evidence data for automated fact-checking (AFC) can be in multiple modal...
research
09/07/2022

Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems

Mis- and disinformation are now a substantial global threat to our secur...