Fine-grained Czech News Article Dataset: An Interdisciplinary Approach to Trustworthiness Analysis

by Matyáš Boháček, et al.

We present the Verifee Dataset: a novel dataset of news articles with fine-grained trustworthiness annotations. We develop a detailed methodology that assesses the texts on parameters encompassing editorial transparency, journalistic conventions, and objective reporting, while penalizing manipulative techniques. We bring together a diverse set of researchers from the social, media, and computer sciences to overcome the barriers and limited framing of this interdisciplinary problem. We collect over 10,000 unique articles from almost 60 Czech online news sources. Each article is categorized into one of the 4 classes across the credibility spectrum we propose, ranging from entirely trustworthy articles to manipulative ones. We produce detailed statistics and study trends emerging throughout the set. Lastly, we fine-tune multiple popular sequence-to-sequence language models on our dataset for the trustworthiness classification task and report a best test F1 score of 0.52. We open-source the dataset, annotation methodology, and annotators' instructions in full to enable follow-up work. We believe similar methods can help prevent disinformation and support education in media literacy.
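To make the reported metric concrete, the following is a minimal sketch of how an F1 score can be computed for a 4-class task. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the integer class ids 0 to 3 as well as the function name `macro_f1` are hypothetical placeholders rather than the dataset's own labels or the authors' code.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Assumption: macro-averaging; the abstract does not say which
    averaging scheme produced the reported score.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical class ids standing in for the four credibility classes.
LABELS = [0, 1, 2, 3]
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
score = macro_f1(y_true, y_pred, LABELS)
```

Macro-averaging weights every class equally, which matters when the classes are imbalanced; a micro- or weighted-average would generally give a different number for the same predictions.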



Fine-Grained Analysis of Propaganda in News Articles

Propaganda aims at influencing people's mindset with the purpose of adva...

NELA-GT-2018: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles

In this paper, we present a dataset of 713k articles collected between 0...

NELA-GT-2020: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles

In this paper, we present an updated version of the NELA-GT-2019 dataset...

The POLUSA Dataset: 0.9M Political News Articles Balanced by Time and Outlet Popularity

News articles covering policy issues are an essential source of informat...

A New Task and Dataset on Detecting Attacks on Human Rights Defenders

The ability to conduct retrospective analyses of attacks on human rights...

A Fine-grained Sentiment Dataset for Norwegian

We here introduce NoReCfine, a dataset for fine-grained sentiment analys...

An unsupervised framework for tracing textual sources of moral change

Morality plays an important role in social well-being, but people's mora...
