Fine-grained Czech News Article Dataset: An Interdisciplinary Approach to Trustworthiness Analysis

12/16/2022
by Matyáš Boháček, et al.

We present the Verifee Dataset: a novel dataset of news articles with fine-grained trustworthiness annotations. We develop a detailed methodology that assesses the texts on parameters encompassing editorial transparency, journalistic conventions, and objective reporting, while penalizing manipulative techniques. We bring together a diverse set of researchers from the social, media, and computer sciences to overcome the barriers and limited framing of this interdisciplinary problem. We collect over 10,000 unique articles from almost 60 Czech online news sources. Each article is categorized into one of four classes across the credibility spectrum we propose, ranging from entirely trustworthy articles to manipulative ones. We produce detailed statistics and study trends emerging throughout the set. Lastly, we fine-tune multiple popular sequence-to-sequence language models on our dataset for the trustworthiness classification task and report a best test F1 score of 0.52. We open-source the dataset, annotation methodology, and annotators' instructions in full at https://verifee.ai/research to enable follow-up work. We believe similar methods can help prevent disinformation and support education in media literacy.
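
To make the reported metric concrete, here is a minimal sketch of how an F1 score can be computed for a four-class task. It rests on assumptions not stated in the abstract: the averaging scheme (macro-averaging is used here), the integer labels 0 to 3 standing in for the four credibility classes, and the function name `macro_f1`, which is purely illustrative.

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes=4):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Labels are assumed to be integers in [0, num_classes); they are
    hypothetical stand-ins for the four credibility classes.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = []
    for c in range(num_classes):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return sum(per_class) / num_classes


if __name__ == "__main__":
    # Toy labels for illustration only, not drawn from the dataset.
    gold = [0, 0, 1, 1, 2, 2, 3, 3]
    predicted = [0, 1, 1, 1, 2, 3, 3, 3]
    print(round(macro_f1(gold, predicted), 4))
```

Macro-averaging weights every class equally, which matters when classes are imbalanced; a micro- or weighted-averaged score on the same predictions can differ.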

