Sampling the News Producers: A Large News and Feature Data Set for the Study of the Complex Media Landscape

03/27/2018
by Benjamin D. Horne, et al.

The complexity and diversity of today's media landscape pose many challenges for researchers studying news producers. These producers use many different strategies to get their message believed by readers: the writing styles they employ, repetition across different media sources with or without attribution, and other mechanisms that are yet to be studied deeply. To better facilitate systematic studies in this area, we present a large political news data set containing over 136K news articles from 92 news sources, collected over 7 months of 2017. These news sources are carefully chosen to include well-established and mainstream sources, maliciously fake sources, satire sources, and hyper-partisan political blogs. For each article, we compute 130 content-based and social media engagement features drawn from a wide range of literature on political bias, persuasion, and misinformation. Along with the data set, we release the source code for feature computation. In this paper, we discuss the first release of the data set and demonstrate four use cases of the data and features: news characterization, engagement characterization, news attribution and content copying, and discovering news narratives.

research
05/27/2020

The POLUSA Dataset: 0.9M Political News Articles Balanced by Time and Outlet Popularity

News articles covering policy issues are an essential source of informat...
research
08/01/2019

Auditing News Curation Systems: A Case Study Examining Algorithmic and Editorial Logic in Apple News

This work presents an audit study of Apple News as a sociotechnical news...
research
05/17/2017

Learning to Identify Ambiguous and Misleading News Headlines

Accuracy is one of the basic principles of journalism. However, it is in...
research
01/14/2023

Unveiling the Hidden Agenda: Biases in News Reporting and Consumption

One of the most pressing challenges in the digital media landscape is un...
research
01/02/2018

Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

How should one perform matching in observational studies when the units ...
research
06/04/2020

NewB: 200,000+ Sentences for Political Bias Detection

We present the Newspaper Bias Dataset (NewB), a text corpus of more than...
research
09/17/2020

Understanding Effects of Editing Tweets for News Sharing by Media Accounts through a Causal Inference Framework

To reach a broader audience and optimize traffic toward news articles, m...
