Models for Predicting Community-Specific Interest in News Articles

08/27/2018
by Benjamin D. Horne, et al.

In this work, we ask two questions: (1) Can we predict the type of community interested in a news article using only features from the article content? (2) How well do these models generalize over time? To answer these questions, we compute well-studied content-based features on over 60K news articles from 4 communities on reddit.com. We train and test models over three different time periods between 2015 and 2017 to demonstrate which features degrade in performance the most due to concept drift. Our models can classify news articles into communities with high accuracy, ranging from 0.81 to 1.0 ROC AUC. However, although we can predict the community-specific popularity of news articles with high accuracy, practitioners should approach these models carefully. Predictions are both community-pair dependent and feature-group dependent. Moreover, these feature groups generalize over time differently: some degrade only slightly, while others degrade greatly. Therefore, we make two recommendations. First, community-interest predictions should be made in a hierarchical structure, in which multiple binary classifiers separate community pairs, rather than with a traditional multi-class model. Second, these models should be retrained over time based on accuracy goals and the availability of training data.
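The pairwise evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the community names, the two-dimensional feature vectors, and the nearest-centroid scorer are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. It trains one binary scorer per community pair on an "early" period and reports ROC AUC on both the early and a drifted "later" period.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(rows_a, rows_b):
    """Assumed stand-in classifier: score an article by projecting its
    features onto the difference of the two class means."""
    dim = len(rows_a[0])
    mean_a = [sum(r[i] for r in rows_a) / len(rows_a) for i in range(dim)]
    mean_b = [sum(r[i] for r in rows_b) / len(rows_b) for i in range(dim)]
    w = [a - b for a, b in zip(mean_a, mean_b)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def pairwise_auc(train, test):
    """Fit one binary scorer per community pair on `train` and return
    {(a, b): ROC AUC on `test`}, treating community `a` as positive."""
    results = {}
    for a, b in combinations(sorted(train), 2):
        score = fit_centroid_scorer(train[a], train[b])
        labels = [1] * len(test[a]) + [0] * len(test[b])
        scores = [score(x) for x in test[a] + test[b]]
        results[(a, b)] = roc_auc(labels, scores)
    return results


# Hypothetical toy data: community -> list of article feature vectors.
early = {
    "a": [[2.0, 0.0], [1.8, 0.2]],
    "b": [[0.0, 2.0], [0.2, 1.8]],
    "c": [[1.0, 1.0], [1.2, 0.8]],
}
later = {  # same communities, with feature drift
    "a": [[1.0, 1.2], [1.5, 0.5]],
    "b": [[0.2, 1.8], [1.1, 0.9]],
    "c": [[1.0, 1.0], [0.9, 1.1]],
}

early_auc = pairwise_auc(early, early)  # in-period score
later_auc = pairwise_auc(early, later)  # trained early, tested later
for pair in early_auc:
    print(pair, early_auc[pair], later_auc[pair])
```

Reporting one score per pair, rather than a single multi-class number, makes it visible when one pair stays separable over time while another degrades, which is the kind of signal that could inform a retraining schedule.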

research
12/22/2022

MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification

This article presents a dataset of 10,917 news articles with hierarchica...
research
04/02/2019

Different Spirals of Sameness: A Study of Content Sharing in Mainstream and Alternative Media

In this paper, we analyze content sharing between news sources in the al...
research
03/16/2022

NELA-Local: A Dataset of U.S. Local News Articles for the Study of County-level News Ecosystems

In this paper, we present a dataset of over 1.4M online news articles fr...
research
07/17/2018

To Post or Not to Post: Using Online Trends to Predict Popularity of Offline Content

Predicting the popularity of online content has attracted much attention...
research
05/15/2018

An Exploration of Verbatim Content Republishing by News Producers

In today's news ecosystem, news sources emerge frequently and can vary w...
research
08/26/2019

Detecting Toxicity in News Articles: Application to Bulgarian

Online media aim for reaching ever bigger audience and for attracting ev...
research
02/15/2018

The Causal Link between News Framing and Legislation

We demonstrate that framing, a subjective aspect of news, is a causal pr...