Modeling "Newsworthiness" for Lead-Generation Across Corpora

04/19/2021
by Alexander Spangher, et al.

Journalists obtain "leads", or story ideas, by reading large corpora of government records: court cases, proposed bills, etc. However, only a small percentage of such records are interesting. We propose a model of "newsworthiness" aimed at surfacing interesting documents. We train models on automatically labeled corpora – published newspaper articles – to predict whether each article was a front-page article (i.e., newsworthy) or not (i.e., less newsworthy). We then transfer these models to unlabeled corpora – court cases, bills, city-council meeting minutes – to rank the documents in these corpora by "newsworthiness". A fine-tuned RoBERTa model achieves 0.93 AUC on held-out labeled documents and 0.88 AUC on expert-validated unlabeled corpora. We provide interpretation and visualization for our models.
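To make the two quantities in the abstract concrete, here is a minimal Python sketch of how an AUC value can be computed from classifier scores and how documents can be ranked by those scores. It is an illustration only, not the authors' code: the function names `roc_auc` and `rank_by_score`, the document ids, and all labels and scores below are hypothetical, and the scores stand in for the output of any classifier (such as a fine-tuned RoBERTa model).

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_by_score(doc_ids: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return document ids ordered from highest to lowest score."""
    pairs = sorted(zip(doc_ids, scores), key=lambda t: t[1], reverse=True)
    return [doc_id for doc_id, _ in pairs]


# Hypothetical labeled data: 1 = front-page article, 0 = not front-page.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1]
auc = roc_auc(labels, scores)

# Hypothetical unlabeled records scored by the same (assumed) classifier.
ranked = rank_by_score(["bill_a", "case_b", "minutes_c"], [0.3, 0.8, 0.5])
```

The pairwise definition is quadratic in the number of examples, which is fine for a small illustration; for large evaluation sets a sort-based implementation is the usual choice.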

