Creating a Real-Time, Reproducible Event Dataset

12/02/2016
by John Beieler, et al.

The generation of political event data has remained much the same since the mid-1990s, both in terms of data acquisition and the process of coding text into data. Since the 1990s, however, there have been significant improvements in open-source natural language processing software and in the availability of digitized news content. This paper presents a new, next-generation event dataset, named Phoenix, that builds on these and other advances. The dataset includes improvements in the underlying news collection process and event coding software, along with the creation of a general processing pipeline necessary to produce daily-updated data. The paper provides face validity checks by briefly examining the data for the conflict in Syria and by comparing Phoenix with the Integrated Crisis Early Warning System data.
