CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction

04/08/2022
by Meisin Lee, et al.

In this paper, we present CrudeOilNews, a corpus of English Crude Oil news for event extraction. It is the first of its kind for Commodity News and contributes towards resource building for economic and financial text mining. This paper describes the data collection process, the annotation methodology, and the event typology used in producing the corpus. First, a seed set of 175 news articles was manually annotated, of which a subset of 25 articles was used as the adjudicated reference test set for inter-annotator and system evaluation. Agreement was generally substantial and annotator performance was adequate, indicating that the annotation scheme produces consistent event annotations of high quality. Subsequently, the dataset was expanded through (1) data augmentation and (2) human-in-the-loop active learning. The resulting corpus has 425 news articles with approximately 11k annotated events. As part of the active learning process, the corpus was used to train basic event extraction models for machine labeling; the resulting models also serve as validation, or as a pilot study, demonstrating the use of the corpus for machine learning purposes. The annotated corpus is made available for academic research purposes at https://github.com/meisin/CrudeOilNews-Corpus.

research
08/01/2020

Cross-context News Corpus for Protest Events related Knowledge Base Construction

We describe a gold standard corpus of protest events that comprise of va...
research
09/18/2017

Towards Building a Knowledge Base of Monetary Transactions from a News Collection

We address the problem of extracting structured representations of econo...
research
12/14/2022

Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study

This paper presents a corpus annotated for the task of direct-speech ext...
research
09/19/2023

FRACAS: A FRench Annotated Corpus of Attribution relations in newS

Quotation extraction is a widely useful task both from a sociological an...
research
01/15/2022

Extracting Space Situational Awareness Events from News Text

Space situational awareness typically makes use of physical measurements...
research
04/06/2020

An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines

The extraction of anglicisms (lexical borrowings from English) is releva...
research
01/09/2020

Domain-independent Extraction of Scientific Concepts from Research Articles

We examine the novel task of domain-independent scientific concept extra...
