BAND: Biomedical Alert News Dataset

05/23/2023
by Zihao Fu, et al.

Infectious disease outbreaks continue to pose a significant threat to human health and well-being. To improve disease surveillance and the understanding of disease spread, several surveillance systems have been developed to monitor daily news alerts and social media. However, existing systems lack thorough epidemiological analysis of the corresponding alerts or news, largely because well-annotated report data is scarce. To address this gap, we introduce the Biomedical Alert News Dataset (BAND), which includes 1,508 samples drawn from existing news articles, open emails, and alerts, together with 30 epidemiology-related questions. Answering these questions requires expert-level reasoning from a model and yields valuable insight into disease outbreaks. The BAND dataset brings new challenges to the NLP world, requiring stronger content understanding and the ability to infer important information. We provide several benchmark tasks, including Named Entity Recognition (NER), Question Answering (QA), and Event Extraction (EE), to show how well existing models handle these tasks in the epidemiology domain. To the best of our knowledge, the BAND corpus is the largest corpus of well-annotated biomedical outbreak alert news with elaborately designed questions, making it a valuable resource for epidemiologists and NLP researchers alike.