Eliciting Disease Data from Wikipedia Articles

04/02/2015
by Geoffrey Fairchild, et al.

Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have prompted a movement toward internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. To fill this gap, we analyzed Wikipedia article content. We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative, achieving an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that the article contains detailed, consistently updated time series data that closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven, open-source emerging disease detection, monitoring, and repository system.
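The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The abstract does not state the exact scoring scheme, so the following is only a minimal sketch, assuming exact-match, entity-level scoring (a common NER convention): a predicted entity counts as correct only if its token span and label both match a gold entity. The label names `CASES`, `DEATHS`, and `HOSPITALIZATIONS`, the `(start, end, label)` span format, the `entity_f1` helper, and the toy sentence are all hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Set, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 (assumed scheme)."""
    # A true positive is a predicted span whose boundaries AND label match gold.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy sentence (hypothetical, not from the paper's data):
#   0:The 1:ministry 2:reported 3:120 4:cases 5:, 6:45 7:deaths
#   8:and 9:30 10:hospitalizations 11:.
gold = {(3, 4, "CASES"), (6, 7, "DEATHS"), (9, 10, "HOSPITALIZATIONS")}
# The tagger finds all three numbers but mislabels the death count as CASES.
predicted = {(3, 4, "CASES"), (6, 7, "CASES"), (9, 10, "HOSPITALIZATIONS")}

p, r, f1 = entity_f1(gold, predicted)
print(round(p, 3), round(r, 3), round(f1, 3))  # prints "0.667 0.667 0.667"
```

Under this strict scheme a correctly located number with the wrong label counts as both a false positive and a false negative, which is why a single mislabel lowers precision and recall together. Lenient schemes that credit partial span overlap would report higher scores on the same predictions.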
