A latent shared-component generative model for real-time disease surveillance using Twitter data

Exploiting the large amount of available data to address relevant social problems has been one of the key challenges in data mining. Such efforts have recently been named "data science for social good" and have attracted the attention of several researchers and institutions. In this paper we contribute to this objective by considering a difficult public health problem: the timely monitoring of dengue epidemics in small geographical areas. We develop a simple yet effective generative model to connect the fluctuations of disease cases and disease-related Twitter posts. We consider a hidden Markov process driving both the fluctuations in reported dengue cases and the tweets issued in each region. We add a stable but random source of tweets to represent the posts made when no disease cases are recorded. The model is learned through a Markov chain Monte Carlo algorithm that produces the posterior distribution of the relevant parameters. Using data from a significant number of large Brazilian towns, we demonstrate empirically that our model predicts the disease counts for the following weeks well when tweets and disease cases are used jointly.
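
The shared-component structure described in the abstract can be illustrated with a small forward simulation for a single region. This is only a sketch under stated assumptions, not the paper's specification: the abstract does not give the distributions, so the Gaussian AR(1) latent process, the Poisson observation models, and every name and parameter value here (`simulate`, `phi`, `sigma`, `tweet_scale`, `baseline_rate`) are hypothetical choices made for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (adequate for modest rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(weeks, seed=0, phi=0.9, sigma=0.3, tweet_scale=2.0, baseline_rate=5.0):
    """Simulate weekly case and tweet counts for one region.

    Assumed (hypothetical) structure:
      - a hidden AR(1) Markov process on the log scale is the shared component;
      - reported cases are Poisson with the shared intensity;
      - tweets are the sum of a disease-driven Poisson part (scaled shared
        intensity) and a stable background Poisson part that is present even
        when no cases are recorded.
    """
    rng = random.Random(seed)
    log_intensity = 1.0  # arbitrary starting state of the hidden process
    cases, tweets = [], []
    for _ in range(weeks):
        # Hidden Markov step: the next state depends only on the current one.
        log_intensity = phi * log_intensity + rng.gauss(0.0, sigma)
        shared = math.exp(log_intensity)
        cases.append(sample_poisson(shared, rng))
        disease_tweets = sample_poisson(tweet_scale * shared, rng)
        background_tweets = sample_poisson(baseline_rate, rng)
        tweets.append(disease_tweets + background_tweets)
    return cases, tweets


if __name__ == "__main__":
    weekly_cases, weekly_tweets = simulate(weeks=12, seed=42)
    for week, (c, t) in enumerate(zip(weekly_cases, weekly_tweets), start=1):
        print(f"week {week:2d}: cases={c:3d} tweets={t:3d}")
```

Because both series depend on the same hidden intensity, the simulated tweet counts tend to rise and fall with the case counts, while the background term keeps tweets above zero in weeks without cases. Fitting such a model would mean inferring the hidden states and parameters from observed data, which the abstract says is done with Markov chain Monte Carlo; that step is not shown here.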

research
08/21/2016

Mining of health and disease events on Twitter: validating search protocols within the setting of Indonesia

This study seeks to validate a search protocol of ill health-related ter...
research
05/01/2020

Early Outbreak Detection for Proactive Crisis Management Using Twitter Data: COVID-19 a Case Study in the US

During a disease outbreak, timely non-medical interventions are critical...
research
02/06/2019

Supervised learning improves disease outbreak detection

The early detection of infectious disease outbreaks is a crucial task to...
research
12/22/2021

Faster indicators of dengue fever case counts using Google and Twitter

Dengue is a major threat to public health in Brazil, the world's sixth b...
research
01/31/2022

Disaster Tweets Classification using BERT-Based Language Model

Social networking services have became an important communication channe...
research
06/09/2020

EPIC30M: An Epidemics Corpus Of Over 30 Million Relevant Tweets

Since the start of COVID-19, several relevant corpora from various sourc...
research
11/21/2016

Ontology Driven Disease Incidence Detection on Twitter

In this work we address the issue of generic automated disease incidence...
