Automatically Detecting Self-Reported Birth Defect Outcomes on Twitter for Large-scale Epidemiological Research

10/22/2018
by Ari Z. Klein, et al.

In recent work, we identified and studied a small cohort of Twitter users whose pregnancies with birth defect outcomes could be observed via their publicly available tweets. Birth defects are the leading cause of infant mortality, yet the methods for studying them are limited. Exploiting the large-scale potential of social media to complement those methods depends on further developing automatic approaches. The primary objective of this study was to take the first step towards scaling the use of social media for observing pregnancies with birth defect outcomes: developing methods for automatically detecting tweets by users reporting their birth defect outcomes. We annotated and pre-processed approximately 23,000 tweets that mention birth defects in order to train and evaluate supervised machine learning algorithms, including feature-engineered and deep learning-based classifiers. We also experimented with various under-sampling and over-sampling approaches to address the class imbalance. A Support Vector Machine (SVM) classifier trained on the original, imbalanced data set, with n-grams, word clusters, and structural features, achieved the best baseline performance for the positive classes: an F1-score of 0.65 for the "defect" class and 0.51 for the "possible defect" class. Our contributions include (i) natural language processing (NLP) and supervised machine learning methods for automatically detecting tweets by users reporting their birth defect outcomes, (ii) a comparison of feature-engineered and deep learning-based classifiers trained on imbalanced, under-sampled, and over-sampled data, and (iii) an error analysis that could inform classification improvements using our publicly available corpus. Future work will focus on automating user-level analyses for cohort inclusion.
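The abstract does not spell out the exact feature extraction or resampling procedures. As a minimal sketch, assuming whitespace-tokenized word n-grams, simple random under-/over-sampling, and per-class F1 as the evaluation metric, the standard-library Python below illustrates those three pieces. The function names (`word_ngrams`, `random_undersample`, `random_oversample`, `per_class_f1`) and the placeholder negative label `"other"` are hypothetical, not taken from the paper. The word-cluster and structural features and the classifier itself are not shown; in practice the n-gram bags could be vectorized and passed to a linear SVM such as scikit-learn's `LinearSVC`.

```python
import random
from collections import Counter


def word_ngrams(text, n_max=3):
    """Bag of word n-grams (n = 1..n_max) from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def _indices_by_class(labels):
    """Group example indices by their class label."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    return by_class


def random_undersample(labels, seed=0):
    """Keep a random sample of each class equal in size to the smallest class."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    target = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))
    return sorted(keep)


def random_oversample(labels, seed=0):
    """Duplicate random examples of smaller classes until each matches the largest."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    target = max(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(idxs)
        keep.extend(rng.choice(idxs) for _ in range(target - len(idxs)))
    return sorted(keep)


def per_class_f1(gold, pred, label):
    """F1-score for one class, treating that class as positive and all others as negative."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up labels; "other" is a hypothetical stand-in for the negative class.
    labels = ["other"] * 6 + ["defect"] * 3 + ["possible defect"]
    print(Counter(labels[i] for i in random_undersample(labels)))
    print(Counter(labels[i] for i in random_oversample(labels)))
    gold = ["defect", "defect", "other", "possible defect"]
    pred = ["defect", "other", "other", "defect"]
    print(per_class_f1(gold, pred, "defect"))
```

If resampling is used, it would normally be applied to the training split only, so that F1 is still measured on data with the original class distribution; this matters here because the reported scores are for the minority ("defect" and "possible defect") classes.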
