Detecting Adverse Drug Reactions from Twitter through Domain-Specific Preprocessing and BERT Ensembling
The automation of adverse drug reaction (ADR) detection in social media would revolutionize the practice of pharmacovigilance, supporting drug regulators, the pharmaceutical industry and the general public in ensuring the safety of the drugs prescribed in daily practice. Following from the published proceedings of the Social Media Mining for Health (SMM4H) Applications Workshop Shared Task in August 2019, we aimed to develop a deep learning model to classify ADRs within Twitter tweets that contain drug mentions. Our approach involved fine-tuning BERT_LARGE and two domain-specific BERT implementations, BioBERT and Bio + clinicalBERT, applying a domain-specific preprocessor, and developing a max-prediction ensembling approach. Our final model resulted in state-of-the-art performance on both F_1-score (0.6681) and recall (0.7700) outperforming all models submitted in SMM4H 2019 and during post-evaluation to date.
READ FULL TEXT