Incorporating Dictionaries into a Neural Network Architecture to Extract COVID-19 Medical Concepts From Social Media

by Abul Hasan, et al.

We investigate the potential benefit of incorporating dictionary information into a neural network architecture for natural language processing. In particular, we use this architecture to extract several concepts related to COVID-19 from an online medical forum. We use a sample from the forum to manually curate one dictionary for each concept. In addition, we use MetaMap, a tool for extracting biomedical concepts, to identify a small number of semantic concepts. For a supervised concept extraction task on the forum data, our best model achieved a macro F1 score of 90%. A major difficulty in medical concept extraction is obtaining labelled data from which to build supervised models. We investigate how well our models transfer to data derived from a different source in two ways: first, to produce labels via weak learning, and second, to perform concept extraction. The dataset we use in this case comprises COVID-19-related tweets, and we achieve an F1 score of 81% for symptom concept extraction trained on weakly labelled data. We compare the utility of our dictionaries with that of a COVID-19 symptom dictionary constructed directly from Twitter. Further experiments that incorporate BERT and a COVID-19 version of BERTweet demonstrate that the dictionaries provide commensurate results. Our results show that incorporating small domain dictionaries into deep learning models can improve concept extraction tasks. Moreover, models built using dictionaries generalize well and are transferable to different datasets on a similar task.
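To make the idea of "dictionary information" concrete, the sketch below shows one common way a concept dictionary can be turned into per-token inputs for a neural tagger, and how the same matches can produce weak labels for unlabelled text. It is an illustration under stated assumptions, not the authors' implementation: the `DICTIONARIES` contents, the concept names `SYMPTOM` and `DRUG`, whitespace tokenization, greedy longest-match lookup, and the BIO tagging scheme are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionaries (not the paper's): one set of lower-cased phrases per concept.
DICTIONARIES: Dict[str, Set[str]] = {
    "SYMPTOM": {"fever", "dry cough", "loss of smell", "shortness of breath"},
    "DRUG": {"paracetamol", "ibuprofen"},
}

Span = Tuple[int, int, str]  # (start, end_exclusive, concept)


def match_spans(tokens: List[str], dictionaries: Dict[str, Set[str]]) -> List[Span]:
    """Greedy longest-match dictionary lookup over a token sequence."""
    lowered = [t.lower() for t in tokens]
    max_len = max(len(p.split()) for phrases in dictionaries.values() for p in phrases)
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            for concept, phrases in dictionaries.items():
                if phrase in phrases:
                    found = (i, i + n, concept)
                    break
            if found:
                break
        if found:
            spans.append(found)
            i = found[1]  # skip past the matched phrase
        else:
            i += 1
    return spans


def dictionary_features(tokens: List[str], dictionaries: Dict[str, Set[str]]) -> List[List[int]]:
    """One binary indicator per concept (columns in sorted concept order) for every token."""
    concepts = sorted(dictionaries)
    feats = [[0] * len(concepts) for _ in tokens]
    for start, end, concept in match_spans(tokens, dictionaries):
        col = concepts.index(concept)
        for k in range(start, end):
            feats[k][col] = 1
    return feats


def weak_bio_labels(tokens: List[str], dictionaries: Dict[str, Set[str]]) -> List[str]:
    """Weak BIO tags derived purely from dictionary matches."""
    labels = ["O"] * len(tokens)
    for start, end, concept in match_spans(tokens, dictionaries):
        labels[start] = f"B-{concept}"
        for k in range(start + 1, end):
            labels[k] = f"I-{concept}"
    return labels


tokens = "Day 3 : dry cough and fever , took paracetamol".split()
print(weak_bio_labels(tokens, DICTIONARIES))
print(dictionary_features(tokens, DICTIONARIES))
```

In a neural model, the indicator vectors from `dictionary_features` would typically be concatenated with each token's word embedding before the sequence encoder, so the network can learn how much to trust a dictionary hit; `weak_bio_labels` instead uses the matches directly as noisy training targets for text that has no manual annotation.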
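The reported macro F1 score averages per-concept F1 without weighting by frequency, so a rarely mentioned concept counts as much as a common one. The sketch below computes it under an assumed exact-span-match protocol; the abstract does not state whether the evaluation is token-level or span-level, or strict or lenient, so the `Span` representation and matching rule here are illustrative assumptions.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, concept)


def macro_f1(gold: Set[Span], pred: Set[Span], concepts: Iterable[str]) -> float:
    """Unweighted mean of per-concept F1, counting a prediction correct only on an exact span match."""
    scores: List[float] = []
    for concept in concepts:
        g = {s for s in gold if s[2] == concept}
        p = {s for s in pred if s[2] == concept}
        tp = len(g & p)  # true positives: identical start, end and concept
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


gold = {(3, 5, "SYMPTOM"), (6, 7, "SYMPTOM"), (9, 10, "DRUG")}
pred = {(3, 5, "SYMPTOM"), (9, 10, "DRUG"), (0, 1, "DRUG")}
print(macro_f1(gold, pred, ["SYMPTOM", "DRUG"]))
```

Here SYMPTOM has precision 1.0 and recall 0.5, while DRUG has precision 0.5 and recall 1.0; both give F1 = 2/3, so the macro average is also 2/3. A micro average would instead pool the counts across concepts before computing a single F1.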


