RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media

10/12/2022
by   Somin Wadhwa, et al.

We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Interventions, and Outcomes (PIO elements) within these claims. Using this corpus, we introduce the task of retrieving trustworthy evidence relevant to a given claim made on social media. We propose a new method to automatically derive (noisy) supervision for this task, which we use to train a dense retrieval model; this model outperforms baseline models. Manual evaluation of retrieval results by medical doctors indicates that while our system's performance is promising, there is considerable room for improvement. Collected annotations (and scripts to assemble the dataset) are available at https://github.com/sominw/redhot.

