What Do Patients Say About Their Disease Symptoms? Deep Multilabel Text Classification With Human-in-the-Loop Curation for Automatic Labeling of Patient Self Reports of Problems

05/08/2023
by Lakshmi Arbatti, et al.

The US Food and Drug Administration has accorded increasing importance to patient-reported problems in clinical and research settings. In this paper, we explore one of the largest online datasets, comprising 170,141 open-ended self-reported responses (called "verbatims") from patients with Parkinson's (PwPs) to questions about what bothers them about their Parkinson's Disease and how it affects their daily functioning, also known as the Parkinson's Disease Patient Report of Problems. Classifying such verbatims into multiple clinically relevant symptom categories is an important problem that requires several ingredients: expert curation, a multi-label text classification (MLTC) approach, and large amounts of labelled training data. Human annotation of such large datasets, however, is tedious and expensive. We present a novel solution to this problem in which we build a baseline dataset using 2,341 (of the 170,141) verbatims annotated by nine curators, including clinical experts and PwPs. We develop a rule-based linguistic dictionary using NLP techniques and a graph-database-based expert phrase-query system to scale the annotation to the remaining cohort, generating the machine-annotated dataset, and finally build a Keras-TensorFlow-based MLTC model for both datasets. The machine-annotated model significantly outperforms the baseline model with an F1-score of 95 categories on a held-out test set.
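To make the idea of dictionary-driven multi-label annotation more concrete, the following is a minimal sketch, not the authors' system. It assumes a small hypothetical phrase dictionary (`PHRASES`), a hypothetical helper `label_verbatim` that assigns every matching category to a verbatim, and a hypothetical `micro_f1` helper that scores predicted label sets against curated ones. The paper's actual lexicon, graph-database phrase queries, and Keras-TensorFlow model are not reproduced here.

```python
import re

# Hypothetical phrase dictionary: category -> trigger phrases.
# Illustrative only; this is not the curated lexicon described in the paper.
PHRASES = {
    "tremor": ["tremor", "shaking", "shaky hands"],
    "sleep": ["can't sleep", "insomnia", "wake up at night"],
    "gait": ["shuffling", "trouble walking", "freezing"],
}


def label_verbatim(text, phrases=PHRASES):
    """Return the set of categories whose trigger phrases occur in the text."""
    # Lowercase and collapse whitespace so multi-word phrases match reliably.
    normalized = re.sub(r"\s+", " ", text.lower())
    labels = set()
    for category, triggers in phrases.items():
        for trigger in triggers:
            # Word-boundary match so a trigger is not found inside a longer word.
            if re.search(r"\b" + re.escape(trigger) + r"\b", normalized):
                labels.add(category)
                break  # one hit is enough to assign the category
    return labels


def micro_f1(gold, predicted):
    """Micro-averaged F1 over paired lists of gold and predicted label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    verbatims = [
        "My hands are shaking and I can't sleep",
        "Trouble walking, especially in doorways",
    ]
    for verbatim in verbatims:
        print(verbatim, "->", sorted(label_verbatim(verbatim)))
```

A single verbatim can receive several labels at once, which is what distinguishes the multi-label setting from ordinary single-label classification. In a pipeline like the one described, labels produced this way would serve as training targets for a downstream classifier rather than as final predictions.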

