Evaluating robustness of language models for chief complaint extraction from patient-generated text

11/15/2019
by Ilya Valmianski, et al.

Automated classification of chief complaints from patient-generated text is a critical first step in developing scalable platforms to triage patients without human intervention. In this work, we evaluate several approaches to chief complaint classification using a novel Chief Complaint (CC) Dataset that contains 200,000 patient-generated reasons-for-visit entries mapped to a set of 795 discrete chief complaints. We examine several fine-tuned bidirectional transformer (BERT) models trained both on unrelated texts and on the CC dataset, and we contrast their performance with a TF-IDF baseline. Our evaluation has three components: (1) a random hold-out test set from the original dataset; (2) a "misspelling set," a hand-selected subset of the test set in which every entry contains at least one misspelling; and (3) a separate set of experimenter-generated free-text queries. We find that the TF-IDF model performs significantly better than the strongest BERT-based model on the test set (best BERT PR-AUC 0.3597 ± 0.0041 vs. TF-IDF PR-AUC 0.3878 ± 0.0148, p = 7·10^-5) and is statistically comparable on the misspelling set (best BERT PR-AUC 0.2579 ± 0.0079 vs. TF-IDF PR-AUC 0.2733 ± 0.0130, p = 0.06). However, examining model predictions on the experimenter-generated queries raises concerns about the TF-IDF baseline's robustness. Our results suggest that on certain tasks, simple language embedding baselines may be highly performant; truly understanding their robustness, however, requires further analysis.
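
To make the comparison concrete, the sketch below shows one plausible form of the TF-IDF baseline and the PR-AUC evaluation described above: a character n-gram TF-IDF vectorizer feeding a one-vs-rest logistic regression over chief-complaint labels, scored with scikit-learn's average precision. The toy data, label names, and model settings are illustrative assumptions, not the authors' exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-ins for patient-generated reason-for-visit strings and their
# mapped chief complaints (the real CC dataset has 200,000 entries and
# 795 labels; these examples and label names are invented).
train_texts = [
    "sore throat and fever",
    "twisted my ankle yesterday",
    "bad headache for 3 days",
    "fevr and chills",  # misspelled, like entries in the misspelling set
]
train_labels = [
    ["sore_throat", "fever"],
    ["ankle_injury"],
    ["headache"],
    ["fever"],
]

mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(train_labels)

# Character n-grams give TF-IDF some tolerance to misspellings; the exact
# vectorizer settings used in the paper are an assumption here.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(train_texts, Y_train)

test_texts = ["high feevr since last night", "hurt my ankle playing soccer"]
Y_test = mlb.transform([["fever"], ["ankle_injury"]])
scores = model.predict_proba(test_texts)

# PR-AUC via average precision, macro-averaged over labels that actually
# occur in the test set (AP is undefined for labels with no positives).
present = Y_test.sum(axis=0) > 0
pr_auc = average_precision_score(
    Y_test[:, present], scores[:, present], average="macro"
)
print(f"macro PR-AUC: {pr_auc:.4f}")
```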
