MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

10/12/2022
by Sunjae Kwon, et al.

This paper proposes a new natural language processing (NLP) application: identifying medical jargon terms in electronic health record (EHR) notes that patients may find difficult to comprehend. We first present MedJ, a novel and publicly available dataset of expert-annotated medical jargon terms from 18K+ EHR note sentences. We then introduce a novel medical jargon extraction model, MedJEx, which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx's overall performance improved when it was trained on an auxiliary Wikipedia hyperlink span dataset, in which each hyperlink span (or term) points to a Wikipedia article that explains it, and then fine-tuned on the annotated MedJ data. Second, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved performance on six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.
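
To make the idea of hyperlink spans as auxiliary training data more concrete, the sketch below turns a sentence containing wiki-style links into token-level B/I/O span tags, the kind of labels a span extraction model could be pre-trained on. This is only an illustration under stated assumptions, not the authors' pipeline: the `[[Target|anchor]]` link syntax, whitespace tokenization, the B/I/O tag scheme, and the function name `wikilinks_to_bio` are all assumptions introduced here.

```python
import re

# Assumed link syntax: [[Target]] or [[Target|anchor text]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def wikilinks_to_bio(text):
    """Return (tokens, tags), tagging hyperlink anchor spans with B/I and the rest with O.

    Hypothetical helper for illustration; tokenization is plain whitespace splitting.
    """
    tokens, tags = [], []
    pos = 0
    for match in WIKI_LINK.finditer(text):
        # Text before the link lies outside any hyperlink span.
        for tok in text[pos:match.start()].split():
            tokens.append(tok)
            tags.append("O")
        # Use the visible anchor text if present, otherwise the link target.
        anchor = match.group(2) or match.group(1)
        for i, tok in enumerate(anchor.split()):
            tokens.append(tok)
            tags.append("B" if i == 0 else "I")
        pos = match.end()
    # Remaining text after the last link.
    for tok in text[pos:].split():
        tokens.append(tok)
        tags.append("O")
    return tokens, tags


sentence = (
    "The patient has [[Myocardial infarction|a heart attack]] "
    "history and [[hypertension]] ."
)
tokens, tags = wikilinks_to_bio(sentence)
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

A model trained on such tags learns which spans editors chose to link, which is one plausible proxy for "terms that need explanation" before fine-tuning on expert annotations.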
