OSLAT: Open Set Label Attention Transformer for Medical Entity Span Extraction

07/12/2022
by Raymond Li, et al.

Identifying spans in medical texts that correspond to medical entities is a core step in many healthcare NLP tasks, such as ICD coding, medical finding extraction, and medical note contextualization. Existing entity extraction methods rely on a fixed and limited vocabulary of medical entities and have difficulty extracting entities represented by disjoint spans. In this paper, we present a new transformer-based architecture called OSLAT (Open Set Label Attention Transformer) that addresses many of the limitations of previous methods. Our approach uses a label-attention mechanism to implicitly learn the spans associated with entities of interest. These entities can be provided as free text, including entities not seen during OSLAT's training, and the model can extract spans even when they are disjoint. To test the generalizability of our method, we train two separate models on two different datasets that have very low entity overlap: (1) a public discharge notes dataset from hNLP, and (2) a much more challenging proprietary patient text dataset, "Reasons for Encounter" (RFE). We find that OSLAT models trained on either dataset outperform rule-based and fuzzy string matching baselines when applied to the RFE dataset, as well as to the portion of the hNLP dataset where entities are represented by disjoint spans. Our code can be found at https://github.com/curai/curai-research/tree/main/OSLAT.
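
To make the idea of label attention over disjoint spans concrete, here is a minimal toy sketch in plain Python. It is not the OSLAT architecture or code from the linked repository: the function names `label_attention` and `extract_spans`, the 2-dimensional embeddings, and the 0.2 threshold are all hypothetical choices made for illustration. It only shows how attention weights of a single free-text label over the tokens of a sentence could be thresholded into spans that need not be contiguous.

```python
import math

def label_attention(token_vecs, label_vec):
    """Softmax-normalised dot-product attention of one label over all tokens."""
    scores = [sum(t * l for t, l in zip(vec, label_vec)) for vec in token_vecs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def extract_spans(weights, threshold):
    """Group consecutive tokens with weight >= threshold into (start, end) spans, end inclusive."""
    spans, start = [], None
    for i, w in enumerate(weights):
        if w >= threshold:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(weights) - 1))
    return spans

# Hypothetical toy embeddings (assumption): dimension 0 ~ "pain", dimension 1 ~ "knee".
tokens = ["pain", "in", "my", "left", "knee"]
token_vecs = [[3.0, 0.0], [0.0, 0.1], [0.0, 0.1], [0.1, 0.0], [0.0, 3.0]]
label_vec = [1.0, 1.0]  # stands in for an embedding of the free-text label "knee pain"

weights = label_attention(token_vecs, label_vec)
spans = extract_spans(weights, threshold=0.2)  # threshold is an arbitrary illustrative value
print(spans)                                   # [(0, 0), (4, 4)]
print([tokens[s:e + 1] for s, e in spans])     # [['pain'], ['knee']]
```

In this toy case the label "knee pain" attends to two separated tokens, which is the kind of disjoint-span case that exact or fuzzy string matching against the contiguous phrase "knee pain" would be expected to miss.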


