Self-supervised Answer Retrieval on Clinical Notes

by Paul Grundmann, et al.

Retrieving answer passages from long documents is a complex task that requires semantic understanding of both discourse and document context. We approach this challenge in a clinical scenario, where doctors retrieve cohorts of patients based on diagnoses and other latent medical aspects. We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. In addition, we contribute a novel retrieval dataset that simulates this scenario on a large corpus of clinical notes. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. From our extensive evaluation on MIMIC-III and three other healthcare datasets, we report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and generalizes effectively across rule-based and human-labeled passages. This makes the model especially useful in zero-shot scenarios, where only limited training data is available.
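
To make the passage-matching setting concrete, the following is a minimal sketch of bi-encoder style ranking: the query and each passage are encoded independently, and passages are ranked by vector similarity. It is not the authors' CAPR implementation. The function names (`toy_encode`, `cosine`, `rank_passages`), the example passages, and the bag-of-words encoder are all assumptions made for illustration; a real system would replace `toy_encode` with a trained Transformer encoder.

```python
import math
from collections import Counter


def toy_encode(text):
    """Hypothetical stand-in for a Transformer encoder.

    Returns a sparse bag-of-words vector (token -> count). This is an
    assumption for illustration only, not the encoder used in the paper.
    """
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_passages(query, passages):
    """Return passage indices sorted from most to least similar to the query.

    In a bi-encoder setup the passage vectors do not depend on the query,
    so they could be computed once and cached.
    """
    query_vec = toy_encode(query)
    passage_vecs = [toy_encode(p) for p in passages]
    scores = [cosine(query_vec, vec) for vec in passage_vecs]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Invented example passages, not taken from any dataset.
passages = [
    "patient admitted with chest pain and shortness of breath",
    "history of type 2 diabetes managed with metformin",
    "follow up scheduled with cardiology in two weeks",
]
ranking = rank_passages("diabetes treatment with metformin", passages)
print(ranking)
```

A cross-encoder, by contrast, would score each query and passage pair jointly in a single model pass, which is typically more accurate but cannot reuse precomputed passage vectors.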





Learning Contextualized Document Representations for Healthcare Answer Retrieval

We present Contextual Discourse Vectors (CDV), a distributed document re...

InPars: Data Augmentation for Information Retrieval using Large Language Models

The information retrieval community has recently witnessed a revolution ...

Hierarchical Transformer Networks for Longitudinal Clinical Document Classification

We present the Hierarchical Transformer Networks for modeling long-term ...

Estimating Redundancy in Clinical Text

The current mode of use of Electronic Health Record (EHR) elicits text r...

Embedding Electronic Health Records for Clinical Information Retrieval

Neural network representation learning frameworks have recently shown to...

Toward Understanding Clinical Context of Medication Change Events in Clinical Narratives

Understanding medication events in clinical narratives is essential to a...

CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

Continuity of care is crucial to ensuring positive health outcomes for p...