Generalizing through Forgetting – Domain Generalization for Symptom Event Extraction in Clinical Notes

09/20/2022
by   Sitong Zhou, et al.

Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across different institutions and specialties are needed. In this paper, we study domain generalization for symptom extraction using pretraining and fine-tuning data that differ from the target domain in institution, specialty, and/or patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptom words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled texts to obtain better representations. Our experiments indicate that masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.
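The masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-level `SYMPTOM` label, the `top_k` frequency cutoff, the `mask_prob` rate, and both function names are assumptions made for illustration. The sketch counts words inside annotated symptom spans in the source data, then re-samples which of the frequent ones to replace with a mask token each time an example is drawn.

```python
import random
from collections import Counter

MASK = "[MASK]"  # assumed mask token; the actual token depends on the LM's tokenizer


def frequent_symptom_words(examples, top_k=2):
    """Return the top_k most common words found inside annotated symptom spans.

    `examples` is a list of (tokens, labels) pairs; the "SYMPTOM" label and
    the top_k cutoff are illustrative assumptions.
    """
    counts = Counter()
    for tokens, labels in examples:
        counts.update(
            tok.lower() for tok, lab in zip(tokens, labels) if lab == "SYMPTOM"
        )
    return {word for word, _ in counts.most_common(top_k)}


def dynamic_mask(tokens, labels, frequent, mask_prob=0.5, rng=random):
    """Mask frequent symptom words at random.

    Masks are re-sampled on every call, so each training epoch can see a
    different masked view of the same sentence.
    """
    return [
        MASK
        if lab == "SYMPTOM" and tok.lower() in frequent and rng.random() < mask_prob
        else tok
        for tok, lab in zip(tokens, labels)
    ]


# Toy source-domain data (invented for this example).
examples = [
    (["patient", "reports", "cough", "and", "fever"],
     ["O", "O", "SYMPTOM", "O", "SYMPTOM"]),
    (["denies", "cough", "or", "nausea"],
     ["O", "SYMPTOM", "O", "SYMPTOM"]),
    (["persistent", "cough", "with", "fever"],
     ["O", "SYMPTOM", "O", "SYMPTOM"]),
]

frequent = frequent_symptom_words(examples, top_k=2)
masked_view = dynamic_mask(*examples[0], frequent, mask_prob=0.5)
```

Calling `dynamic_mask` inside the training loop, rather than once during preprocessing, is what makes the masking dynamic: the model cannot rely on always seeing the most frequent source-domain symptom words and has to use the surrounding context instead.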


