SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

by Zhen Wang, et al.

Unstructured clinical texts contain rich health-related information. To better utilize the knowledge buried in clinical texts, discovering synonyms for a medical query term has become an important task. Recent automatic synonym discovery methods leveraging raw text information have been developed. However, to preserve patient privacy and security, it is usually quite difficult to get access to large-scale raw clinical texts. In this paper, we study a new setting named synonym discovery on privacy-aware clinical data (i.e., medical terms extracted from the clinical texts and their aggregated co-occurrence counts, without raw clinical texts). To solve the problem, we propose a new framework SurfCon that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information, and the global context information for synonym discovery. In particular, the surface form module enables us to detect synonyms that look similar while the global context module plays a complementary role to discover synonyms that are semantically similar but in different surface forms, and both allow us to deal with the OOV query issue (i.e., when the query is not found in the given data). We conduct extensive experiments and case studies on publicly available privacy-aware clinical data, and show that SurfCon can outperform strong baseline methods by large margins under various settings.
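The abstract names two signals that are available without raw text: how a term is spelled (surface form) and which terms it co-occurs with (global context). The sketch below shows how each signal could score a candidate synonym. It is not SurfCon's method. SurfCon learns its modules, while this sketch swaps in two fixed heuristics:

- cosine similarity over character trigrams, for surface form;
- cosine similarity over aggregated co-occurrence count vectors, for global context.

All function names, the weight `alpha` and the toy counts in `COOCCUR` are hypothetical. For an out-of-vocabulary query, the sketch simply gets a zero context score and falls back on surface similarity.

```python
import math
from collections import Counter


def char_ngrams(term: str, n: int = 3) -> Counter:
    """Character n-grams of a lower-cased term, padded with '#' boundary markers."""
    padded = f"#{term.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def surface_score(query: str, candidate: str) -> float:
    """Heuristic surface-form signal: terms that look alike score high."""
    return cosine(char_ngrams(query), char_ngrams(candidate))


def context_vector(term: str, cooccur: dict) -> Counter:
    """Neighbor counts for `term` from symmetric (term_a, term_b) -> count pairs."""
    vec = Counter()
    for (a, b), count in cooccur.items():
        if a == term:
            vec[b] += count
        elif b == term:
            vec[a] += count
    return vec


def context_score(query: str, candidate: str, cooccur: dict) -> float:
    """Heuristic global-context signal; 0.0 when the query is absent (OOV)."""
    return cosine(context_vector(query, cooccur), context_vector(candidate, cooccur))


def rank_candidates(query, candidates, cooccur, alpha=0.5):
    """Rank candidates by alpha * surface + (1 - alpha) * context (hypothetical mix)."""
    scored = [
        (c, alpha * surface_score(query, c) + (1 - alpha) * context_score(query, c, cooccur))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical aggregated co-occurrence counts (no raw text), for illustration only.
COOCCUR = {
    ("hypertension", "headache"): 4,
    ("hypertension", "lisinopril"): 6,
    ("high blood pressure", "headache"): 5,
    ("high blood pressure", "lisinopril"): 7,
    ("hypotension", "dizziness"): 3,
}

if __name__ == "__main__":
    # "high blood pressure" ranks first: no shared trigrams, but nearly identical context.
    print(rank_candidates("hypertension", ["hypotension", "high blood pressure"], COOCCUR))
    # OOV query: context is unavailable, so surface form decides the ranking.
    print(rank_candidates("hypertensn", ["high blood pressure", "hypertension"], COOCCUR))
```

The first ranking shows why the context signal matters. "hypotension" looks very similar to "hypertension" but means the opposite. "high blood pressure" shares no trigrams with "hypertension", yet its co-occurrence profile is nearly identical.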

