Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts

04/10/2022
by   Saadullah Amin, et al.
0

Despite the advances in digital healthcare systems offering curated structured knowledge, much of the critical information still lies in large volumes of unlabeled and unstructured clinical texts. These texts, which often contain protected health information (PHI), are exposed to information extraction tools for downstream applications, risking patient identification. Existing works in de-identification rely on using large-scale annotated corpora in English, which often are not suitable in real-world multilingual settings. Pre-trained language models (LM) have shown great potential for cross-lingual transfer in low-resource settings. In this work, we empirically show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke domain. We annotate a gold evaluation dataset to assess few-shot setting performance where we only use a few hundred labeled examples for training. Our model improves the zero-shot F1-score from 73.7 evaluation set when adapting Multilingual BERT (mBERT) (Devlin et al., 2019) from the MEDDOCAN (Marimon et al., 2019) corpus with our few-shot cross-lingual target corpus. When generalized to an out-of-sample test set, the best model achieves a human-evaluation F1-score of 97.2

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/17/2022

Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT

Multilingual BERT (mBERT), a language model pre-trained on large multili...
research
04/30/2020

MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer

The main goal behind state-of-the-art pretrained multilingual models suc...
research
06/07/2023

Multilingual Clinical NER: Translation or Cross-lingual Transfer?

Natural language tasks like Named Entity Recognition (NER) in the clinic...
research
09/01/2021

Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation

Recent multilingual pre-trained language models have achieved remarkable...
research
10/13/2022

CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing

A bottleneck to developing Semantic Parsing (SP) models is the need for ...
research
01/13/2021

Uzbek Cyrillic-Latin-Cyrillic Machine Transliteration

In this paper, we introduce a data-driven approach to transliterating Uz...
research
08/03/2022

Cross-Lingual Knowledge Transfer for Clinical Phenotyping

Clinical phenotyping enables the automatic extraction of clinical condit...

Please sign up or login with your details

Forgot password? Click here to reset