Multilingual Clinical NER: Translation or Cross-lingual Transfer?

06/07/2023
by Xavier Fontaine, et al.

Performing natural language tasks such as Named Entity Recognition (NER) on non-English clinical texts can be very time-consuming and expensive because annotated data are scarce. Cross-lingual transfer (CLT) circumvents this issue: multilingual large language models can be fine-tuned on a task in one language and still achieve high accuracy on the same task in another language. However, other methods that rely on translation models can also perform NER without annotated data in the target language, by translating either the training set or the test set. This paper compares cross-lingual transfer with these two alternatives for clinical NER in French and German, without any training data in those languages. To this end, we release MedNERF, a medical NER test set extracted from French drug prescriptions and annotated with the same guidelines as an English dataset. Through extensive experiments on this dataset and on a German medical dataset (Frei and Kramer, 2021), we show that translation-based methods can achieve performance similar to CLT but require more care in their design. Moreover, although they can take advantage of monolingual clinical language models, these models do not guarantee better results than large general-purpose multilingual models, whether used with cross-lingual transfer or with translation.
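The two translation-based alternatives to CLT ("translate-train" and "translate-test") can be illustrated with a toy sketch. This is not the authors' pipeline: the word-by-word English/French lexicon, the gazetteer standing in for an NER model, the function names and the label set (B-Drug, B-Frequency, B-Form) are all hypothetical, chosen so that token alignment between source and translation is trivially 1:1. With a real machine translation system, projecting labels between the two languages needs a separate alignment step, which is one of the places where such pipelines require care.

```python
# Toy sketch of translate-train vs. translate-test for NER.
# Hypothetical components: a word-by-word lexicon as the "translator"
# and a gazetteer (token -> label memory) as the "NER model".

# Hypothetical English -> French lexicon; one word maps to one word,
# so token positions stay aligned after translation.
EN_FR = {
    "take": "prendre",
    "give": "donner",
    "aspirin": "aspirine",
    "ibuprofen": "ibuprofène",
    "daily": "quotidiennement",
    "tablets": "comprimés",
}
FR_EN = {fr: en for en, fr in EN_FR.items()}


def translate(tokens, lexicon):
    # Word-by-word translation; unknown tokens are copied unchanged.
    return [lexicon.get(tok, tok) for tok in tokens]


def train_gazetteer(data):
    # "Training" = remembering the label of every entity token.
    gazetteer = {}
    for tokens, labels in data:
        for tok, lab in zip(tokens, labels):
            if lab != "O":
                gazetteer[tok] = lab
    return gazetteer


def tag(gazetteer, tokens):
    # Tokens never seen as entities are tagged "O".
    return [gazetteer.get(tok, "O") for tok in tokens]


def translate_train(en_train, fr_test):
    # Translate the English training set to French, keep its labels
    # (valid here only because alignment is 1:1), train on French.
    fr_train = [(translate(toks, EN_FR), labs) for toks, labs in en_train]
    model = train_gazetteer(fr_train)
    return [tag(model, toks) for toks in fr_test]


def translate_test(en_train, fr_test):
    # Train on English, translate each French test sentence to English,
    # tag it, then map labels back to the French tokens by position.
    model = train_gazetteer(en_train)
    return [tag(model, translate(toks, FR_EN)) for toks in fr_test]


en_train = [
    (["take", "aspirin", "daily"], ["O", "B-Drug", "B-Frequency"]),
    (["give", "ibuprofen", "tablets"], ["O", "B-Drug", "B-Form"]),
]
fr_test = [["prendre", "ibuprofène", "quotidiennement"]]

print(translate_train(en_train, fr_test))
print(translate_test(en_train, fr_test))
```

In this idealized setting both strategies produce the same tags; the difference lies in where translation (and label projection) happens: once on the training data for translate-train, or at every prediction for translate-test.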
