Enriching Biomedical Knowledge for Low-resource Language Through Translation

10/11/2022
by   Long Phan, et al.

Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages such as Vietnamese. In this paper, we use a state-of-the-art English-Vietnamese translation model to translate and produce both pretraining and supervised data in the biomedical domain. Thanks to this large-scale translation, we introduce ViPubmedT5, a pretrained encoder-decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubmedT5 achieves state-of-the-art results on two biomedical benchmarks, in summarization and acronym disambiguation. Further, we release ViMedNLI, a new NLP task in Vietnamese translated from MedNLI using the recently released En-vi translation model and carefully refined by human experts, with evaluations of existing methods against ViPubmedT5.


