A Multilingual Neural Machine Translation Model for Biomedical Data

08/06/2020
by Alexandre Berard, et al.

We release a multilingual neural machine translation model that can be used to translate text in the biomedical domain. The model translates from five languages (French, German, Italian, Korean and Spanish) into English. It is trained on large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs close to the state of the art on both news (generic domain) and biomedical test sets, and that it outperforms the existing publicly released models. We believe that this release will help the large-scale multilingual analysis of the digital content of the COVID-19 crisis and of its effects on society, economy, and healthcare policies. We also release a test set of biomedical text for Korean-English. It consists of 758 sentences from official guidelines and recent papers, all about COVID-19.
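
The abstract mentions training with domain tags but does not say what they look like. A common way to do this is to prepend a special token naming the domain to each source sentence, so that one model can be steered towards generic or biomedical output at inference time. The following is a minimal sketch under that assumption; the tag strings and the function names are hypothetical and are not taken from the released model.

```python
# Hypothetical tag tokens: the strings used by the released model may differ.
DOMAIN_TAGS = {
    "generic": "<generic>",
    "biomedical": "<medical>",
}


def add_domain_tag(sentence: str, domain: str) -> str:
    """Prepend the tag for `domain` to a single source sentence."""
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain: {domain!r}")
    return f"{DOMAIN_TAGS[domain]} {sentence.strip()}"


def tag_corpus(sentences, domain):
    """Tag every source sentence of a corpus with the same domain."""
    return [add_domain_tag(s, domain) for s in sentences]


if __name__ == "__main__":
    # Source-side input for a biomedical French-to-English request.
    print(add_domain_tag("Le patient présente une fièvre.", "biomedical"))
```

With this scheme, the same tagging step would be applied to the training data and to each sentence submitted for translation, so the tag seen at inference time matches one the model saw during training.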

research
11/16/2021

NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21

This paper provides an overview of NVIDIA NeMo's neural machine translat...
research
03/19/2018

English-Catalan Neural Machine Translation in the Biomedical Domain through the cascade approach

This paper describes the methodology followed to build a neural machine ...
research
04/07/2020

Multilingual enrichment of disease biomedical ontologies

Translating biomedical ontologies is an important challenge, but doing i...
research
05/01/2020

Facilitating Access to Multilingual COVID-19 Information via Neural Machine Translation

Every day, more people are becoming infected and dying from exposure to ...
research
10/11/2022

Enriching Biomedical Knowledge for Low-resource Language Through Translation

Biomedical data and benchmarks are highly valuable yet very limited in l...
research
05/23/2023

Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

Sanskrit is a low-resource language with a rich heritage. Digitized Sans...
research
12/20/2022

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

In the era of digital healthcare, the huge volumes of textual informatio...
