A Benchmark Dataset for Understandable Medical Language Translation

12/04/2020
by Junyu Luo, et al.

In this paper, we introduce MedLane, a new human-annotated Medical Language translation dataset that aligns professional medical sentences with layperson-understandable expressions. The dataset contains 12,801 training samples, 1,015 validation samples, and 1,016 testing samples. We then evaluate seven approaches on MedLane: a naive direct-copy baseline; the statistical machine translation system Moses; four neural machine translation approaches (the proposed PMBERT-MT model, Seq2Seq, and two Seq2Seq variants); and a modified text summarization model, PointerNet. To compare the results, we use eleven metrics, including three new measures designed specifically for this task. Finally, we discuss the limitations of MedLane and the baselines and point out possible research directions for this task.
