The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation

03/20/2021
by Jonne Sälevä, et al.

This paper evaluates the performance of several modern subword segmentation methods in a low-resource neural machine translation setting. We compare segmentations produced by applying BPE at the token or sentence level with morphologically-based segmentations from LMVR and MORSEL. We evaluate translation tasks between English and each of Nepali, Sinhala, and Kazakh, and hypothesize that morphologically-based segmentation methods will lead to better performance in this setting. However, compared to BPE, we find no consistent and reliable differences between the segmentation methods. While morphologically-based methods outperform BPE in a few cases, the best-performing method varies across tasks, and the performance of the segmentation methods is often statistically indistinguishable.
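For readers unfamiliar with BPE, the sketch below illustrates how token-level BPE learns merge operations from word frequencies and then segments a word. It is a minimal illustration only, not the implementation used in the paper: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, the tie-breaking rule, and the toy word counts are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: count} mapping (hypothetical helper)."""
    # Start from characters, with an end-of-word marker appended to each word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken lexicographically (an assumption,
        # chosen only to make the sketch deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy word counts, invented for illustration.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_counts, num_merges=10)
print(segment("lowest", toy_merges))
```

Because the merges are driven purely by frequency, the resulting subword boundaries need not coincide with morpheme boundaries, which is the contrast with LMVR and MORSEL that the abstract draws.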

research
09/26/2017

Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages

In machine translation, we often try to collect resources to improve its...
research
09/20/2020

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

Despite being the seventh most widely spoken language in the world, Beng...
research
01/02/2021

Decoding Time Lexical Domain Adaptation for Neural Machine Translation

Machine translation systems are vulnerable to domain mismatch, especiall...
research
03/25/2022

Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation

Subword regularizations use multiple subword segmentations during traini...
research
03/20/2022

Small Batch Sizes Improve Training of Low-Resource Neural MT

We study the role of an essential hyper-parameter that governs the train...
research
10/11/2022

Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text

Data sparsity is one of the main challenges posed by Code-switching (CS)...
research
09/08/2022

Knowledge Based Template Machine Translation In Low-Resource Setting

Incorporating tagging into neural machine translation (NMT) systems has ...
