How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology?

09/02/2021
by Chantal Amrhein, et al.

Data-driven subword segmentation has become the default strategy for open-vocabulary machine translation and other NLP tasks, but it may not be sufficiently generic for optimal learning of non-concatenative morphology. We design a test suite to evaluate segmentation strategies on different types of morphological phenomena in a controlled, semi-synthetic setting. In our experiments, we compare how well machine translation models trained at the subword level and at the character level can translate these morphological phenomena. We find that learning to analyse and generate morphologically complex surface representations is still challenging, especially for non-concatenative morphological phenomena like reduplication or vowel harmony and for rare word stems. Based on our results, we recommend that novel text representation strategies be tested on a range of typologically diverse languages to minimise the risk of adopting a strategy that inadvertently disadvantages certain languages.
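To make the contrast between subword-level and character-level input concrete, the following is a minimal sketch and not the authors' test suite. It assumes a hand-written toy vocabulary and two invented stems, and it uses greedy longest-match segmentation as a simple stand-in for a data-driven segmenter such as BPE. All names (`TOY_VOCAB`, `segment_greedy`, `reduplicate`) are hypothetical.

```python
def segment_chars(word):
    """Character-level segmentation: every character is its own token."""
    return list(word)


def segment_greedy(word, vocab):
    """Greedy longest-match segmentation against a fixed vocabulary.

    This is a simplified stand-in for a learned subword segmenter.
    Characters not covered by any vocabulary entry become single-character tokens.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; fall back to one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def reduplicate(stem):
    """Full reduplication of a stem (assumed here to be plain string doubling)."""
    return stem + stem


# Hypothetical vocabulary: one "frequent" stem is stored whole, plus one short piece.
TOY_VOCAB = {"buku", "ra"}

frequent = reduplicate("buku")   # stem is in the vocabulary
rare = reduplicate("zorat")      # invented stem, not in the vocabulary

print(segment_greedy(frequent, TOY_VOCAB))
print(segment_greedy(rare, TOY_VOCAB))
print(segment_chars(rare))
```

In this toy setting the frequent stem is split into two identical pieces, so the repetition is visible at the token level, while the rare stem falls apart into short fragments. The character-level view treats both words uniformly, at the cost of longer sequences. Whether either representation helps a translation model learn reduplication is the empirical question the abstract describes; the sketch only illustrates the input difference.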

research
01/02/2020

Morphological Word Segmentation on Agglutinative Languages for Neural Machine Translation

Neural machine translation (NMT) has achieved impressive performance on ...
research
10/06/2020

An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks

Typically, tokenization is the very first step in most text processing w...
research
10/07/2016

Morphology Generation for Statistical Machine Translation using Deep Learning Techniques

Morphology in unbalanced languages remains a big challenge in the contex...
research
03/16/2022

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Morphologically-rich polysynthetic languages present a challenge for NLP...
research
04/17/2018

Improving Character-based Decoding Using Target-Side Morphological Information for Neural Machine Translation

Recently, neural machine translation (NMT) has emerged as a powerful alt...
research
05/06/2022

Quantifying Synthesis and Fusion and their Impact on Machine Translation

Theoretical work in morphological typology offers the possibility of mea...
research
05/22/2023

Cross-functional Analysis of Generalisation in Behavioural Learning

In behavioural testing, system functionalities underrepresented in the s...