Quantifying Synthesis and Fusion and their Impact on Machine Translation

05/06/2022
by Arturo Oncevay, et al.

Theoretical work in morphological typology offers the possibility of measuring morphological diversity on a continuous scale. However, the literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative. In this work, we propose to reduce the rigidity of such claims by quantifying morphological typology at the word and segment level. We follow the approach of Payne (2017), which classifies morphology using two indices: synthesis (from analytic to polysynthetic) and fusion (from agglutinative to fusional). For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study. We then analyse the relationship between machine translation quality and the degree of synthesis and fusion at the word level (nouns and verbs for English-Turkish, and verbs for English-Spanish) and at the segment level (the previous language pairs plus English-German, in both directions). We complement the word-level analysis with human evaluation and, overall, observe a consistent impact of both indices on machine translation quality.
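As a rough illustration of what a continuous synthesis measure can look like, the sketch below computes an average morphemes-per-word ratio over words that have already been segmented. This is an assumption made for illustration only: the abstract does not give the exact formula or segmentation output format used in the paper, and the function name `synthesis_index` and the toy segmentations are hypothetical.

```python
from statistics import mean


def synthesis_index(segmented_words):
    """Average number of morphemes per word (illustrative assumption,
    not necessarily the exact index used in the paper).

    `segmented_words` is a list of words, each given as a list of morphemes.
    Empty entries are ignored.
    """
    counts = [len(morphemes) for morphemes in segmented_words if morphemes]
    if not counts:
        raise ValueError("need at least one segmented word")
    return mean(counts)


# Hypothetical, hand-made segmentations (not taken from the paper's data).
english = [["the"], ["house", "s"], ["were"], ["paint", "ed"]]
turkish = [["ev", "ler", "imiz", "de"], ["gel", "iyor", "um"]]

print("English:", synthesis_index(english))
print("Turkish:", synthesis_index(turkish))
```

A higher value points towards the polysynthetic end of the scale and a lower one towards the analytic end. Because the ratio is computed per word or per sample, it can vary within a single language instead of assigning one fixed label to it.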

