Role of Morphology Injection in Statistical Machine Translation

09/16/2017
by Sreelekha S, et al.

Phrase-based statistical models are widely used because they perform well in terms of both translation quality and system complexity. Hindi, and Indian languages in general, are morphologically richer than English. Hence, although phrase-based systems perform very well for less divergent language pairs, English to Indian language translation needs more linguistic information (such as morphology, parse trees, and part-of-speech tags) on the source side. Factored models are useful in this case because they represent each word as a vector of factors. These factors can carry any information about the surface word, and the model can use that information during translation. The objective of this work is therefore to handle morphological inflections in Hindi and Marathi using factored translation models when translating from English. SMT approaches face the problem of data sparsity when translating into a morphologically rich language, since a parallel corpus is very unlikely to contain every morphological form of every word. We propose a simple and effective solution: generate these unseen morphological forms and inject them into the original training corpora, thereby enriching the input with various morphological forms of words. In this paper, we study factored models and the problem of sparseness in the context of translation into morphologically rich languages. We observe that morphology injection improves translation quality in terms of both adequacy and fluency, and we verify this with experiments on two morphologically rich languages, Hindi and Marathi, translating from English.
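
The injection idea can be illustrated with a small sketch. It is not the authors' pipeline: the factor layout `surface|lemma|number|case`, the toy inflection table `INFLECTIONS`, the helper names `factored` and `inject_morphology`, and the rule "inject a pair only when its target form never occurs in the corpus" are all hypothetical choices made for illustration. The only borrowed convention is joining factors with `|`, as Moses-style factored corpora do; the target side is kept as plain surface forms to keep the example short.

```python
# Minimal sketch of morphology injection into a factored parallel corpus.
# Assumptions (hypothetical, not from the paper): the factor layout
# surface|lemma|number|case, the toy table below, and the rule that a
# synthetic pair is added only if its Hindi form is unseen in the corpus.

# English lemma -> {(number, case): (English surface form, Hindi inflected form)}
INFLECTIONS = {
    "boy": {
        ("sg", "direct"): ("boy", "लड़का"),
        ("sg", "oblique"): ("boy", "लड़के"),
        ("pl", "direct"): ("boys", "लड़के"),
        ("pl", "oblique"): ("boys", "लड़कों"),
    },
}


def factored(surface, lemma, number, case):
    """Join factors with '|', the separator used by Moses-style factored corpora."""
    return "|".join((surface, lemma, number, case))


def inject_morphology(corpus):
    """Return corpus plus synthetic (source, target) pairs for unseen target forms.

    corpus: list of (factored English sentence, Hindi sentence) pairs.
    The input list is not modified.
    """
    # Target-side forms already observed in the original corpus.
    seen = {tok for _, tgt in corpus for tok in tgt.split()}
    injected = []
    for lemma, table in INFLECTIONS.items():
        for (number, case), (en_form, hi_form) in table.items():
            if hi_form in seen:
                continue  # the corpus already contains this inflection
            src = factored(en_form, lemma, number, case)
            injected.append((src, hi_form))
    return corpus + injected


if __name__ == "__main__":
    corpus = [("boy|boy|sg|direct came|come|-|-", "लड़का आया")]
    for src, tgt in inject_morphology(corpus):
        print(src, "=>", tgt)
```

With a one-sentence corpus that contains only the direct singular form लड़का, the function appends three synthetic pairs. The ambiguous form लड़के is injected twice, once per factor combination, so the model can learn that different source factors (singular oblique, plural direct) map to the same Hindi surface form.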
