Morphology Generation for Statistical Machine Translation

10/05/2017
by Sreelekha S, et al.

When translating into morphologically rich languages, statistical machine translation (SMT) approaches face the problem of data sparsity. The sparseness problem is most severe when the corpus available for the morphologically richer language is small. Even though factored models can be used to generate the correct morphological forms of words, data sparseness limits their performance. In this paper, we describe a simple and effective solution based on enriching the input corpora with various morphological forms of words. We apply this method in phrase-based and factor-based experiments translating from English into two morphologically rich languages, Hindi and Marathi. We evaluate our experiments using both automatic evaluation and subjective evaluation of adequacy and fluency. We observe that the morphology injection method improves translation quality. Our further analysis shows that the morphology injection method alleviates the data sparseness problem to a large extent.
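
As a rough illustration of what enriching a corpus with morphological forms could look like, here is a minimal sketch. It is not the authors' implementation: the function name `inject_morphology`, the data layout, and the romanized Hindi forms are all assumptions made for this example. It only shows the general idea of appending source-target pairs for word forms that the parallel corpus does not yet contain.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source form, target form)


def inject_morphology(corpus: List[Pair], paradigms: List[List[Pair]]) -> List[Pair]:
    """Return the corpus extended with inflected pairs it does not already contain.

    `paradigms` is a hypothetical resource: each entry lists aligned
    (source, target) inflections of one word, e.g. singular and plural.
    """
    seen = set(corpus)
    enriched = list(corpus)
    for forms in paradigms:
        for pair in forms:
            if pair not in seen:  # add only forms missing from the corpus
                seen.add(pair)
                enriched.append(pair)
    return enriched


# Toy data (illustrative, romanized): the corpus has seen only the singular form.
corpus = [("boy", "ladka")]
paradigms = [[("boy", "ladka"), ("boys", "ladke")]]

enriched = inject_morphology(corpus, paradigms)
print(enriched)
```

In this toy setting the plural pair is added even though it never occurred in the original training data, which is the kind of sparsity the abstract describes. A real system would need a morphological generator or paradigm tables for the target language, which this sketch simply assumes as input.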


research
09/16/2017

Role of Morphology Injection in Statistical Machine Translation

Phrase-based Statistical models are more commonly used as they perform o...
research
08/15/2019

What's Wrong with Hebrew NLP? And How to Make it Right

For languages with simple morphology, such as English, automatic annotat...
research
09/12/2016

Morphological Constraints for Phrase Pivot Statistical Machine Translation

The lack of parallel data for many language pairs is an important challe...
research
03/25/2022

Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies

Morphologically rich languages pose difficulties to machine translation....
research
09/06/2019

Don't Forget the Long Tail! A Comprehensive Analysis of Morphological Generalization in Bilingual Lexicon Induction

Human translators routinely have to translate rare inflections of words ...
research
10/07/2016

Morphology Generation for Statistical Machine Translation using Deep Learning Techniques

Morphology in unbalanced languages remains a big challenge in the contex...
research
10/11/2022

Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text

Data sparsity is one of the main challenges posed by Code-switching (CS)...
