Statistical Machine Translation for Indic Languages

01/02/2023
by Sudhansu Bala Das, et al.

A Machine Translation (MT) system generally aims to automatically render source-language text in a target language while retaining the original context, using various Natural Language Processing (NLP) techniques. Among these methods, Statistical Machine Translation (SMT) uses probabilistic and statistical techniques to analyze information and perform the conversion. This paper discusses the development of bilingual SMT models for translating English into fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefly described with respect to our experimental needs. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is carried out as part of our experiment. Different preprocessing approaches are proposed in this paper to handle noise in the datasets. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim of capturing grammar rules and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, translation quality is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
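
The abstract does not say how distance reordering was configured, so the following is only a minimal sketch of the standard distance-based distortion cost used in phrase-based SMT, not the authors' setup. Each source phrase is given as a (start, end) span of 1-indexed word positions, listed in the order the phrases are translated; the cost of each step is |start_i - end_(i-1) - 1|, and the score decays exponentially with the total. The function names and the decay value `alpha=0.5` are assumptions made for illustration.

```python
def distortion_distance(phrase_spans):
    """Total reordering distance for source spans listed in target order.

    Each span is (start, end) with 1-indexed, inclusive source positions.
    A step costs |start_i - end_{i-1} - 1|, so a monotone translation
    (each phrase starts right after the previous one ends) costs 0.
    """
    total = 0
    prev_end = 0  # position "before" the first source word
    for start, end in phrase_spans:
        total += abs(start - prev_end - 1)
        prev_end = end
    return total


def distortion_score(phrase_spans, alpha=0.5):
    """Exponentially decaying score alpha ** distance (alpha is illustrative)."""
    return alpha ** distortion_distance(phrase_spans)


# Hypothetical segmentations of a six-word source sentence.
monotone = [(1, 2), (3, 3), (4, 6)]   # phrases translated in source order
reordered = [(1, 2), (4, 6), (3, 3)]  # last phrase translated before the middle one

print(distortion_distance(monotone), distortion_score(monotone))
print(distortion_distance(reordered), distortion_score(reordered))
```

A model of this kind penalizes long jumps without knowing anything about the words involved, which is why it is usually combined with a limit on how far the decoder may jump for language pairs with very different word order, such as English and verb-final Indian languages.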

research
10/24/2016

Statistical Machine Translation for Indian Languages: Mission Hindi

This paper discusses Centre for Development of Advanced Computing Mumbai...
research
10/25/2016

Statistical Machine Translation for Indian Languages: Mission Hindi 2

This paper presents Centre for Development of Advanced Computing Mumbai'...
research
10/01/2017

Robust Tuning Datasets for Statistical Machine Translation

We explore the idea of automatically crafting a tuning dataset for Stati...
research
10/24/2016

Reordering rules for English-Hindi SMT

Reordering is a preprocessing stage for Statistical Machine Translation ...
research
10/05/2017

Indowordnets help in Indian Language Machine Translation

Being less resource languages, Indian-Indian and English-Indian language...
research
05/06/2019

English-Bhojpuri SMT System: Insights from the Karaka Model

This thesis has been divided into six chapters namely: Introduction, Kar...
research
04/01/2020

Igbo-English Machine Translation: An Evaluation Benchmark

Although researchers and practitioners are pushing the boundaries and en...
