Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent

03/19/2020
by Anoop Kunchukuttan, et al.

In this work, we present an extensive study of statistical machine translation involving languages of the Indian subcontinent. These languages are related by genetic and contact relationships, and we describe the similarities between Indic languages that arise from these relationships. We explore how lexical and orthographic similarity among these languages can be utilized to improve translation quality between Indic languages when only limited parallel corpora are available. We also explore how the structural correspondence between Indic languages can be utilized to reuse linguistic resources for English to Indic language translation. Our observations span 90 language pairs from 9 Indic languages and English. To the best of our knowledge, this is the first large-scale study specifically devoted to utilizing language relatedness to improve translation between related languages.

research
10/25/2016

Statistical Machine Translation for Indian Languages: Mission Hindi 2

This paper presents Centre for Development of Advanced Computing Mumbai'...
research
01/16/2017

Machine Translation Approaches and Survey for Indian Languages

In this study, we present an analysis regarding the performance of the s...
research
06/06/2017

Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords

With the advent of informal electronic communications such as social med...
research
01/13/2015

Annotating Cognates and Etymological Origin in Turkic Languages

Turkic languages exhibit extensive and diverse etymological relationship...
research
02/16/2019

Exploring Language Similarities with Dimensionality Reduction Technique

In recent years several novel models were developed to process natural l...
research
05/22/2023

Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models

This study investigates machine translation between related languages i....
research
05/01/2023

Low-Resourced Machine Translation for Senegalese Wolof Language

Natural Language Processing (NLP) research has made great advancements i...
