Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora

06/25/2018
by Sree Harsha Ramesh, et al.

Resources for non-English languages are scarce, and this paper addresses the problem in the context of machine translation by automatically extracting parallel sentence pairs from the multilingual articles available on the Internet. We use an end-to-end Siamese bidirectional recurrent neural network to extract parallel sentences from comparable multilingual articles in Wikipedia. We then show that training on the harvested dataset improves BLEU scores for both NMT and phrase-based SMT systems on the low-resource language pairs English–Hindi and English–Tamil, compared to training exclusively on the limited bilingual corpora collected for these pairs.
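To make the idea of a Siamese bidirectional RNN filter concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the embedding scheme, the hidden size `DIM`, the feature combination (`|u - v|` and `u * v`), the logistic scorer, and the names `BiRNNEncoder`, `parallel_probability`, and `harvest` are all hypothetical. The one property it deliberately shows is the Siamese one: a single encoder with shared weights encodes both the source and the target sentence, and a classifier on top of the two sentence vectors decides whether the pair is kept. All weights are random placeholders; in the described system they would be learned end-to-end.

```python
import math
import random
import zlib

DIM = 8  # hypothetical embedding / hidden size


def embed(token):
    # Deterministic pseudo-random vector per token; a stand-in for learned embeddings.
    rng = random.Random(zlib.crc32(token.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


class BiRNNEncoder:
    """Elman-style bidirectional RNN. ONE instance encodes both languages (shared weights)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)

        def matrix():
            return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]

        self.w_fwd, self.u_fwd = matrix(), matrix()
        self.w_bwd, self.u_bwd = matrix(), matrix()

    @staticmethod
    def _step(w, u, x, h):
        # h_t = tanh(W x_t + U h_{t-1})
        return [math.tanh(sum(w[i][j] * x[j] + u[i][j] * h[j] for j in range(DIM)))
                for i in range(DIM)]

    def encode(self, tokens):
        xs = [embed(t) for t in tokens]
        h_f = [0.0] * DIM
        for x in xs:  # left-to-right pass
            h_f = self._step(self.w_fwd, self.u_fwd, x, h_f)
        h_b = [0.0] * DIM
        for x in reversed(xs):  # right-to-left pass
            h_b = self._step(self.w_bwd, self.u_bwd, x, h_b)
        # Sentence vector: final forward state concatenated with final backward state.
        return h_f + h_b


def parallel_probability(encoder, src_tokens, tgt_tokens, weights, bias=0.0):
    # Same encoder for both sides: this weight sharing is what makes the network Siamese.
    u, v = encoder.encode(src_tokens), encoder.encode(tgt_tokens)
    feats = [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))  # logistic score in (0, 1)


def harvest(encoder, candidate_pairs, weights, threshold=0.5):
    # Keep only candidate pairs whose score reaches the threshold.
    return [(s, t) for s, t in candidate_pairs
            if parallel_probability(encoder, s.split(), t.split(), weights) >= threshold]


if __name__ == "__main__":
    enc = BiRNNEncoder(seed=1)
    rng = random.Random(2)
    w = [rng.uniform(-1.0, 1.0) for _ in range(4 * DIM)]  # untrained placeholder weights
    pairs = [("the cat sleeps", "billi soti hai"), ("it is raining", "baarish ho rahi hai")]
    print(harvest(enc, pairs, w, threshold=0.5))
```

Because the weights are untrained, the scores printed by this sketch are meaningless; it only illustrates the data flow from two sentences through a shared encoder to a keep-or-discard decision over candidate pairs drawn from comparable articles.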

research
09/27/2022

Improving Multilingual Neural Machine Translation System for Indic Languages

Machine Translation System (MTS) serves as an effective tool for communi...
research
07/10/2019

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

We present an approach based on multilingual sentence embeddings to auto...
research
10/05/2021

Sicilian Translator: A Recipe for Low-Resource NMT

With 17,000 pairs of Sicilian-English translated sentences, Arba Sicula ...
research
07/15/2020

A Multilingual Parallel Corpora Collection Effort for Indian Languages

We present sentence aligned parallel corpora across 10 Indian Languages ...
research
06/13/2018

Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation

Parallel sentence extraction is a task addressing the data sparsity prob...
research
10/09/2020

ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization

Cherokee is a highly endangered Native American language spoken by the C...
research
09/28/2017

A Deep Neural Network Approach To Parallel Sentence Extraction

Parallel sentence extraction is a task addressing the data sparsity prob...