Urdu-English Machine Transliteration using Neural Networks

01/12/2020
by Usman Mohy ud Din, et al.

Machine translation has gained much attention in recent years. It is a sub-field of computational linguistics that focuses on translating text from one language to another. Among the different translation techniques, neural networks currently lead the domain, offering a single large neural network with an attention mechanism, sequence-to-sequence learning, and long short-term modelling. Despite significant progress in machine translation, translating out-of-vocabulary (OOV) words, which include technical terms, named entities, and foreign words, is still a challenge for current state-of-the-art translation systems, and the situation becomes even worse when translating between low-resource languages or languages with different structures. Because of the morphological richness of a language, a word may have different meanings in different contexts. In such scenarios, translating the word alone is not enough to provide a correct, high-quality translation. Transliteration is a way to consider the context of a word or sentence during translation. For a low-resource language like Urdu, it is very difficult to find a parallel corpus for transliteration that is large enough to train the system. In this work, we present a transliteration technique based on Expectation Maximization (EM) that is unsupervised and language-independent. The system learns patterns and OOV words from a parallel corpus, so there is no need to train it explicitly on a transliteration corpus. The approach is tested on three statistical machine translation (SMT) models (phrase-based, hierarchical phrase-based, and factor-based) and two neural machine translation models (LSTM and Transformer).
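
To make the EM idea concrete, the following is a minimal sketch of an IBM Model 1-style character alignment, one common way EM is applied to transliteration pairs. It is not the authors' implementation: the function name `train_char_model`, the iteration count, and the two toy word pairs are assumptions chosen only for illustration.

```python
from collections import defaultdict


def train_char_model(pairs, iterations=10):
    """Estimate t(target_char | source_char) with IBM Model 1-style EM.

    `pairs` is a list of (source_word, target_word) strings. This is an
    illustrative sketch, not the method described in the paper.
    """
    src_vocab = {c for s, _ in pairs for c in s}
    tgt_vocab = {c for _, t in pairs for c in t}
    # Start from a uniform distribution over target characters.
    t = {s: {c: 1.0 / len(tgt_vocab) for c in tgt_vocab} for s in src_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: distribute each target character over the source
        # characters of the same word pair, weighted by the current t.
        for src, tgt in pairs:
            for tc in tgt:
                norm = sum(t[sc][tc] for sc in src)
                for sc in src:
                    frac = t[sc][tc] / norm
                    count[sc][tc] += frac
                    total[sc] += frac
        # M-step: re-normalise the expected counts into probabilities.
        for sc in src_vocab:
            for tc in tgt_vocab:
                t[sc][tc] = count[sc][tc] / total[sc]
    return t


# Hypothetical toy data: two Urdu/Latin pairs invented for this example.
toy_pairs = [("با", "ba"), ("ب", "b")]
model = train_char_model(toy_pairs)
```

No character-level labels are supplied: the pair in which "ب" appears alone is enough for EM to shift probability mass toward "ب" producing "b", which in turn pushes "ا" toward "a". A real system would add a transliteration/non-transliteration mixture and multi-character units, which this sketch omits.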

