Neural Machine Translation System of Indic Languages – An Attention based Approach

02/02/2020 ∙ by Parth Shah, et al. ∙ UTU ∙ Yahoo! Inc.

Neural machine translation (NMT) is a recent and effective technique that has led to remarkable improvements over conventional machine translation techniques. The proposed NMT model, developed for the Gujarati language, uses an encoder-decoder architecture with an attention mechanism. In India, almost all languages originate from their ancestral language, Sanskrit, and they share inevitable similarities, including lexical and named-entity similarity. Translating into Indic languages nevertheless remains a challenging task. In this paper, we present an NMT system that can efficiently translate Indic languages such as Hindi and Gujarati, which together cover more than 58.49 percent of the total speakers in the country. We evaluate the performance of our NMT model with automatic evaluation metrics such as BLEU, perplexity, and TER. We also compare our network with Google Translate, which it outperforms by a margin of 6 BLEU points on English-Gujarati translation.




I Introduction

India is one of the most linguistically diverse countries in the world. People in different regions use their own regional languages, which gives India the world's second highest number of languages. The languages spoken in India belong to several language families. The two main families are the Indo-Aryan languages, spoken by 78.05 percent of Indians [10], and the Dravidian languages, spoken by 19.64 percent [10]. Hindi and Gujarati are among the constitutional languages of India and have nearly 601,688,479 speakers [10], almost 59 percent of the country's population [10]. Under Article 343, the Constitution of India provides for English as an additional official language, although it has only 226,449 speakers [10], nearly 0.02 percent of the country's population [10]. Communication and information exchange are necessary for sharing knowledge, feelings, opinions, facts, and thoughts. Variants of English are used globally for communication, and the content available on the Internet is overwhelmingly dominated by English. Yet only 20 percent of the world's population speaks English, and in India the figure is only 0.02 percent [10]. It is not feasible to rely on human translators in a country with this much language diversity. To bridge this vast language gap, we need effective and accurate computational approaches that require minimal human intervention. Machine translation can perform this task effectively.

Machine translation (MT) is the task of computationally translating human spoken or natural-language text or speech from one language to another with minimal human intervention. It aims to generate translations that have the same meaning as the source sentence and are grammatically correct in the target language. Initial work on MT started in the early 1950s [13], and the field has advanced rapidly since the 1990s owing to the availability of more computational capacity and training data. Since then, a number of approaches have been proposed to achieve increasingly accurate machine translation, such as rule-based translation, knowledge-based translation, corpus-based translation, hybrid translation, and statistical machine translation (SMT) [13]. Each approach has its own merits and demerits. Among these, SMT, a subcategory of corpus-based translation, is widely used because it produces better results than the other previously available techniques. The use of neural networks in machine translation has become popular around the globe in recent years, and this novel technique is known as neural machine translation (NMT). Much work has been carried out on NMT in recent years, although comparatively little has been done on Indian languages [13]. We found that applying NMT to Indic languages is still a challenging task, especially for bilingual machine translation.

In our past research, we worked on a sequence-to-sequence model-based machine translation system for the Hindi language [12]. In this work, we have improved that model and applied it to the English-Gujarati language pair. We have developed a system that uses a neural model based on the attention mechanism. Our proposed attention-based NMT model is evaluated with metrics such as BLEU, perplexity, and TER.

Section 2 briefly reviews related work in the domain of machine translation. Section 3 covers the fundamentals of machine translation with neural networks and the attention mechanism. Section 4 describes the automatic evaluation metrics used. Section 5 introduces the proposed bilingual neural machine translation model. Section 6 describes the implementation, and Section 7 presents the results generated by our attention-based NMT model. Section 8 concludes the paper.

II Related Work

Machine translation (MT) generally refers to the process of automatically translating text from a source language to a target language by machine, without external human intervention. It converts a sequence of words in the source language into a sequence of words in the target language without altering the meaning of the source. Initial work in the field was carried out by researchers at the IBM research laboratory in the early 1950s, who also provided a successful demonstration of a machine translation system in 1956 [13]. Soon afterwards, however, the Automatic Language Processing Advisory Committee of the American government reported that machine translation was infeasible to scale because of the amount of resources it required. A new breakthrough came only after 1979, when a domain-specific translation system was implemented for translating weather bulletins from English to French [5][2]. In 1991, researchers from IIT Kanpur developed the AnglaBharati-I machine translation system [14][3], a general-purpose translation system with domain customization designed specifically for translating English to Hindi. In 1999, CDAC developed a machine translation system named MANTRA [14], which uses transfer-based machine translation and works on the English-Gujarati, English-Hindi, English-Bengali, and English-Telugu language pairs. AnglaBharati-I was later upgraded to AnglaBharati-II [14][3] in 2004 using a hybrid approach to machine translation, which improved the efficiency of the system.

III Machine Translation

Machine translation can be stated as the process of translating a source language into a target language while considering the grammatical structure of the source language. The 1990s marked the breakthrough of fairly new approaches that challenged and eventually improved on the already established methodologies. These approaches were based on generating insights from large amounts of available parallel corpora. Example-based machine translation was first proposed in 1981 but was developed from about 1990 onwards [7]. Its core idea is to reuse existing translations to generate a new translation [16].

III-A Statistical Machine Translation

The statistical approach to machine translation does not use any traditional linguistic data. It works on the principle of probability: a word in the source language corresponds to one or more similar words in the target language. However, it requires a large corpus of reliable translations consisting of both source-language and target-language sentences. This approach is similar to the methods of the IBM research group, which had initial success in speech recognition and machine translation in the early 1990s.


III-B Rule-based Machine Translation

All languages used by humans for communication follow a certain set of grammatical rules. If we can model these rules in a system, we can generate natural, fluent sentences in the target language. A rule-based machine translation system follows this approach by mapping source-language sentences to target-language sentences using the necessary rules. However, translating Indian languages requires a large number of rules covering different contexts [8].

III-C Phrase-based Machine Translation

A phrase is a small group of words that carries some specific meaning. A phrase-based machine translation system contains a phrase table, which lists translated phrase pairs between the source and target languages. In addition, it holds information about how the translations of multiple phrases can be rearranged to generate a meaningful target-language sentence. However, these systems were unable to produce human-like natural-language sentences, because the model cannot contain every possible combination of phrases [8].

III-D Neural Machine Translation

Neural machine translation is one of the most recent approaches to machine translation. It uses a neural network to model the conditional probability of translating a given source-language input into a given target-language output, as shown in Figure 1. NMT is appealing because it requires less knowledge about the structure of the source and target languages. It has outperformed traditional MT models in large-scale translation tasks such as English to German and English to French [19]. In recent years, various architectures have been proposed for neural machine translation, such as the simple encoder-decoder model, the RNN-based model, and the LSTM model, which can learn problems with long-range temporal dependencies. The most advanced of these neural models is the attention-based model.

Fig. 1: Converting source language into target language using sequence to sequence model[17]

Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden states $h_t$ as a function of the previous hidden state $h_{t-1}$ and the input for position $t$ [18]. This inherently sequential nature of RNNs makes it impossible to parallelize within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples [6]. A major drawback of sequence-to-sequence models is that they are unable to generate words that are rarely encountered in the input corpus. To address this problem, an attention mechanism can be added to the traditional sequence-to-sequence model. It allows dependencies to be modeled without regard to their distance in the input or output.
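The sequential dependency of recurrent models can be illustrated with a minimal sketch. It assumes a toy scalar recurrence with a tanh activation; the function name and the weights `w_x` and `w_h` are hypothetical and are not taken from the proposed model.

```python
import math


def rnn_hidden_states(inputs, w_x=0.5, w_h=0.8, h0=0.0):
    """Return the hidden state after each input of a toy scalar RNN.

    w_x and w_h are illustrative weights (assumptions), not trained values.
    """
    states = []
    h = h0
    for x in inputs:
        # Each state needs the previous state, so the loop over positions
        # cannot be run in parallel within one sequence.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


states = rnn_hidden_states([1.0, 0.0, -1.0])
```

Because every iteration reads the value written by the previous one, the positions of a single training example must be processed in order.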

Fig. 2: Basic structure of attention mechanism[17]

The concept of "attention" has recently gained popularity in training neural networks, allowing models to learn alignments between different modalities, e.g., between image objects and agent actions in the dynamic control problem [6]. As shown in Figure 2, it also provides context that helps in generating more natural-looking sentences, including rare words. Recently, attentional NMT models have dominated the field of machine translation, and continued development of NMT architectures keeps pushing the boundary of translation performance.

IV Evaluation Metrics

We can assess the performance of any machine translation model by comparing it across various evaluation metrics. In this paper, the following evaluation metrics are used to estimate the performance of our model.

IV-A Translation Error Rate

Translation error rate (TER) measures the amount of editing that a machine translation requires to match the human-generated output. It was designed for evaluating machine translation output while avoiding the knowledge intensiveness of meaning-based approaches. The method provides more meaningful insights when a large number of reference sentences is available in the dataset. The TER of a translated sentence can be found using the following equation [15]:

$$\mathrm{TER} = \frac{\text{number of edits}}{\text{average number of reference words}}$$
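As a rough illustration, TER can be approximated as the word-level edit distance to the closest reference divided by the average reference length. The sketch below is a simplification under stated assumptions: it counts only insertions, deletions, and substitutions, and omits the phrase-shift edits that the full TER of [15] also counts. The function names are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions, and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_ter(hypothesis, references):
    """Shift-free approximation of TER for whitespace-tokenized strings."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    edits = min(word_edit_distance(hyp, r) for r in refs)
    avg_ref_len = sum(len(r) for r in refs) / len(refs)
    return edits / avg_ref_len


score = simple_ter("the cat sat on mat", ["the cat sat on the mat"])
```

In this example one inserted word is enough to match the six-word reference, so the score is 1/6. Lower values mean less editing is needed.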


IV-B Perplexity

Perplexity is a measure of language model performance based on average probability. It can be defined as the inverse probability of the sentences in the test data, normalized by the number of words. It can be calculated using the following equation [4]:

$$PP(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}}$$

where $W = w_1 w_2 \dots w_N$ is the test sequence and $N$ is its number of words.
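The definition of perplexity as a length-normalized inverse probability can be sketched as follows. The sketch assumes that the model's probability for each word of the test sequence is already available; the function name and the input format are hypothetical.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence from the probability assigned to each token.

    Computed in log space: exp(-(1/N) * sum(log p_i)), which equals the
    inverse of the sequence probability raised to the power 1/N.
    """
    n = len(token_probs)
    log_prob = sum(math.log(p) for p in token_probs)
    return math.exp(-log_prob / n)


uniform = perplexity([0.25, 0.25, 0.25, 0.25])
```

A model that assigns probability 0.25 to every word is as uncertain as a uniform choice among four words, so its perplexity is 4. Lower perplexity indicates a better model.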


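IV-C BLEU

BLEU uses the basic concept of n-gram precision to calculate the similarity between reference and generated sentences. It correlates highly with human expert review because it uses the average score over all results in the test dataset rather than reporting a result for each sentence. The BLEU score can be computed using the following equation [9]:

$$\mathrm{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$$

where $p_n$ is the modified n-gram precision, $w_n$ is the weight of each n-gram order, $c$ is the length of the candidate translation, and $r$ is the reference length.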
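The n-gram precision idea behind BLEU [9] can be sketched for a single tokenized sentence. This is a minimal sketch under stated assumptions: it uses uniform weights over n-gram orders, applies no smoothing, and scores one sentence rather than a whole test corpus. All function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against its references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each candidate count by the largest count seen in any reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    c = len(candidate)
    # Reference length closest to the candidate length (shorter on ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


reference = "the cat is on the mat".split()
repeated = bleu("the the the the the the the".split(), [reference], max_n=1)
```

Clipping is what keeps the repeated-word candidate from scoring well: "the" occurs only twice in the reference, so the unigram precision is 2/7 instead of 7/7.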



V Proposed System

Fig. 3: Proposed system using Attention Mechanism

As shown in Figure 3, our proposed model is divided into three main parts: the encoder, the decoder, and the attention mechanism. Our encoder has two LSTM layers with 128 LSTM units. The encoder outputs an encoded word-embedding vector, which is provided as input to the decoder. The decoder also consists of two LSTM layers with 128 LSTM units. It takes the encoded vector and produces the output using beam search. Whenever an output is produced, the value of the hidden state is compared with all input states to derive the weights for the attention mechanism. Based on the attention weights, a context vector is calculated and given as an additional input to the decoder, so that it generates a context-relevant translation based on previous outcomes.
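The attention step described above can be sketched in a few lines. The sketch assumes a dot-product comparison between the decoder hidden state and each encoder state, in the style of [6]; the text does not say which scoring function the proposed model uses, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_context(decoder_state, encoder_states):
    """Return attention weights and the resulting context vector.

    Assumption: scores are plain dot products between the decoder state
    and each encoder state.
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of encoder states.
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attention_context([1.0, 1.0], encoder_states)
```

Here both encoder states match the decoder state equally well, so each receives a weight of 0.5 and the context vector is their average. A decoder state closer to one encoder state would shift the weights, and therefore the context, towards it.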

VI Implementation

VI-A Datasets

To work with neural networks, we require a large amount of training data. Because neural networks learn from experience, the more experience they have, the more accurate the learning is. A wide range of work has been carried out for non-Indian languages, so sufficient parallel corpora are available for pairs such as English-French and English-German. For Indian languages, however, most corpora are available only for the English-Hindi language pair. The only dataset available for the Gujarati language is OPUS [11], which is a collection of translated texts from the user manuals of open-source software. To create a machine translation system that works at the conversational level, we therefore created a new dataset. The resulting "eng_guj_parallel_corpus" contains nearly 65,000 sentences in parallel format. It is a collection of sentences describing activities or scenes in both Gujarati and English. We have also made it available to all researchers as a downloadable open-source dataset.

VI-B Experiment Setup

For our experiments, we used a Google Cloud n1-highmem-2 instance with an Intel Xeon E5 processor, 13 GB of primary memory, and a Tesla K80 GPU (2,496 CUDA cores) with 12 GB of GPU memory. The TensorFlow deep learning library [1] was used for creating and training the deep neural networks.


VII Results and Discussion

VII-A Results

In our experiment, we trained our proposed neural machine translation model on "eng_guj_parallel_corpus" for 37,000 epochs. Some of the results of the proposed model are given in Figures 4 and 5:

Fig. 4: Input data
Fig. 5: Generated output data

As seen in the figures, in most cases our model produces results comparable with those of a human translator. The scores of our model and of Google's Neural Machine Translation (GNMT) are compared in Table I:

Evaluation Metric | Proposed Model | GNMT
BLEU | 40.33 | 33.66
TER | 0.3913 | 0.5217
Perplexity | 2.37 | -
TABLE I: Comparison of the models across evaluation metrics

VIII Conclusion

Conventional machine translation approaches are fast and efficient in processing, and they have proven effective in delivering good accuracy within their limited scope of application. However, they have difficulty generating a target sentence or corpus with human-like fluency. Neural machine translation has played a significant role in overcoming the difficulties associated with conventional approaches. The NMT models widely used in recent years, such as Seq-2-Seq, achieve great accuracy in generating fluent target-language text. In some real-world situations, however, specifically when the model comes across rare words, the accuracy decreases dramatically. To overcome this limitation of recent NMT models, we equipped the model with an attention mechanism, which improved the accuracy. We achieved an average BLEU score of 59.73 on the training corpus and 40.33 on the test corpus, drawn from a parallel English-Gujarati corpus of 65,000 sentences.


  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
  • [2] J. Durand, P. Bennett, V. Allegranza, F. van Eynde, L. Humphreys, P. Schmidt, and E. Steiner (1991) The Eurotra linguistic specifications: an overview. Machine Translation 6 (2), pp. 103–147. ISSN 1573-0573.
  • [3] S. K. Dwivedi and P. P. Sukhadeve (2010) Machine translation system in Indian perspectives. Journal of Computer Science 6 (10), pp. 1111.
  • [4] F. Jelinek, R. L. Mercer, L. R. Bahl, and J. K. Baker (1977) Perplexity: a measure of the difficulty of speech recognition tasks. The Journal of the Acoustical Society of America 62 (S1), pp. S63–S63.
  • [5] V. Lawson (Ed.) (1982) Practical experience of machine translation. North-Holland Publishing Company.
  • [6] M. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. CoRR abs/1508.04025.
  • [7] R. Mitkov (2005) The Oxford handbook of computational linguistics. Oxford University Press.
  • [8] S. Naskar and S. Bandyopadhyay (2005) Use of machine translation in India: current status. In Proceedings of MT SUMMIT X, Phuket, Thailand, pp. 13–15.
  • [9] K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318.
  • [10] I. Registrar General & Census Commissioner (2011) Abstract of speakers strength of languages and mother tongues 2011.
  • [11] J. Tiedemann (2012) Parallel data, tools and interfaces in OPUS. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), N. Calzolari (Conference Chair), K. Choukri, T. Declerck, M. U. Dogan, B. Maegaard, J. Mariani, J. Odijk, and S. Piperidis (Eds.), Istanbul, Turkey. ISBN 978-2-9517408-7-7.
  • [12] P. Shah, V. Bakarola, and S. Pati (2018) Neural machine translation system for Indic languages using deep neural architecture. In Smart and Innovative Trends in Next Generation Computing Technologies, P. Bhattacharyya, H. G. Sastry, V. Marriboyina, and R. Sharma (Eds.), Singapore, pp. 788–795. ISBN 978-981-10-8657-1.
  • [13] P. Sheridan (1955) Research in language translation on the IBM Type 701. IBM Technical Newsletter 9, pp. 5–24.
  • [14] S. B. Sitender (2012) Survey of Indian machine translation systems. IJCST 3 (1).
  • [15] M. Snover, B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul. A study of translation edit rate with targeted human annotation.
  • [16] H. Somers. Machine translation: latest developments. In The Oxford handbook of computational linguistics.
  • [17] I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 3104–3112.
  • [18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
  • [19] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR abs/1609.08144.