Tigrinya Neural Machine Translation with Transfer Learning for Humanitarian Response

03/09/2020 · by Alp Öktem, et al.

We report our experiments in building a domain-specific Tigrinya-to-English neural machine translation system. We use transfer learning from other Ge'ez script languages and report an improvement of 1.3 BLEU points over a classic neural baseline. We publish our development pipeline as an open-source library and also provide a demonstration application.






1 Introduction

Tigrinya (also spelled Tigrigna) is an Ethiopic language spoken by around 7.9 million people in Eritrea and Ethiopia. It is not supported by any commercial machine translation (MT) provider, nor are there any publicly available models for it. Refugees who speak Tigrinya face language and communication barriers when arriving in Europe. An MT system could improve access to information and enable two-way communication, so that refugees have a voice and can share their needs.

The complex morphological structure of Tigrinya makes it especially challenging for statistical MT (Tedla and Yamamoto, 2016; Teferra Abate et al., 2018). Neural MT, on the other hand, can overcome these problems with methods like subword segmentation (Sennrich et al., 2016) and lead to more accurate models (Kalchbrenner and Blunsom, 2013; Bahdanau et al., 2015). Although neural MT is known to be data-hungry, it is now possible to train neural models for Tigrinya thanks to recently released public datasets (Agić and Vulić, 2019; Teferra Abate et al., 2018). A further advantage of neural MT is the availability of techniques like cross-lingual transfer learning (Zoph et al., 2016) and multilingual training (Dong et al., 2015), which leverage data from other languages and are especially suitable in low-resource scenarios (Neubig and Hu, 2018).

In this paper, we explain the development of a Tigrinya-to-English neural MT model using publicly available datasets in Ge’ez-scripted languages. Our models are further adapted to the humanitarian domain to improve the translation capabilities of Translators without Borders (TWB), a non-profit organization offering language and translation support for humanitarian and development agencies, and other non-profit organizations.

2 Experiments

2.1 Data

We gathered an internal dataset from sentences in TWB’s translation memories. This dataset is used both for in-domain training and for testing: two hundred sentences of varying lengths were selected at random as a test set. This dataset, together with the public parallel corpora used in this work, is listed in Table 1.
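The held-out split described above can be sketched as follows. This is a minimal illustration, not the authors' actual script; the fixed seed and function name are assumptions added for reproducibility of the example.

```python
import random

def split_test_set(pairs, test_size=200, seed=13):
    """Hold out a random test set from a parallel corpus.

    `pairs` is a list of (source, target) sentence pairs; returns
    (train, test) with `test_size` randomly chosen held-out pairs.
    """
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # deterministic shuffle for a stable split
    return pairs[test_size:], pairs[:test_size]

# usage: hold out 200 of 2,500 in-house sentence pairs
corpus = [(f"src-{i}", f"tgt-{i}") for i in range(2500)]
train, test = split_test_set(corpus)
```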

            JW300*  Ethiopian  Bible-uedin*  GlobalVoices*  GNOME*  Tanzil*  TWB   TOTAL
                    corpus
Amharic     722K    66K        61K           1.6K           57K     94K      -     1M
Ge'ez       -       11K        -             -              -       -        -     11K
Tigrinya    400K    36K        -             -              -       -        2.5K  439K
Table 1: Parallel corpora used in this work. Dataset names marked with an asterisk are available through the OPUS repository (Tiedemann, 2012; Christodouloupoulos and Steedman, 2015). The Ethiopian languages corpus (Teferra Abate et al., 2018) is also openly available online (http://github.com/AAUThematic4LT/Parallel-Corpora-for-Ethiopian-Languages).

2.2 Experimental setup

The transfer-learning-based training process consists of three stages. First, we train the model on a shuffled mix of all datasets, totaling 1.45 million sentences. Second, we fine-tune the model on the Tigrinya portion of the mix (438,000 sentences). Third, we fine-tune on the training partition of our in-house data (2,300 sentences). As a baseline, we skip the multilingual first stage and train on Tigrinya data only; the baseline model is then fine-tuned on in-domain data in the same way.
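The data flow behind the first two stages can be sketched as below: a shuffled multilingual mix for stage one, and the Tigrinya-only subset for stage two. The function name, language tags, and toy sentence pairs are illustrative assumptions; the actual pipeline prepares files for OpenNMT-py rather than in-memory lists.

```python
import random

def make_stage_corpora(corpora, seed=7):
    """Build training sets for the first two transfer-learning stages.

    `corpora` maps a language tag to a list of (source, english) pairs.
    Stage 1: shuffled mix of all Ge'ez-script datasets.
    Stage 2: the Tigrinya portion only (for fine-tuning).
    Stage 3, in-domain fine-tuning, would use the TWB pairs the same way.
    """
    stage1 = [pair for pairs in corpora.values() for pair in pairs]
    random.Random(seed).shuffle(stage1)   # shuffled multilingual mix
    stage2 = list(corpora["tigrinya"])    # Tigrinya-only fine-tuning set
    return stage1, stage2

# toy usage with one or two pairs per language
corpora = {
    "amharic": [("ሰላም", "hello")],
    "geez": [("ቅዱስ", "holy")],
    "tigrinya": [("ሰላም", "peace"), ("ማይ", "water")],
}
stage1, stage2 = make_stage_corpora(corpora)
```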

The OpenNMT-py toolkit (Klein et al., 2018) is used to train the models. The model is a 6-layer, 8-head Transformer (Vaswani et al., 2017) with a hidden size of 512. Token-batch sizes of 4096, 2048 and 10 were used for the multilingual, unilingual and in-domain stages respectively. Adam (Kingma and Ba, 2014) was used as the optimizer with 4,000 warm-up steps. Training continued until development-set perplexity showed no improvement over the last 5 validations, resulting in 73,500, 85,000 and 85,240 steps for the three stages.
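The 4,000 warm-up steps suggest the standard inverse-square-root learning-rate schedule from the Transformer paper (Vaswani et al., 2017); assuming that is the schedule in use (the paper does not state it explicitly), it can be written as:

```python
def noam_lr(step, d_model=512, warmup=4000, factor=1.0):
    """Transformer learning-rate schedule (Vaswani et al., 2017):
    linear warm-up for `warmup` steps, then inverse-square-root decay.
    """
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# the learning rate rises linearly until step 4000, then decays
peak = noam_lr(4000)
```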


Byte-pair encoding (BPE) models were trained separately for the Latin and Ge’ez scripts using 6,000 merge operations. English sentences were lowercased and tokenized beforehand with the Moses tokenizer (Koehn et al., 2007). Ge’ez-script sentences were tokenized using a punctuation-separation script (http://github.com/translatorswb/mt-tools).
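The BPE training procedure (Sennrich et al., 2016) can be illustrated with a minimal pure-Python learner: repeatedly count adjacent symbol pairs over the vocabulary and merge the most frequent one. In practice the paper's 6,000-merge models would be trained with standard subword tooling; this sketch only shows the algorithm, and the end-of-word marker convention is an assumption.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merge operations: greedily merge the most frequent
    adjacent symbol pair, `num_merges` times (simplified sketch)."""
    # represent each word as a tuple of characters plus an end-of-word marker
    vocab = Counter()
    for w in words:
        vocab[tuple(w) + ("</w>",)] += 1
    merges = []
    for _ in range(num_merges):
        # count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # apply the merge everywhere in the vocabulary
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# toy usage: frequent character pairs in "low"/"lower" get merged first
merges = learn_bpe(["low", "low", "lower"], num_merges=2)
```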

2.3 Results

We report test set scores for the baseline and for each stage of our pipeline, using several commonly used automatic evaluation metrics, in Table 2. All evaluation measures agree on the boost obtained from multilingual pre-training: the final in-domain model improves over the baseline by +1.3 BLEU (Papineni et al., 2002), +3.1 ChrF (Popović, 2015) and +0.9 Meteor (Lavie and Agarwal, 2007) points.


              BLEU    ChrF    Meteor
Baseline      22.28   46.51   26.10
Multilingual  15.84   40.99   23.32
Tigrinya      17.80   42.92   24.61
In-domain     23.60   49.59   27.04
Table 2: Automatic evaluation results for the baseline and for each stage of our training pipeline.
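Of the metrics above, ChrF is simple enough to sketch directly: it is a character n-gram F-score (Popović, 2015) that averages precision and recall over n-gram orders 1 to 6, weighting recall with beta=2. The version below is a simplified illustration (whitespace removal and uniform averaging are assumptions); real scores should be computed with standard tooling.

```python
from collections import Counter

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified ChrF: average character n-gram precision/recall
    for n = 1..max_n, combined into an F-score with recall weight beta."""
    hyp, ref = hypothesis.replace(" ", ""), reference.replace(" ", "")
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp_grams = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
        ref_grams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
        overlap = sum((hyp_grams & ref_grams).values())  # clipped matches
        if hyp_grams:
            precisions.append(overlap / sum(hyp_grams.values()))
        if ref_grams:
            recalls.append(overlap / sum(ref_grams.values()))
    p = sum(precisions) / len(precisions) if precisions else 0.0
    r = sum(recalls) / len(recalls) if recalls else 0.0
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```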

3 Conclusion

With this work, we have demonstrated the utility of cross-lingual transfer learning for building a Tigrinya-to-English MT system. As a result, a demonstration application was launched as the first neural Tigrinya-to-English translator (http://gamayun.translatorswb.org/tigrinya). As future work, we will develop English-to-Tigrinya models and evaluate the usability of the bidirectional system in a humanitarian setting using feedback from native speakers.


This work was done partially in collaboration with the Masakhane initiative (http://www.masakhane.io/). Special thanks to Musie Meressa Berhe for helping revise our dataset.


  • Ž. Agić and I. Vulić (2019) JW300: a wide-coverage parallel corpus for low-resource languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 3204–3210. Cited by: §1.
  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), Cited by: §1.
  • C. Christodouloupoulos and M. Steedman (2015) A massively parallel corpus: the bible in 100 languages. Language Resources and Evaluation 49 (2), pp. 375–395. Cited by: Table 1.
  • D. Dong, H. Wu, W. He, D. Yu, and H. Wang (2015) Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 1723–1732. Cited by: §1.
  • N. Kalchbrenner and P. Blunsom (2013) Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1700–1709. Cited by: §1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. CoRR abs/1412.6980. Cited by: §2.2.
  • G. Klein, Y. Kim, Y. Deng, V. Nguyen, J. Senellart, and A. Rush (2018) OpenNMT: neural machine translation toolkit. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), Boston, MA, pp. 177–184. Cited by: §2.2.
  • P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst (2007) Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, Prague, Czech Republic, pp. 177–180. Cited by: §2.2.
  • A. Lavie and A. Agarwal (2007) Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT ’07, USA, pp. 228–231. Cited by: §2.3.
  • G. Neubig and J. Hu (2018) Rapid adaptation of neural machine translation to new languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 875–880. Cited by: §1.
  • K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp. 311–318. Cited by: §2.3.
  • M. Popović (2015) ChrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 392–395. Cited by: §2.3.
  • R. Sennrich, B. Haddow, and A. Birch (2016) Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1715–1725. Cited by: §1.
  • Y. Tedla and K. Yamamoto (2016) The effect of shallow segmentation on english-tigrinya statistical machine translation. 2016 International Conference on Asian Language Processing (IALP), pp. 79–82. Cited by: §1.
  • S. Teferra Abate, M. Melese, M. Yifiru Tachbelie, M. Meshesha, S. Atinafu, W. Mulugeta, Y. Assabie, H. Abera, B. Ephrem, T. Abebe, W. Tsegaye, A. Lemma, T. Andargie, and S. Shifaw (2018) Parallel corpora for bi-directional statistical machine translation for seven Ethiopian language pairs. In Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, Santa Fe, New Mexico, USA, pp. 83–90. Cited by: §1, Table 1.
  • J. Tiedemann (2012) Parallel data, tools and interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey. Cited by: Table 1.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.), pp. 5998–6008. Cited by: §2.2.
  • B. Zoph, D. Yuret, J. May, and K. Knight (2016) Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 1568–1575. Cited by: §1.