Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages
In machine translation, performance is usually improved by collecting more training resources. However, most language pairs do not have enough resources to train a machine translation system. In this paper, we propose using synthetic methods to extend a low-resource corpus and apply the extended corpus to a multi-source neural machine translation model. We show that corpus extension with synthetic methods improves machine translation performance. In particular, we focus on how to create source sentences that lead to better target sentences, even when those source sentences are synthetic. We also find that corpus extension improves the performance of multi-source neural machine translation. Both corpus extension and the multi-source model prove to be efficient methods for a low-resource language pair, and using the two together yields better translation performance still.
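As a rough illustration of the idea, the sketch below fills in missing source sides of a small corpus so that every example has two source sentences for a multi-source model. It is only a sketch under stated assumptions: the abstract does not specify how synthetic sentences are produced, so generating a missing source from the target sentence is an assumption here, and the names `Example`, `extend_corpus`, `to_src_a`, and `to_src_b` are hypothetical placeholders, not the authors' code. The translator arguments stand in for whatever existing translation system is available.

```python
from typing import Callable, List, NamedTuple, Optional

# Hypothetical type: any function that maps a sentence to a sentence.
Translator = Callable[[str], str]


class Example(NamedTuple):
    """One training example for a two-source translation model."""
    src_a: Optional[str]     # sentence in source language A, or None if missing
    src_b: Optional[str]     # sentence in source language B, or None if missing
    tgt: str                 # target sentence (never synthesized in this sketch)
    synthetic: bool = False  # True if any source side was generated


def extend_corpus(
    examples: List[Example],
    to_src_a: Translator,
    to_src_b: Translator,
) -> List[Example]:
    """Fill in missing source sentences with synthetic ones.

    Assumption: a missing source is generated from the target sentence,
    so the target side of every example stays unchanged.
    """
    extended = []
    for ex in examples:
        src_a, src_b, synthetic = ex.src_a, ex.src_b, ex.synthetic
        if src_a is None:
            src_a = to_src_a(ex.tgt)
            synthetic = True
        if src_b is None:
            src_b = to_src_b(ex.tgt)
            synthetic = True
        extended.append(Example(src_a, src_b, ex.tgt, synthetic))
    return extended


if __name__ == "__main__":
    # Toy stand-ins for real translation systems (assumptions for the demo).
    corpus = [
        Example("a1", "b1", "t1"),   # complete example, left untouched
        Example("a2", None, "t2"),   # source B missing, will be synthesized
    ]
    for row in extend_corpus(corpus, lambda s: f"A<{s}>", lambda s: f"B<{s}>"):
        print(row)
```

Keeping a `synthetic` flag on each example makes it possible to compare models trained with and without the generated sentences, or to weight them differently during training.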