Transfer Learning across Languages from Someone Else's NMT Model

09/24/2019
by   Tom Kocmi, et al.

Neural machine translation is demanding in terms of training time, hardware resources, and the size and quantity of parallel sentences. We propose a simple transfer learning method to recycle already trained models for different language pairs with no need for modifications in model architecture, hyper-parameters, or vocabulary. We achieve better translation quality and shorter convergence times than when training from random initialization. To show the applicability of our method, we recycle a Transformer model trained by different researchers for translating English-to-Czech and use it to seed models for seven language pairs. Our translation models are significantly better even when the re-used model's language pair is not linguistically related to the child language pair, especially for low-resource languages. Our approach needs only one pretrained model for transferring to all the various language pairs. Additionally, we improve this approach with a simple vocabulary transformation. We analyze the behavior of transfer learning to understand the gains from unrelated languages.
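The vocabulary transformation mentioned in the abstract can be illustrated with a small sketch: subwords shared between the parent and child vocabularies keep their trained parent embeddings, while child-only subwords are freshly initialized. This is a hypothetical illustration under assumed names and shapes, not the authors' actual code.

```python
import numpy as np

def transfer_embeddings(parent_vocab, parent_emb, child_vocab, seed=0):
    """Seed a child model's embedding matrix from a parent model's.

    Subwords present in both vocabularies reuse their trained parent
    vectors; subwords unique to the child vocabulary are randomly
    initialized. (Sketch only; function and variable names are
    assumptions, not the paper's implementation.)
    """
    rng = np.random.default_rng(seed)
    dim = parent_emb.shape[1]
    parent_index = {tok: i for i, tok in enumerate(parent_vocab)}
    # Start from small random vectors, then overwrite shared subwords.
    child_emb = rng.normal(0.0, 0.02, size=(len(child_vocab), dim))
    reused = 0
    for j, tok in enumerate(child_vocab):
        i = parent_index.get(tok)
        if i is not None:
            child_emb[j] = parent_emb[i]
            reused += 1
    return child_emb, reused

# Toy example: a parent vocabulary from an English-Czech model and a
# child vocabulary for a new language pair sharing some subwords.
parent_vocab = ["the", "▁a", "ov", "án", "ing"]
parent_emb = np.arange(5 * 4, dtype=float).reshape(5, 4)
child_vocab = ["the", "▁a", "ing", "xx", "yy"]
child_emb, reused = transfer_embeddings(parent_vocab, parent_emb, child_vocab)
print(reused)  # → 3 shared subwords keep their parent vectors
```

The remaining (non-embedding) parameters of the parent Transformer would simply be copied unchanged, which is why the method needs no changes to architecture or hyper-parameters.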


research
09/02/2018

Trivial Transfer Learning for Low-Resource Neural Machine Translation

Transfer learning has been proven as an effective technique for neural m...
research
08/31/2017

Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation

We present a simple method to improve neural translation of a low-resour...
research
11/03/2018

Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

We propose a method to transfer knowledge across neural machine translat...
research
01/06/2020

Exploring Benefits of Transfer Learning in Neural Machine Translation

Neural machine translation is known to require large numbers of parallel...
research
09/30/2020

On Romanization for Model Transfer Between Scripts in Neural Machine Translation

Transfer learning is a popular strategy to improve the quality of low-re...
research
09/14/2019

A Universal Parent Model for Low-Resource Neural Machine Translation Transfer

Transfer learning from a high-resource language pair `parent' has been p...
research
04/30/2020

Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders

Text simplification (TS) rephrases long sentences into simplified varian...
