Transfer Learning across Languages from Someone Else's NMT Model
Neural machine translation is demanding in terms of training time, hardware resources, model size, and the quantity of parallel sentences. We propose a simple transfer learning method to recycle already trained models for different language pairs, with no need for modifications to the model architecture, hyper-parameters, or vocabulary. We achieve better translation quality and shorter convergence times than when training from random initialization. To show the applicability of our method, we recycle a Transformer model trained by other researchers for English-to-Czech translation and use it to seed models for seven language pairs. Our translation models are significantly better even when the reused model's language pair is not linguistically related to the child language pair, especially for low-resource languages. Our approach needs only a single pretrained model for transfer to all the various language pairs. Additionally, we improve this approach with a simple vocabulary transformation. We analyze the behavior of transfer learning to understand the gains from unrelated languages.
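The sketch below illustrates, under loose assumptions, the two ingredients the abstract mentions: warm-starting a child model from a parent checkpoint that shares the architecture and vocabulary, and a simple vocabulary transformation that copies embedding rows for subwords shared between the parent and child vocabularies. It is not the authors' code; the model attribute, checkpoint path, and vocabulary dictionaries are hypothetical placeholders, and PyTorch is assumed only for concreteness.

import torch
import torch.nn as nn


def load_parent_weights(child_model: nn.Module, parent_ckpt_path: str) -> None:
    """Seed the child model with every parameter of the parent checkpoint.

    Because architecture, hyper-parameters, and vocabulary size are kept
    identical, the state dicts line up one-to-one and no surgery is needed.
    """
    parent_state = torch.load(parent_ckpt_path, map_location="cpu")
    child_model.load_state_dict(parent_state, strict=True)


def remap_shared_embeddings(
    child_model: nn.Module,
    parent_vocab: dict,   # token -> row index in the parent embedding matrix
    child_vocab: dict,    # token -> row index in the child embedding matrix
    embedding_attr: str = "embed_tokens",  # hypothetical attribute name
) -> None:
    """A simple vocabulary transformation: for every subword present in both
    vocabularies, move the parent's embedding row to the child's row index,
    so shared subwords keep their learned representations."""
    embed = getattr(child_model, embedding_attr)       # an nn.Embedding
    parent_rows = embed.weight.data.clone()            # parent-ordered rows
    with torch.no_grad():
        for token, child_idx in child_vocab.items():
            parent_idx = parent_vocab.get(token)
            if parent_idx is not None:
                embed.weight[child_idx] = parent_rows[parent_idx]


# Hypothetical usage: seed the child, optionally remap embeddings, then
# continue ordinary training on the child language pair.
# child_model = TransformerNMT(vocab_size=32000)        # same config as parent
# load_parent_weights(child_model, "parent_en-cs.pt")
# remap_shared_embeddings(child_model, parent_vocab, child_vocab)

After seeding, training proceeds exactly as from random initialization, which is what keeps the method free of architectural or hyper-parameter changes.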