Massively Parallel Cross-Lingual Learning in Low-Resource Target Language Translation
We work on translation from rich-resource languages to low-resource languages. The main challenges we identify are the lack of low-resource language data, the lack of effective methods for cross-lingual transfer, and the variable-binding problem that is common in neural systems. We build a translation system that addresses these challenges using eight European language families as our test ground. Firstly, we add source and target family labels and study intra-family and inter-family influences for effective cross-lingual transfer. We achieve an improvement of +8.4 BLEU over a single-family multi-source multi-target NMT baseline. We find that training on the two neighboring families closest to the low-resource language is often enough. Secondly, we conduct an ablation study and find that reasonably good results can be achieved even with considerably less target data. Thirdly, we address the variable-binding problem by building an order-preserving named entity translation model. In a preliminary study, 60.6% of our translations are judged to be akin to human translations.
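The family-label approach can be illustrated with a minimal sketch. Assuming labels are prepended as special tokens on the source side (in the spirit of target-token multilingual NMT); the tag format and example data below are illustrative, not taken from the paper:

```python
# Hypothetical sketch: prepend source-family and target-family labels to
# each parallel sentence pair so a single multi-family model can condition
# on both families. Tag names and sentences are invented for illustration.

def tag_pair(src_sentence: str, tgt_sentence: str,
             src_family: str, tgt_family: str) -> tuple[str, str]:
    """Return the pair with family labels prefixed to the source side."""
    tagged_src = f"<{src_family}> <2{tgt_family}> {src_sentence}"
    return tagged_src, tgt_sentence

# Example: a Germanic-family source sentence translated into a
# Romance-family target sentence.
src, tgt = tag_pair("the cat sat", "la gata se sentó",
                    "Germanic", "Romance")
print(src)  # <Germanic> <2Romance> the cat sat
```

During training, pairs from many families share one model; at test time, the labels steer generation toward the desired target family, which is what makes the intra-family vs. inter-family comparison possible.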