Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks

10/18/2022
by Jaehoon Oh, et al.

Translation has played a crucial role in improving performance on multilingual tasks: (1) generating target-language data from source-language data for training, and (2) generating source-language data from target-language data for inference. However, prior work has not considered using both translations simultaneously. This paper shows that combining them yields synergistic gains on various multilingual sentence classification tasks. We empirically find that translation artifacts, the stylistic traces left by translators, are the main factor behind the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, that take translation artifacts into account. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and further improves performance. Our code is available at https://github.com/jongwooko/MUSC.
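For intuition, the sketch below shows the two training ingredients named in the abstract, SupCon and MixUp, applied to paired source/translation sentence embeddings. It is a minimal PyTorch sketch under assumed shapes and batching: the function names, hyperparameters, and the way the two translation views are paired are illustrative assumptions, not the authors' MUSC implementation (the linked repository holds the actual code).

import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    # Supervised contrastive loss (Khosla et al., 2020): each anchor is pulled
    # toward in-batch samples sharing its label (here, a sentence and its
    # translation) and pushed away from everything else.
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Mean log-probability of the positives, for anchors that have any.
    pos_counts = pos_mask.sum(1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts.clamp(min=1)
    return loss[pos_counts > 0].mean()

def mixup(x, y_onehot, alpha=0.2):
    # MixUp (Zhang et al., 2018): convex combinations of random pairs of
    # embeddings and their one-hot labels, with a Beta-sampled coefficient.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]

# Assumed usage: z_src and z_tr are encoder embeddings of a labeled batch and
# its machine translations, sharing the original class labels:
#   z = torch.cat([z_src, z_tr]); y = torch.cat([labels, labels])
#   total = supcon_loss(z, y) + classifier_loss(*mixup(z, F.one_hot(y).float()))

Because a sentence and its translation receive the same label, SupCon treats them as positives, which is one plausible way to exploit translation artifacts during fine-tuning; how MUSC actually weights and combines the two objectives is specified in the paper and repository.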

