Revisiting Machine Translation for Cross-lingual Classification

05/23/2023
by Mikel Artetxe, et al.

Machine Translation (MT) has been widely used for cross-lingual classification, either by translating the test set into English and running inference with a monolingual model (translate-test), or by translating the training set into the target languages and finetuning a multilingual model (translate-train). However, most research in the area focuses on the multilingual models rather than the MT component. We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine-translated text, translate-test can do substantially better than previously assumed. The optimal approach, however, is highly task-dependent, as we identify various sources of cross-lingual transfer gap that affect different tasks and approaches differently. Our work calls into question the dominance of multilingual models for cross-lingual classification, and argues for paying more attention to MT-based baselines.
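To make the two setups concrete, the following is a minimal sketch of how translate-test and translate-train differ in where the MT step sits. It is not the authors' implementation: the function names, the dictionary-based "translator", the keyword classifier, and the memorizing "finetune" step are all hypothetical toy stand-ins for a real MT system and real models.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)


def translate_test(
    test_texts: List[str],
    translate_to_english: Callable[[str], str],
    english_classifier: Callable[[str], str],
) -> List[str]:
    """Translate each test input into English, then classify it
    with a monolingual (English) model."""
    return [english_classifier(translate_to_english(t)) for t in test_texts]


def translate_train(
    train_examples: List[Example],
    translate_to_target: Callable[[str], str],
    finetune: Callable[[List[Example]], Callable[[str], str]],
) -> Callable[[str], str]:
    """Translate the English training set into the target language
    (labels unchanged), then finetune a model on the translated data."""
    translated = [(translate_to_target(text), label) for text, label in train_examples]
    return finetune(translated)


# --- Hypothetical toy stand-ins, for illustration only ---
ES_TO_EN: Dict[str, str] = {"bueno": "good", "malo": "bad"}
EN_TO_ES: Dict[str, str] = {en: es for es, en in ES_TO_EN.items()}


def toy_translate(text: str, table: Dict[str, str]) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(table.get(word, word) for word in text.split())


def toy_english_classifier(text: str) -> str:
    # Stand-in for a monolingual English model.
    return "positive" if "good" in text.split() else "negative"


def toy_finetune(examples: List[Example]) -> Callable[[str], str]:
    # Stand-in for finetuning: remember the label seen with each word.
    word_to_label: Dict[str, str] = {}
    for text, label in examples:
        for word in text.split():
            word_to_label[word] = label

    def model(text: str) -> str:
        for word in text.split():
            if word in word_to_label:
                return word_to_label[word]
        return "negative"

    return model


spanish_test = ["muy bueno", "muy malo"]

# translate-test: MT runs at inference time, the model only sees English.
test_side_preds = translate_test(
    spanish_test, lambda t: toy_translate(t, ES_TO_EN), toy_english_classifier
)

# translate-train: MT runs once over the training data, the model sees Spanish.
spanish_model = translate_train(
    [("good", "positive"), ("bad", "negative")],
    lambda t: toy_translate(t, EN_TO_ES),
    toy_finetune,
)
train_side_preds = [spanish_model(t) for t in spanish_test]
```

In this sketch the only structural difference is which side of the pipeline the translator touches: translate-test feeds machine-translated text to a model trained on original English, which is the train/inference mismatch the abstract refers to.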

