Model Selection for Cross-Lingual Transfer using a Learned Scoring Function

10/13/2020
by Yang Chen, et al.

Transformers pre-trained on multilingual text corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer learning results. In the zero-shot cross-lingual transfer setting, only English training data is assumed, and the fine-tuned model is evaluated on a different target language. No target-language validation data is available in this setting; however, substantial variance in target-language performance has been observed across fine-tuning runs. Prior work has relied on English validation/development data to select among models fine-tuned with different learning rates, numbers of training steps, and other hyperparameters, often resulting in suboptimal choices. To address this challenge, we propose a meta-learning approach to model selection that uses the fine-tuned model's own internal representations to predict its cross-lingual capabilities. In extensive experiments, we find that our approach consistently selects better models than English validation data across five languages and five well-studied NLP tasks, achieving results comparable to using small amounts of target-language development data.
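To make the selection setup concrete, here is a minimal sketch, assuming a set of fine-tuned checkpoints and a small pre-trained regressor standing in for the learned scoring function. This is an illustration only, not the paper's actual method: the names `pooled_representation`, `select_checkpoint`, `probe_sentences`, `candidate_dirs`, and the toy `scorer` are all hypothetical, and the paper's meta-learned scorer may use different features and architecture.

```python
# Sketch: rank fine-tuned checkpoints by a learned scoring function applied
# to each model's own internal representations, instead of English dev accuracy.
import torch
from transformers import AutoModel, AutoTokenizer

def pooled_representation(model_dir, sentences, device="cpu"):
    """Mean-pool a checkpoint's final hidden states over a small probe set."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir).to(device).eval()
    feats = []
    with torch.no_grad():
        for text in sentences:
            inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
            feats.append(hidden.mean(dim=1).squeeze(0))  # mean over tokens
    return torch.stack(feats).mean(dim=0)                # mean over probe sentences

def select_checkpoint(candidate_dirs, probe_sentences, scorer):
    """Return the checkpoint whose representations the scorer rates highest."""
    scores = {}
    for path in candidate_dirs:
        feats = pooled_representation(path, probe_sentences)
        scores[path] = scorer(feats.unsqueeze(0)).item()  # predicted transfer quality
    return max(scores, key=scores.get), scores

# Hypothetical usage: the scorer would be trained in advance (via meta-learning
# in the paper) to map representations to predicted target-language performance.
scorer = torch.nn.Sequential(
    torch.nn.Linear(768, 128), torch.nn.ReLU(), torch.nn.Linear(128, 1)
)
```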

Related research

04/30/2020
On the Evaluation of Contextual Embeddings for Zero-Shot Cross-Lingual Transfer Learning
Pre-trained multilingual contextual embeddings have demonstrated state-o...

05/26/2023
Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging
Massively multilingual language models have displayed strong performance...

09/07/2022
Improving the Cross-Lingual Generalisation in Visual Question Answering
While several benefits were realized for multilingual vision-language pr...

05/25/2022
Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
In this paper, we explore the challenging problem of performing a genera...

05/23/2022
Cross-lingual Lifelong Learning
The longstanding goal of multi-lingual learning has been to develop a un...

05/23/2023
Towards Massively Multi-domain Multilingual Readability Assessment
We present ReadMe++, a massively multi-domain multilingual dataset for a...

10/28/2022
Stanceosaurus: Classifying Stance Towards Multilingual Misinformation
We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hind...