XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

03/24/2020
by Junjie Hu, et al.

Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.

research
07/12/2022

How Do Multilingual Encoders Learn Cross-lingual Representation?

NLP systems typically require support for more than one language. As dif...
research
04/17/2021

AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples

Capturing word meaning in context and distinguishing between corresponde...
research
06/20/2023

On Evaluating Multilingual Compositional Generalization with Translated Datasets

Compositional generalization allows efficient learning and human-like in...
research
05/12/2021

Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer

There is an increasing amount of evidence that in cases with little or n...
research
11/29/2022

Compressing Cross-Lingual Multi-Task Models at Qualtrics

Experience management is an emerging business area where organizations f...
research
04/03/2023

ScandEval: A Benchmark for Scandinavian Natural Language Processing

This paper introduces a Scandinavian benchmarking platform, ScandEval, w...
research
04/15/2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

Machine learning has brought striking advances in multilingual natural l...
