MLQA: Evaluating Cross-lingual Extractive Question Answering

10/16/2019
by Patrick Lewis, et al.

Question answering (QA) models have shown rapid progress, enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making it challenging to train QA systems in other languages. An alternative to building large monolingual training datasets is to develop cross-lingual systems that can transfer to a target language without requiring training data in that language. To develop such systems, it is crucial to invest in high-quality multilingual evaluation benchmarks that measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area. MLQA contains QA instances in seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese. It consists of over 12K QA instances in English and 5K in each of the other languages, with each QA instance parallel across four languages on average. MLQA is built using a novel alignment context strategy on Wikipedia articles and serves as a cross-lingual extension to existing extractive QA datasets. We evaluate current state-of-the-art cross-lingual representations on MLQA and also provide machine-translation-based baselines. In all cases, transfer results lag significantly behind training-language performance.
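To make "extractive QA" concrete, the sketch below shows the shape of an MLQA-style instance (a question whose answer is a span of a context passage) and the standard SQuAD-style metrics, exact match and token-level F1, commonly used to score such benchmarks. The sample instance and the English-centric answer normalization (article stripping) are illustrative assumptions, not taken from the dataset or its official evaluation script, which adapts normalization per language.

```python
# Sketch of extractive QA scoring with SQuAD-style metrics.
# Caveat: normalization here is English-centric (strips "a/an/the");
# a multilingual benchmark would adapt this per language.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    """True if the normalized prediction equals the normalized gold answer."""
    return normalize(prediction) == normalize(gold)

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical instance: the answer is a span of the context passage.
instance = {
    "context": "MLQA covers seven languages, including English and Hindi.",
    "question": "How many languages does MLQA cover?",
    "answer": "seven",
}

prediction = "seven languages"
print(exact_match(prediction, instance["answer"]))             # False
print(round(token_f1(prediction, instance["answer"]), 2))      # 0.67
```

A prediction that over-extends the gold span gets no exact-match credit but partial F1 credit, which is why both metrics are usually reported together.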


Related research

- XOR QA: Cross-lingual Open-Retrieval Question Answering (10/22/2020)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale (04/24/2023)
- Cross-Lingual Transfer Learning for Question Answering (07/13/2019)
- Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic (07/05/2022)
- Bridging the Language Gap: Knowledge Injected Multilingual Question Answering (04/06/2023)
- QAmeleon: Multilingual QA with Only 5 Examples (11/15/2022)
- Cross-Lingual Training for Automatic Question Generation (06/06/2019)
