TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

03/10/2020
by Jonathan H. Clark, et al.

Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TyDi QA—a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology—the set of linguistic features each language expresses—such that we expect models performing well on this set to generalize across a large number of the world's languages. We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer, but don't know the answer yet, and the data is collected directly in each language without the use of translation.
