One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval

07/26/2021
by Akari Asai, et al.

We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable. We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question. Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any translation or in-language retrieval modules as used in prior work. We propose an iterative training method that automatically extends annotated data available only in high-resource languages to low-resource ones. Our results show that CORA substantially outperforms the previous state of the art on multilingual open question answering benchmarks across 26 languages, 9 of which are unseen during training. Our analyses show the significance of cross-lingual retrieval and generation in many languages, particularly under low-resource settings.
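The abstract does not spell out how cross-lingual retrieval scores passages. As a rough illustration only, the sketch below assumes the common dense-retrieval setup: questions and passages from every language are embedded in one shared vector space and ranked by inner product. The toy vectors, the `PASSAGES` list, and the `retrieve` helper are hypothetical and are not taken from CORA; a real system would obtain the vectors from a trained multilingual encoder.

```python
from typing import List, Tuple

Vector = List[float]

# Hypothetical toy data: (language, passage text, embedding).
# Real embeddings would come from a trained multilingual encoder.
PASSAGES: List[Tuple[str, str, Vector]] = [
    ("en", "Tokyo is the capital of Japan.", [0.9, 0.1, 0.0]),
    ("ja", "東京は日本の首都です。", [0.8, 0.2, 0.1]),
    ("es", "El Amazonas es un río de Sudamérica.", [0.0, 0.1, 0.9]),
]


def dot(u: Vector, v: Vector) -> float:
    """Inner product, used here as the relevance score."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(question_vec: Vector,
             passages: List[Tuple[str, str, Vector]],
             k: int = 2) -> List[Tuple[float, str, str]]:
    """Return the k highest-scoring passages, ignoring their language."""
    scored = [(dot(question_vec, vec), lang, text)
              for lang, text, vec in passages]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# A hypothetical question embedding that lies close to the "Tokyo" passages.
top = retrieve([1.0, 0.0, 0.0], PASSAGES, k=2)
for score, lang, text in top:
    print(f"{score:.2f} [{lang}] {text}")
```

Because the ranking depends only on vector similarity, passages in a language other than the question's can be returned, which is the property the abstract attributes to cross-lingual retrieval.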

research
10/22/2020

XOR QA: Cross-lingual Open-Retrieval Question Answering

Multilingual question answering tasks typically assume answers exist in ...
research
05/30/2022

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System

This paper introduces our proposed system for the MIA Shared Task on Cro...
research
07/30/2020

MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

Progress in cross-lingual modeling depends on challenging, realistic, an...
research
05/21/2022

Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training

Keyphrase generation is the task of automatically predicting keyphrases ...
research
12/28/2020

Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval

Existing methods for open-retrieval question answering in lower resource...
research
04/05/2022

Towards Best Practices for Training Multilingual Dense Retrieval Models

Dense retrieval models using a transformer-based bi-encoder design have ...
research
08/05/2022

Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey

Dense retrieval (DR) approaches based on powerful pre-trained language m...
