Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering

12/11/2019
by Casimiro Pio Carrino, et al.

Recently, multilingual question answering has become a crucial research topic and is receiving increased interest in the NLP community. However, the lack of large-scale datasets makes it difficult to train multilingual QA systems with performance comparable to English ones. In this work, we develop the Translate Align Retrieve (TAR) method to automatically translate the Stanford Question Answering Dataset (SQuAD) v1.1 into Spanish. We then use this dataset to train Spanish QA systems by fine-tuning a Multilingual-BERT model. Finally, we evaluate our QA models on the recently proposed MLQA and XQuAD benchmarks for cross-lingual extractive QA. Experimental results show that our models outperform the previous Multilingual-BERT baselines, achieving a new state of the art of 68.1 F1 points on the Spanish MLQA corpus and 77.6 F1 / 61.8 Exact Match points on the Spanish XQuAD corpus. The resulting, synthetically generated SQuAD-es v1.1 corpus, covering almost 100% of the original English data, is, to the best of our knowledge, the first large-scale QA training resource for Spanish.
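The core challenge the TAR method addresses is that translating a SQuAD context sentence-by-sentence loses the character offsets of the answer spans, so the translated answer must be retrieved via word alignments between the source and target contexts. The abstract does not give implementation details, so the following is only a minimal sketch of that retrieve step, with a hard-coded toy translation and alignment standing in for the NMT and word-alignment tools a real pipeline would use; the function name and data layout are hypothetical.

```python
# Minimal sketch of the "retrieve" step in a Translate-Align-Retrieve-style
# pipeline. NOTE: the translation and the word alignment below are hard-coded
# stand-ins; a real pipeline would obtain them from an NMT system and an
# automatic word aligner.

def retrieve_answer_span(src_tokens, tgt_tokens, alignment, src_start, src_end):
    """Map an answer span from the source context to the translated context
    using word alignments given as (source_index, target_index) pairs."""
    tgt_positions = [t for s, t in alignment if src_start <= s <= src_end]
    if not tgt_positions:
        return None  # alignment gap: the span could not be retrieved
    start, end = min(tgt_positions), max(tgt_positions)
    return " ".join(tgt_tokens[start:end + 1])

# Toy example: an English context and a stand-in Spanish translation.
src = "the capital of Spain is Madrid".split()
tgt = "la capital de España es Madrid".split()
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

# Retrieve the Spanish span for the English answer "Madrid" (token 5).
print(retrieve_answer_span(src, tgt, alignment, 5, 5))  # Madrid
```

Because alignments are many-to-many and noisy in practice, taking the min/max aligned target positions is only one plausible heuristic; real systems must also handle unaligned answer tokens and spans that fragment in the target language.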


