Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer

07/03/2020
by Kateřina Macková, et al.

Reading comprehension is a well-studied task, with huge training datasets available in English. This work focuses on building reading comprehension systems for Czech without requiring any manually annotated Czech training data. First, we automatically translated the SQuAD 1.1 and SQuAD 2.0 datasets into Czech to create training and development data, which we release at http://hdl.handle.net/11234/1-3249. We then trained and evaluated several BERT and XLM-RoBERTa baseline models. However, our main focus lies in cross-lingual transfer models. We report that an XLM-RoBERTa model trained on English data and evaluated on Czech achieves very competitive performance, only approximately 2 percentage points worse than a model trained on the translated Czech data. This result is remarkable, considering that the model saw no Czech data during training. The cross-lingual transfer approach is very flexible and enables reading comprehension in any language for which enough monolingual raw text is available.
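Evaluating a transferred model on translated SQuAD-style data relies on answer-string comparison. As a minimal sketch (assuming the standard SQuAD exact-match and token-level F1 metrics, which the abstract does not spell out), the scoring could look like this:

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace
    (standard SQuAD normalization; Czech has no articles, so that step is a
    no-op on Czech answers)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """1/0 score: normalized strings must be identical."""
    return normalize(prediction) == normalize(gold)


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the predicted and gold answer spans."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because both metrics operate on surface strings, they apply unchanged to the machine-translated Czech development set, which is what makes the reported ~2 percentage-point gap between the transferred and Czech-trained models directly comparable.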

