Cross-Lingual Training for Automatic Question Generation

06/06/2019
by Vishwajeet Kumar, et al.

Automatic question generation (QG) is a challenging problem in natural language understanding. QG systems are typically built assuming access to a large number of training instances, where each instance is a question and its corresponding answer. For a new language, such training instances are hard to obtain, which makes the QG problem even more challenging. With this as our motivation, we study the reuse of an available large QG dataset in a secondary language (e.g., English) to learn a QG model for a primary language of interest (e.g., Hindi). For the primary language, we assume access to a large amount of monolingual text but only a small QG dataset. We propose a cross-lingual QG model that uses the following training regime: (i) unsupervised pretraining of language models in both the primary and secondary languages, and (ii) joint supervised training for QG in both languages. We demonstrate the efficacy of our proposed approach using two different primary languages, Hindi and Chinese. We also create and release a new question answering dataset for Hindi consisting of 6555 sentences.

research
11/16/2022

Unified Question Answering in Slovene

Question answering is one of the most challenging tasks in language unde...
research
10/16/2019

MLQA: Evaluating Cross-lingual Extractive Question Answering

Question answering (QA) models have shown rapid progress enabled by the ...
research
06/07/2022

cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation

Vision-and-language tasks are gaining popularity in the research communi...
research
05/23/2023

Evaluating and Modeling Attribution for Cross-Lingual Question Answering

Trustworthy answer content is abundant in many high-resource languages a...
research
08/27/2023

Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations

The language ability of Large Language Models (LLMs) is often unbalanced...
research
09/14/2023

Automatic Data Visualization Generation from Chinese Natural Language Questions

Data visualization has emerged as an effective tool for getting insights...
research
12/20/2022

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Prior work has shown that it is possible to expand pretrained Masked Lan...
