A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education

03/31/2023
by   Son T. Luu, et al.

Machine reading comprehension, the task of extracting useful information from text, has been an interesting and challenging problem in recent years. To advance the ability of computers to understand a reading passage and answer questions about it, we introduce ViMMRC 2.0, an extension of the previous ViMMRC corpus for multiple-choice reading comprehension on Vietnamese textbooks, which contain reading articles for students from Grade 1 to Grade 12. The dataset has 699 reading passages, comprising prose and poems, and 5,273 questions. Unlike the previous version, the questions in the new dataset are not fixed at four options. Moreover, the questions are more difficult, which challenges models to find the correct choice: the computer must understand the whole context of the reading passage, the question, and the content of each choice to select the right answer. Hence, we propose a multi-stage approach that combines the multi-step attention network (MAN) with the natural language inference (NLI) task to enhance the performance of the reading comprehension model. We then compare the proposed methodology with baseline BERTology models on the new dataset and on ViMMRC 1.0. Our multi-stage models achieved 58.81, better than the highest-scoring BERTology models. From the error analysis, we found that the main challenge for reading comprehension models is understanding the implicit context in texts and linking pieces of it together to find the correct answers. Finally, we hope our new dataset will motivate further research on enhancing the language understanding ability of computers in the Vietnamese language.
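To make the task format concrete, here is a minimal Python sketch of how a multiple-choice question with a variable number of options might be represented and scored. It is not the authors' code or the dataset's actual schema: the `MCQuestion` fields, the `accuracy` helper, and the word-overlap baseline are all hypothetical names chosen for illustration, and the baseline is a toy stand-in, not the MAN or NLI models described above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MCQuestion:
    """Hypothetical record layout; the real corpus schema may differ."""
    passage: str          # reading passage (prose or poem)
    question: str
    options: List[str]    # variable length, not fixed at four
    answer: int           # index of the correct option


def accuracy(questions: List[MCQuestion],
             predict: Callable[[MCQuestion], int]) -> float:
    """Fraction of questions where the predicted option index is correct."""
    if not questions:
        return 0.0
    correct = sum(1 for q in questions if predict(q) == q.answer)
    return correct / len(questions)


def overlap_baseline(q: MCQuestion) -> int:
    """Toy baseline: pick the option sharing the most words with the passage."""
    passage_words = set(q.passage.lower().split())
    scores = [len(passage_words & set(o.lower().split())) for o in q.options]
    return scores.index(max(scores))
```

Because `predict` returns an option index instead of a fixed label such as A to D, the same scoring function works for questions with any number of choices.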


