
Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation

by Philippe Laban, et al.

Question generation (QGen) models are often evaluated with standardized NLG metrics that are based on n-gram overlap. In this paper, we measure whether these metric improvements translate to gains in a practical setting, focusing on the use case of helping teachers automate the generation of reading comprehension quizzes. In our study, teachers building a quiz receive question suggestions, which they can either accept or refuse with a reason. Even though we find that recent progress in QGen leads to a significant increase in question acceptance rates, there is still large room for improvement, with the best model having only 68.4% of its suggested questions accepted by the teachers who participated in our study. We then leverage the annotations we collected to analyze standard NLG metrics and find that model performance has reached projected upper-bounds, suggesting new automatic metrics are needed to guide QGen research forward.
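To make the abstract's premise concrete, here is a minimal sketch of the kind of n-gram overlap computation that standardized NLG metrics (e.g. BLEU-style modified precision) are built on. The function names are illustrative, not code from the paper; real metrics add brevity penalties, multiple n-gram orders, and smoothing.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision: the fraction of candidate n-grams that
    also appear in the reference, with counts clipped BLEU-style."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

# A generated question can differ from the reference in ways a teacher
# would not mind, yet still lose overlap score (and vice versa):
generated = "what did the author describe ?"
reference = "what does the author describe ?"
print(ngram_precision(generated, reference, n=2))  # → 0.6
```

Surface-level differences ("did" vs. "does") cut the bigram score here even though both questions test the same comprehension, which is exactly the gap between metric gains and teacher acceptance that the study measures.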



