Alloprof: a new French question-answer education dataset and its use in an information retrieval case study

Teachers and students increasingly rely on online learning resources to supplement those provided in school. This growth in the breadth and depth of available resources benefits students only if they can find answers to their queries. Question-answering and information retrieval systems have benefited from public datasets for training and evaluating their algorithms, but most of these datasets consist of English text written by and for adults. We introduce a new public French question-answering dataset collected from Alloprof, a Quebec-based primary and high-school help website, containing 29 349 questions and their explanations in a variety of school subjects from 10 368 students, with more than half of the explanations containing links to other questions or to some of the 2 596 reference pages on the website. We also present a case study of this dataset in an information retrieval task. The dataset was collected on the Alloprof public forum, with all questions verified for their appropriateness and the explanations verified both for their appropriateness and their relevance to the question. To predict relevant documents, architectures using pre-trained BERT models were fine-tuned and evaluated. This dataset will allow researchers to develop question-answering, information retrieval and other algorithms specifically for the French-speaking education context. Furthermore, the range of language proficiency, images, mathematical symbols and spelling mistakes will necessitate algorithms based on multimodal comprehension. The case study we present as a baseline shows that an approach relying on recent techniques provides an acceptable level of performance, but more work is necessary before it can be reliably used and trusted in a production setting.


