FQuAD2.0: French Question Answering and knowing that you know nothing

09/27/2021
by Quentin Heinrich, et al.

Question Answering, including Reading Comprehension, is one of the NLP research areas that has seen significant scientific breakthroughs over the past few years, thanks to concurrent advances in Language Modeling. Most of these breakthroughs, however, are centered on the English language. In 2020, as a first strong initiative to bridge the gap for the French language, Illuin Technology introduced FQuAD1.1, a native French Reading Comprehension dataset composed of 60,000+ question-answer samples extracted from Wikipedia articles. Nonetheless, Question Answering models trained on this dataset have a major drawback: they cannot predict when a given question has no answer in the paragraph of interest, and therefore make unreliable predictions in various industrial use cases. In the present work, we introduce FQuAD2.0, which extends FQuAD with 17,000+ unanswerable questions, annotated adversarially so that they resemble answerable ones. This new dataset, comprising a total of almost 80,000 questions, makes it possible to train French Question Answering models that can distinguish unanswerable questions from answerable ones. We benchmark several models with this dataset: our best model, a fine-tuned CamemBERT-large, achieves an F1 score of 82.3 on the classification task, and an F1 score of 83
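To make concrete what scoring with unanswerable questions can look like, here is a minimal Python sketch of a token-overlap F1 that also handles the "no answer" case. It is an illustration under stated assumptions, not the evaluation code used for FQuAD2.0: the abstract does not specify how answers are normalized or how unanswerable questions are encoded, so the empty-string convention, the lowercase-and-split tokenization, and the function name `token_f1` are all hypothetical choices made for this example.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string.

    Assumption (not from the source): an unanswerable question is encoded
    as an empty gold answer, and a model abstains by predicting "".
    """
    # Simplified normalization: lowercase and split on whitespace.
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()

    # If either side is "no answer", the score is 1 only when both agree.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Count tokens shared by both answers (multiset intersection).
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (prediction, gold) pairs, invented for illustration only.
examples = [
    ("en 1889", "1889"),  # answerable, partial overlap
    ("", ""),             # unanswerable, model abstains correctly
    ("Paris", ""),        # unanswerable, model answers anyway
]
scores = [token_f1(p, g) for p, g in examples]
```

Under this convention, a model that always extracts some span is penalized on every unanswerable question, which is the failure mode the abstract attributes to models trained only on answerable questions.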

research
02/14/2020

FQuAD: French Question Answering Dataset

Recent advances in the field of language modeling have improved state-of...
research
06/16/2016

SQuAD: 100,000+ Questions for Machine Comprehension of Text

We present the Stanford Question Answering Dataset (SQuAD), a new readin...
research
08/21/2018

CoQA: A Conversational Question Answering Challenge

Humans gather information by engaging in conversations involving a serie...
research
11/02/2021

UQuAD1.0: Development of an Urdu Question Answering Training Data for Machine Reading Comprehension

In recent years, low-resource Machine Reading Comprehension (MRC) has ma...
research
09/09/2019

Question Generation by Transformers

A machine learning model was developed to automatically generate questio...
research
07/01/2019

Katecheo: A Portable and Modular System for Multi-Topic Question Answering

We introduce a modular system that can be deployed on any Kubernetes clu...
research
05/20/2023

VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models

The VNHSGE (VietNamese High School Graduation Examination) dataset, deve...
