Repartitioning of the ComplexWebQuestions Dataset

07/25/2018
by Alon Talmor, et al.

Recently, Talmor and Berant (2018) introduced ComplexWebQuestions, a dataset focused on answering complex questions by decomposing them into a sequence of simpler questions and extracting the answer from retrieved web snippets. In their work, the authors used a pre-trained reading comprehension (RC) model (Salant and Berant, 2018) to extract the answer from the web snippets. In this short note, we show that training an RC model directly on the training data of ComplexWebQuestions reveals leakage from the training set to the test set, which allows one to obtain unreasonably high performance. As a solution, we construct a new partitioning of ComplexWebQuestions that does not suffer from this leakage and release it publicly. We also perform an empirical evaluation on both the original and the new partitioning and show that training an RC model on the training data substantially improves state-of-the-art performance.
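The abstract does not describe how the new partitioning is built. One generic way to rule out this kind of train/test leakage is to split by group rather than by individual example, so that all questions sharing a common source land in the same split. The sketch below illustrates that generic technique only; it is not the authors' procedure, and the function name `group_split`, the grouping key `seed_id`, and the 80/10/10 fractions are all hypothetical.

```python
import hashlib
from collections import defaultdict


def group_split(examples, key, fractions=(0.8, 0.1, 0.1)):
    """Assign whole groups of examples to train/dev/test (hypothetical helper).

    `key` maps an example to a group id (assumed here to be something like
    the seed question a complex question was derived from). Every example
    that shares a group id ends up in the same split, so no group can
    straddle train and test. `fractions` is assumed to sum to 1.
    """
    groups = defaultdict(list)
    for ex in examples:
        groups[key(ex)].append(ex)

    train_cut = fractions[0]
    dev_cut = fractions[0] + fractions[1]
    splits = {"train": [], "dev": [], "test": []}
    for gid, members in groups.items():
        # Stable hash of the group id mapped to a number in [0, 1),
        # so the assignment is reproducible across runs.
        digest = hashlib.sha256(str(gid).encode("utf-8")).hexdigest()
        u = int(digest[:8], 16) / 0x100000000
        if u < train_cut:
            name = "train"
        elif u < dev_cut:
            name = "dev"
        else:
            name = "test"
        splits[name].extend(members)
    return splits


if __name__ == "__main__":
    # Toy data: 30 questions derived from 10 hypothetical seeds (3 per seed).
    questions = [{"id": i, "seed_id": i // 3} for i in range(30)]
    splits = group_split(questions, key=lambda q: q["seed_id"])
    for name, qs in splits.items():
        print(name, sorted({q["seed_id"] for q in qs}))
```

Hashing the group id, rather than shuffling, keeps the assignment deterministic without a stored random seed. The split sizes are only approximately proportional to `fractions`, since whole groups are assigned at once.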

research
03/18/2018

The Web as a Knowledge-base for Answering Complex Questions

Answering complex questions is a time-consuming activity for humans that...
research
03/06/2023

Data Portraits: Recording Foundation Model Training Data

Foundation models are trained on increasingly immense and opaque dataset...
research
08/17/2018

Read + Verify: Machine Reading Comprehension with Unanswerable Questions

Machine reading comprehension with unanswerable questions aims to abstai...
research
07/19/2021

Bridging the Gap between Language Model and Reading Comprehension: Unsupervised MRC via Self-Supervision

Despite recent success in machine reading comprehension (MRC), learning ...
research
05/10/2021

ExpMRC: Explainability Evaluation for Machine Reading Comprehension

Achieving human-level performance on some of Machine Reading Comprehensi...
research
02/02/2020

Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension

Innovations in annotation methodology have been a propellant for Reading...
research
06/07/2023

Knowing-how Knowing-that: A New Task for Machine Reading Comprehension of User Manuals

The machine reading comprehension (MRC) of user manuals has huge potenti...