Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

04/11/2022
by Alicia Parrish, et al.

Current QA systems can generate reasonable-sounding yet false answers without explanation or evidence, which is especially problematic when humans cannot readily check the model's answers. This presents a challenge for building trust in machine learning systems. We take inspiration from real-world situations in which difficult questions are answered by considering opposing sides (see Irving et al., 2018). For multiple-choice QA examples, we build a dataset of single arguments for both a correct and an incorrect answer option in a debate-style setup, as an initial step toward training models to produce explanations for two candidate answers. We use long contexts: humans familiar with the context write convincing explanations for pre-selected correct and incorrect answers, and we test whether those explanations allow humans who have not read the full context to determine the correct answer more accurately. We do not find that explanations in our setup improve human accuracy, but a baseline condition shows that providing human-selected text snippets does improve accuracy. We use these findings to suggest ways of improving the debate setup for future data-collection efforts.
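To make the setup concrete, here is a minimal sketch in Python of the kind of record such a data collection produces and how per-condition judge accuracy would be scored. The schema, field names, and example text are hypothetical illustrations for this abstract, not the paper's released data format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DebateExample:
    """One multiple-choice QA item in a single-turn debate setup.

    Field names are illustrative, not the paper's released schema.
    """
    question: str
    options: List[str]           # answer choices shown to the judge
    correct_idx: int             # index of the gold answer
    argument_for_correct: str    # explanation written for the correct option
    argument_for_incorrect: str  # explanation written for a distractor
    snippet: str                 # human-selected excerpt (baseline condition)

def accuracy(judgments: List[int], examples: List[DebateExample]) -> float:
    """Fraction of judge choices that match the gold answer."""
    correct = sum(
        1 for choice, ex in zip(judgments, examples) if choice == ex.correct_idx
    )
    return correct / len(examples)

if __name__ == "__main__":
    ex = DebateExample(
        question="Why does the narrator leave the city?",
        options=["To find work", "To escape a scandal"],
        correct_idx=1,
        argument_for_correct="The letter in chapter 3 refers to the affair...",
        argument_for_incorrect="He mentions needing money early on...",
        snippet="'I could not stay after what the papers printed,' he said.",
    )
    print(accuracy([1], [ex]))  # 1.0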

