Can NLI Models Verify QA Systems' Predictions?

04/18/2021
by Jifan Chen, et al.

To build robust question answering systems, we need the ability to verify whether answers to questions are truly correct, not just "good enough" in the context of imperfect QA datasets. We explore the use of natural language inference (NLI) as a way to achieve this goal, as NLI inherently requires the premise (document context) to contain all of the information necessary to support the hypothesis (proposed answer to the question). We leverage large pre-trained models and datasets from recent prior work to construct powerful question-conversion and decontextualization modules, which can reformulate QA instances as premise-hypothesis pairs with very high reliability. Then, by combining standard NLI datasets with NLI examples automatically derived from QA training data, we can train NLI models to judge the correctness of QA models' proposed answers. We show that our NLI approach can generally improve the confidence estimation of a QA model across different domains, evaluated in a selective QA setting. Careful manual analysis of our NLI model's predictions shows that it can further identify cases where the QA model produces the right answer for the wrong reason, or where the answer cannot be verified as addressing all aspects of the question.
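The pipeline the abstract describes can be sketched end to end: rewrite a (question, proposed answer) pair as a declarative hypothesis, score it against the document context with an NLI model, and accept the answer only if the entailment score clears a threshold (selective QA). The rule-based converter and overlap-based scorer below are simplified, hypothetical stand-ins for the paper's trained question-converter and NLI models:

```python
def question_to_hypothesis(question: str, answer: str) -> str:
    """Rewrite a (question, proposed answer) pair as a declarative hypothesis.

    Hypothetical stand-in for the paper's trained question-converter:
    replace a leading wh-word with the proposed answer.
    """
    wh_words = {"who", "what", "which", "where", "when"}
    q = question.rstrip("?").strip()
    first, _, rest = q.partition(" ")
    if first.lower() in wh_words:
        # "Who wrote Hamlet?" + "Shakespeare" -> "Shakespeare wrote Hamlet."
        return f"{answer} {rest}."
    return f"{q}. {answer}."


def overlap_entailment_score(premise: str, hypothesis: str) -> float:
    """Crude proxy for an NLI model: fraction of hypothesis tokens found in the premise."""
    premise_tokens = set(premise.lower().split())
    hyp_tokens = [t.strip(".,").lower() for t in hypothesis.split()]
    return sum(t in premise_tokens for t in hyp_tokens) / max(len(hyp_tokens), 1)


def verify_answer(context: str, question: str, answer: str,
                  threshold: float = 0.5) -> bool:
    """Selective QA: accept the QA model's answer only if the NLI score clears the threshold."""
    hypothesis = question_to_hypothesis(question, answer)
    return overlap_entailment_score(context, hypothesis) >= threshold


context = "Hamlet is a tragedy written by William Shakespeare around 1600."
print(verify_answer(context, "Who wrote Hamlet?", "Shakespeare"))  # True
```

In the paper's actual setup, the converter and decontextualization modules are large pre-trained models, and the scorer is an NLI model trained on a mix of standard NLI data and examples derived from QA training data; the threshold then trades answer coverage against verified accuracy.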

