Calibration of Machine Reading Systems at Scale

03/20/2022
by Shehzaad Dhuliawala, et al.

In typical machine learning systems, an estimate of the probability of a prediction is used to assess the system's confidence in that prediction. This confidence measure is usually uncalibrated: the system's confidence in a prediction does not match the true probability that the predicted output is correct. In this paper, we investigate calibrating open-setting machine reading systems such as open-domain question answering and claim verification systems. We show that calibrating such complex systems, which contain discrete retrieval and deep reading components, is challenging, and that current calibration techniques fail to scale to these settings. We propose simple extensions to existing calibration approaches that allow us to adapt them to these settings. Our experimental results show that the approach works well and can be used to selectively predict answers when question answering systems are posed with unanswerable or out-of-training-distribution questions.
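The abstract describes confidence calibration and selective prediction at a high level but does not specify a metric. As an illustration of what "calibrated" means, here is a minimal sketch of expected calibration error (ECE), a standard calibration measure, together with a simple confidence-threshold abstention rule. The function names, bin count, and threshold below are illustrative assumptions, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then average the gap between
    each bin's mean confidence and its empirical accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

def selective_predict(confidence, threshold=0.8):
    """Abstain ("I can't answer") when confidence falls below a threshold."""
    return "answer" if confidence >= threshold else "abstain"
```

A perfectly calibrated system has an ECE of 0; for example, predictions made with 90% confidence that are right 100% of the time contribute a gap of 0.1 in their bin. The paper's selective-prediction setting corresponds to tuning a threshold like the one above so that the system abstains on unanswerable or out-of-distribution questions.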

Related research

06/02/2021 · Knowing More About Questions Can Help: Improving Calibration in Question Answering
We study calibration in question answering, estimating whether model cor...

09/29/2017 · A Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering
This paper proposes a novel neural machine reading model for open-domain...

01/01/2021 · Reader-Guided Passage Reranking for Open-Domain Question Answering
Current open-domain question answering (QA) systems often follow a Retri...

09/25/2019 · Question Answering is a Format; When is it Useful?
Recent years have seen a dramatic expansion of tasks and datasets posed ...

06/07/2023 · When to Read Documents or QA History: On Unified and Selective Open-domain QA
This paper studies the problem of open-domain question answering, with t...

09/17/2020 · On the Transferability of Minimal Prediction Preserving Inputs in Question Answering
Recent work (Feng et al., 2018) establishes the presence of short, unint...

08/21/2020 · It's better to say "I can't answer" than answering incorrectly: Towards Safety critical NLP systems
In order to make AI systems more reliable and their adoption in safety c...
