Unsupervised Evaluation for Question Answering with Transformers

10/07/2020
by Lukas Muttenthaler, et al.

It is challenging to automatically evaluate the answer of a QA model at inference time. Although many models provide confidence scores, and simple heuristics can go a long way towards indicating answer correctness, such measures are heavily dataset-dependent and are unlikely to generalize. In this work, we begin by investigating the hidden representations of questions, answers, and contexts in transformer-based QA architectures. We observe a consistent pattern in the answer representations, which we show can be used to automatically evaluate whether or not a predicted answer span is correct. Our method does not require any labeled data and outperforms strong heuristic baselines across 2 datasets and 7 domains. We are able to predict whether or not a model's answer is correct with 91.37% accuracy on SubjQA. We expect that this method will have broad applications, e.g., in the semi-automatic development of QA datasets.
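The abstract does not specify how the pattern in the answer representations is turned into a correctness score. The snippet below is a purely illustrative sketch under a stated assumption, not the authors' method. It assumes that the hidden states of a correct answer span's tokens cluster tightly, scores a prediction by the mean pairwise cosine similarity of those vectors, and flags it as likely correct above a threshold. The function names, the threshold of 0.5, and the toy 2-dimensional vectors are hypothetical. In practice the vectors would be taken from some layer of the QA transformer.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_coherence(answer_vectors: List[Vector]) -> float:
    """Hypothetical label-free score: mean pairwise cosine similarity of the
    hidden states of the predicted answer tokens. A single-token answer has no
    pairs, so it is treated as maximally coherent (1.0)."""
    if len(answer_vectors) < 2:
        return 1.0
    pairs = list(combinations(answer_vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def predict_correct(answer_vectors: List[Vector], threshold: float = 0.5) -> bool:
    """Flag a predicted span as likely correct if its coherence reaches the
    (hypothetical) threshold. No gold answers are needed to compute the score."""
    return answer_coherence(answer_vectors) >= threshold


if __name__ == "__main__":
    tight = [[1.0, 0.0], [0.9, 0.1]]      # toy vectors pointing the same way
    scattered = [[1.0, 0.0], [0.0, 1.0]]  # toy orthogonal vectors
    print(predict_correct(tight), predict_correct(scattered))
```

Only the score itself is label-free. Choosing a sensible threshold, or which layer to read the vectors from, would still need some held-out inspection, and the real method may compare answer representations against question or context representations rather than only against each other.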

research
09/21/2023

SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References

Evaluation of QA systems is very challenging and expensive, with the mos...
research
06/02/2020

Subjective Question Answering: Deciphering the inner workings of Transformers in the realm of subjectivity

Understanding subjectivity demands reasoning skills beyond the realm of ...
research
04/18/2021

Can NLI Models Verify QA Systems' Predictions?

To build robust question answering systems, we need the ability to verif...
research
04/10/2020

Towards Automatic Generation of Questions from Long Answers

Automatic question generation (AQG) has broad applicability in domains s...
research
10/17/2019

Question Classification with Deep Contextualized Transformer

The latest work for Question and Answer problems is to use the Stanford ...
research
04/12/2022

ASQA: Factoid Questions Meet Long-Form Answers

An abundance of datasets and availability of reliable evaluation metrics...
research
08/23/2022

Unsupervised Question Answering via Answer Diversifying

Unsupervised question answering is an attractive task due to its indepen...
