DeepAI AI Chat
Log In Sign Up

What's in a Name? Answer Equivalence For Open-Domain Question Answering

by   Chenglei Si, et al.
University of Maryland

A flaw in QA evaluation is that annotations often only provide one gold answer. Thus, model predictions semantically equivalent to the answer but superficially different are considered incorrect. This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers). We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers. We analyse three QA benchmarks: Natural Questions, TriviaQA, and SQuAD. Answer expansion increases the exact match score on all datasets for evaluation, while incorporating it helps model training over real-world datasets. We ensure the additional answers are valid through a human post hoc evaluation.


page 1

page 2

page 3

page 4


Evaluating Open-Domain Question Answering in the Era of Large Language Models

Lexical matching remains the de facto evaluation method for open-domain ...

LIQUID: A Framework for List Question Answering Dataset Generation

Question answering (QA) models often rely on large-scale training datase...

Improving the Question Answering Quality using Answer Candidate Filtering based on Natural-Language Features

Software with natural-language user interfaces has an ever-increasing im...

A statistical model for aggregating judgments by incorporating peer predictions

We propose a probabilistic model to aggregate the answers of respondents...

UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media

In this paper, we describe our approach in the shared task: COVID-19 eve...

Grow-and-Clip: Informative-yet-Concise Evidence Distillation for Answer Explanation

Interpreting the predictions of existing Question Answering (QA) models ...

Semi-interactive Attention Network for Answer Understanding in Reverse-QA

Question answering (QA) is an important natural language processing (NLP...