ASQA: Factoid Questions Meet Long-Form Answers

by Ivan Stelmakh et al.
Carnegie Mellon University

An abundance of datasets and availability of reliable evaluation metrics have resulted in strong progress in factoid question answering (QA). This progress, however, does not easily transfer to the task of long-form QA, where the goal is to answer questions that require in-depth explanations. The hurdles include (i) a lack of high-quality data, and (ii) the absence of a well-defined notion of the answer's quality. In this work, we address these problems by (i) releasing a novel dataset and a task that we call ASQA (Answer Summaries for Questions which are Ambiguous); and (ii) proposing a reliable metric for measuring performance on ASQA. Our task focuses on factoid questions that are ambiguous, that is, have different correct answers depending on interpretation. Answers to ambiguous questions should synthesize factual information from multiple sources into a long-form summary that resolves the ambiguity. In contrast to existing long-form QA tasks (such as ELI5), ASQA admits a clear notion of correctness: a user faced with a good summary should be able to answer different interpretations of the original ambiguous question. We use this notion of correctness to define an automated metric of performance for ASQA. Our analysis demonstrates an agreement between this metric and human judgments, and reveals a considerable gap between human performance and strong baselines.
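The abstract's notion of correctness — a good summary should let a reader answer each interpretation of the ambiguous question — can be sketched as a toy metric. The sketch below is an illustrative assumption, not the paper's actual metric: it checks whether each interpretation's short answer is recoverable from the summary by simple string matching (ASQA itself uses a learned QA model for this), and the `overall_score` combination via geometric mean is likewise hypothetical.

```python
from math import sqrt

def disambig_accuracy(summary: str, short_answers: list[str]) -> float:
    """Fraction of interpretation-specific short answers recoverable
    from the summary. Illustrative string match only; the real metric
    would run a QA model over the summary."""
    if not short_answers:
        return 0.0
    hits = sum(1 for ans in short_answers if ans.lower() in summary.lower())
    return hits / len(short_answers)

def overall_score(qa_accuracy: float, fluency: float) -> float:
    """Hypothetical combination of a correctness score and a fluency
    score into one number via their geometric mean."""
    return sqrt(qa_accuracy * fluency)

# An ambiguous question like "How tall is the Eiffel Tower?" has
# different correct answers depending on interpretation; a good
# long-form answer should cover all of them.
summary = ("The Eiffel Tower is 300 m tall to its roof, "
           "and 330 m tall including its antennas.")
interpretation_answers = ["300 m", "330 m"]
print(disambig_accuracy(summary, interpretation_answers))  # 1.0
```

A summary that resolved only one interpretation would score 0.5 here, reflecting the intuition that an answer to an ambiguous question is correct only insofar as it covers every interpretation.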



