Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization

03/22/2021
by Ansong Ni, et al.

Question Answering (QA) tasks requiring information from multiple documents often rely on a retrieval model to identify relevant information from which the reasoning model can derive an answer. The retrieval model is typically trained to maximize the likelihood of the labeled supporting evidence. However, when retrieving from large text corpora such as Wikipedia, the correct answer can often be obtained from multiple evidence candidates, not all of them labeled as positive, thus rendering the training signal weak and noisy. The problem is exacerbated when the questions are unanswerable or the answers are boolean, since the models cannot rely on lexical overlap to map answers to supporting evidence. We develop a new parameterization of set-valued retrieval that properly handles unanswerable queries, and we show that marginalizing over this set during training allows a model to mitigate false negatives in the annotated supporting evidence. We test our method with two multi-document QA datasets, IIRC and HotpotQA. On IIRC, we show that joint modeling with marginalization over alternative contexts improves model performance by 5.5 F1 points and achieves a new state-of-the-art performance of 50.6 F1. We also show that marginalization yields a 0.9 to 1.6 QA F1 improvement on HotpotQA in various settings.
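The core idea of retrieval marginalization can be sketched as follows: instead of maximizing the answer likelihood under a single labeled context, the model sums the joint likelihood over a set of candidate contexts, so an unlabeled context that also supports the answer still contributes training signal. This is a minimal illustration, not the authors' implementation; the function name and the probability values in the usage example are hypothetical.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marginal_log_likelihood(retrieval_log_probs, answer_log_probs):
    """log p(a | q) = log sum_c p(c | q) * p(a | c, q).

    Marginalizes the answer likelihood over a set of candidate
    contexts c, rather than trusting only the single annotated
    (possibly false-negative-ridden) supporting evidence. In the
    paper's setting the candidate set can also include a null
    context to cover unanswerable questions; that case is omitted
    here for brevity.
    """
    assert len(retrieval_log_probs) == len(answer_log_probs)
    joint = [r + a for r, a in zip(retrieval_log_probs, answer_log_probs)]
    return logsumexp(joint)


# Hypothetical example: two candidate contexts; the second is an
# unlabeled alternative that still yields the correct answer.
ret = [math.log(0.6), math.log(0.4)]   # p(c | q) per candidate
ans = [math.log(0.9), math.log(0.5)]   # p(a | c, q) per candidate
ll = marginal_log_likelihood(ret, ans)  # log(0.6*0.9 + 0.4*0.5)
```

Training would then minimize the negative of this marginal log-likelihood, so gradients flow to every candidate context in proportion to how plausibly it explains the answer.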


Related research

10/04/2022 · Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering
Businesses generate thousands of documents that communicate their strate...

09/16/2020 · DDRQA: Dynamic Document Reranking for Open-domain Multi-hop Question Answering
Open-domain multi-hop question answering (QA) requires retrieving multi...

02/06/2020 · Generating Scientific Question Answering Corpora from Q&A forums
Question Answering (QA) is a natural language processing task that aims ...

06/21/2022 · Questions Are All You Need to Train a Dense Passage Retriever
We introduce ART, a new corpus-level autoencoding approach for training ...

07/09/2021 · Joint Models for Answer Verification in Question Answering Systems
This paper studies joint models for selecting correct answer sentences a...

05/10/2021 · Poolingformer: Long Document Modeling with Pooling Attention
In this paper, we introduce a two-level attention schema, Poolingformer,...

04/17/2021 · Joint Passage Ranking for Diverse Multi-Answer Retrieval
We study multi-answer retrieval, an under-explored problem that requires...
