PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

02/13/2021
by Patrick Lewis, et al.

Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to "back-off" to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
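The abstract describes three mechanisms: answering a question by looking up the most similar stored QA-pair, updating knowledge by simply adding pairs, and abstaining in favour of a more expensive model when the match looks unreliable. The following is a minimal sketch of those ideas under stated assumptions; it is not RePAQ's implementation. A real QA-pair retriever would use learned question representations, whereas this sketch swaps in a toy bag-of-words cosine similarity. All names here (`embed`, `QAPairRetriever`, `answer_with_backoff`, the `fallback` callable, and the `threshold=0.8` confidence cut-off) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]


def embed(text: str) -> Counter:
    # Toy stand-in for a learned question encoder: lowercase bag of words.
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class QAPairRetriever:
    """Answers a question with the answer of the most similar stored question."""

    def __init__(self, qa_pairs: List[QAPair]):
        self.qa_pairs = list(qa_pairs)
        self.index = [embed(q) for q, _ in self.qa_pairs]

    def retrieve(self, question: str) -> Tuple[str, float]:
        # Assumes a non-empty index; max() raises ValueError otherwise.
        qv = embed(question)
        scores = [cosine(qv, v) for v in self.index]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.qa_pairs[best][1], scores[best]

    def add(self, question: str, answer: str) -> None:
        # Adding knowledge at test time is just appending a new pair.
        self.qa_pairs.append((question, answer))
        self.index.append(embed(question))


def answer_with_backoff(
    question: str,
    retriever: QAPairRetriever,
    fallback: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    # Selective QA: keep the cheap retrieved answer only if the match score
    # clears the (hypothetical) threshold; otherwise back off to the slow model.
    answer, score = retriever.retrieve(question)
    if score >= threshold:
        return answer, "retriever"
    return fallback(question), "fallback"
```

Usage, with a placeholder lambda standing in for the expensive model:

```python
pairs = [("who wrote hamlet?", "William Shakespeare"),
         ("what is the capital of france?", "Paris")]
retriever = QAPairRetriever(pairs)
slow_model = lambda q: "<answer from expensive model>"
print(answer_with_backoff("Who wrote Hamlet?", retriever, slow_model))
# ('William Shakespeare', 'retriever')
print(answer_with_backoff("When was the Eiffel Tower built?", retriever, slow_model))
# ('<answer from expensive model>', 'fallback')
```

The threshold trades coverage for accuracy: raising it sends more questions to the slower model, while lowering it answers more questions from the cheap cache at greater risk of error.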
