Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering

10/04/2022
by Tavish McDonald, et al.

Businesses generate thousands of documents that communicate their strategic vision and describe key products, services, entities, and processes. Knowledge workers then face the laborious task of reading these documents to identify, extract, and synthesize the information relevant to their organizational goals. To automate this information gathering, question answering (QA) offers a flexible framework in which human-posed questions can be adapted to extract diverse knowledge. Fine-tuning QA systems requires labeled data (tuples of context, question, and answer). However, curating data for document QA is uniquely challenging because the context (i.e., the passage containing the answer evidence) must be retrieved from documents that may be long and poorly formatted. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic for real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from the extracted text to form well-posed contexts; and (3) QA over those contexts to return high-quality answers, whether extractive, abstractive, or Boolean. Using QASPER as a surrogate for our proprietary data, our detect-retrieve-comprehend (DRC) system achieves a +6.25 improvement in Answer-F1 over existing baselines while also selecting better contexts. Our results show that DRC is a promising, flexible framework for practical document QA.
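The retrieve stage and the Answer-F1 metric mentioned above can be illustrated with a short sketch. This is not the authors' DRC implementation. It assumes the text has already been extracted from the PDF and split into paragraphs (stage 1), it substitutes a simple bag-of-words cosine retriever for whatever retriever DRC actually uses (stage 2), and it omits the QA model entirely (stage 3). The names `retrieve_evidence` and `answer_f1` are hypothetical, and `answer_f1` is a simplified SQuAD-style token-level F1 that strips punctuation but, unlike the usual normalization, does not remove articles.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep only alphanumeric runs (a simplified normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_evidence(paragraphs: list[str], question: str, k: int = 2) -> list[str]:
    """Hypothetical stage 2: return the k paragraphs most similar to the question.

    Bag-of-words cosine is an assumed stand-in for DRC's retriever.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        range(len(paragraphs)),
        key=lambda i: cosine(q, Counter(tokenize(paragraphs[i]))),
        reverse=True,
    )
    # Restore document order so the selected paragraphs read as one context.
    return [paragraphs[i] for i in sorted(ranked[:k])]


def answer_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred, ref = tokenize(prediction), tokenize(gold)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, given the paragraphs "Our company sells cloud storage products to enterprises.", "The quarterly report lists revenue by region.", and "Cloud storage revenue grew in the enterprise segment.", the question "What cloud storage products does the company sell?" selects the first and third paragraphs as context, and `answer_f1("cloud storage products", "cloud storage")` gives 0.8 (precision 2/3, recall 1). Returning the top-k paragraphs in document order, rather than in score order, keeps the assembled context closer to how it was written.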
