A Neural Model for Joint Document and Snippet Ranking in Question Answering for Large Document Collections

06/16/2021
by Dimitris Pappas, et al.

Question answering (QA) systems for large document collections typically use pipelines that (i) retrieve possibly relevant documents, (ii) re-rank them, (iii) rank paragraphs or other snippets of the top-ranked documents, and (iv) select spans of the top-ranked snippets as exact answers. Pipelines are conceptually simple, but errors propagate from one component to the next, without later components being able to revise earlier decisions. We present an architecture for joint document and snippet ranking, the two middle stages, which leverages the intuition that relevant documents have good snippets and good snippets come from relevant documents. The architecture is general and can be used with any neural text relevance ranker. We experiment with two main instantiations of the architecture, based on POSIT-DRMM (PDRMM) and a BERT-based ranker. Experiments on biomedical data from BIOASQ show that our joint models vastly outperform the pipelines in snippet retrieval, the main goal for QA, with fewer trainable parameters, also remaining competitive in document retrieval. Furthermore, our joint PDRMM-based model is competitive with BERT-based models, despite using orders of magnitude fewer parameters. These claims are also supported by human evaluation on two test batches of BIOASQ. To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval. Our joint PDRMM-based model again outperforms the corresponding pipeline in snippet retrieval on the modified Natural Questions dataset, even though it performs worse than the pipeline in document retrieval. We make our code and the modified Natural Questions dataset publicly available.
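The intuition that relevant documents have good snippets, and good snippets come from relevant documents, can be illustrated with a toy sketch. This is not the architecture described above. The names `Document`, `joint_rank`, `w_doc`, and `w_snip` are hypothetical, the mixing weights are hand-picked rather than learned, and all scores are assumed to be pre-computed values in [0, 1] from some relevance ranker.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Document:
    doc_id: str
    retrieval_score: float       # assumed first-stage retrieval score in [0, 1]
    snippet_scores: List[float]  # assumed per-snippet relevance scores in [0, 1]


def joint_rank(
    docs: List[Document], w_doc: float = 0.5, w_snip: float = 0.5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, int, float]]]:
    """Toy joint scoring with fixed (not learned) mixing weights.

    A document's score is pulled up by its best snippet, and each
    snippet's score is then revised using its document's score.
    """
    doc_scores = {}
    snippet_ranking = []
    for d in docs:
        # Relevant documents have good snippets: use the best snippet as evidence.
        best_snippet = max(d.snippet_scores, default=0.0)
        doc_score = w_doc * d.retrieval_score + (1.0 - w_doc) * best_snippet
        doc_scores[d.doc_id] = doc_score
        # Good snippets come from relevant documents: revise each snippet score.
        for i, s in enumerate(d.snippet_scores):
            revised = w_snip * s + (1.0 - w_snip) * doc_score
            snippet_ranking.append((d.doc_id, i, revised))

    doc_ranking = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
    snippet_ranking.sort(key=lambda t: t[2], reverse=True)
    return doc_ranking, snippet_ranking


if __name__ == "__main__":
    docs = [
        Document("A", retrieval_score=0.9, snippet_scores=[0.1, 0.2]),
        Document("B", retrieval_score=0.4, snippet_scores=[0.95, 0.3]),
    ]
    doc_ranking, snippet_ranking = joint_rank(docs)
    print(doc_ranking)
    print(snippet_ranking)
```

In this example, document B has a lower retrieval score than A but overtakes it because it contains a strong snippet. This is the kind of revision of an earlier decision that a strict pipeline cannot make.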
