DAPR: A Benchmark on Document-Aware Passage Retrieval

05/23/2023
by Kexin Wang, et al.

Recent neural retrieval work focuses mainly on ranking short texts and struggles with long documents. Existing work mainly evaluates ranking either passages or whole documents. However, in many cases users want to find a relevant passage within a long document drawn from a huge corpus, for example in legal cases or research papers. In this scenario, the passage on its own often provides little document context, which makes it hard for current approaches to find the correct document and return accurate results. To fill this gap, we propose the task of Document-Aware Passage Retrieval (DAPR) and build a benchmark of multiple datasets from various domains, covering both DAPR and whole-document retrieval. In experiments, we extend state-of-the-art neural passage retrievers with document-level context via different approaches, including prepending a document summary, pooling over passage representations, and hybrid retrieval with BM25. The hybrid-retrieval systems, the best overall, improve only marginally on the DAPR tasks while improving significantly on the document-retrieval tasks. This motivates further research into better retrieval systems for the new task. The code and the data are available at https://github.com/kwang2049/dapr
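
To make the idea of hybrid retrieval with BM25 more concrete, the following is a minimal sketch of one way a passage-level neural score could be combined with a document-level BM25 score. The abstract does not say how the scores are fused, so the min-max normalization, the convex combination, the `weight` parameter, the function names, and the toy scores are all assumptions made for illustration and are not taken from the DAPR code base.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_passage_scores(
    passage_scores: Dict[str, float],
    doc_scores: Dict[str, float],
    passage_to_doc: Dict[str, str],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Fuse each passage's neural score with its parent document's BM25 score.

    Assumption: a convex combination of min-max normalized scores.
    Other fusion schemes (e.g. rank-based fusion) are equally plausible.
    """
    norm_passage = min_max_normalize(passage_scores)
    norm_doc = min_max_normalize(doc_scores)
    fused = {}
    for pid, p_score in norm_passage.items():
        # Documents that BM25 did not score contribute 0.0.
        d_score = norm_doc.get(passage_to_doc[pid], 0.0)
        fused[pid] = weight * p_score + (1.0 - weight) * d_score
    return fused


# Toy scores, made up for illustration only.
passage_scores = {"d1-p1": 0.9, "d1-p2": 0.5, "d2-p1": 0.7}  # neural retriever
doc_scores = {"d1": 2.0, "d2": 12.0}                         # BM25 over documents
passage_to_doc = {"d1-p1": "d1", "d1-p2": "d1", "d2-p1": "d2"}

fused = hybrid_passage_scores(passage_scores, doc_scores, passage_to_doc)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

In this toy case the passage from `d2` moves ahead of the passage with the highest neural score because its parent document matches the query much better under BM25, which is the kind of document-level signal the task is meant to reward.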
