A Replication Study of Dense Passage Retriever

04/12/2021
by   Xueguang Ma, et al.

Text retrieval using learned dense representations has recently emerged as a promising alternative to "traditional" text retrieval using sparse bag-of-words representations. One recent work that has garnered much attention is the dense passage retriever (DPR) technique proposed by Karpukhin et al. (2020) for end-to-end open-domain question answering. We present a replication study of this work, starting with model checkpoints provided by the authors, but otherwise from an independent implementation in our group's Pyserini IR toolkit and PyGaggle neural text ranking library. Although our experimental results largely verify the claims of the original paper, we arrived at two important additional findings that contribute to a better understanding of DPR: First, it appears that the original authors under-report the effectiveness of the BM25 baseline and hence also dense–sparse hybrid retrieval results. Second, by incorporating evidence from the retriever and an improved answer span scoring technique, we are able to improve end-to-end question answering effectiveness using exactly the same models as in the original work.
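To make the notion of dense–sparse hybrid retrieval concrete, the following is a minimal sketch of one common way to fuse two ranked lists: a linear interpolation of the dense score and the BM25 score for each passage. It is an illustration under stated assumptions, not the implementation used in the study. The function name `hybrid_fusion`, the weight `alpha`, and the choice to back off to each list's minimum score for passages that appear in only one list are all hypothetical.

```python
from typing import Dict, List, Tuple


def hybrid_fusion(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 1.0,
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Fuse dense and sparse (e.g. BM25) scores by linear interpolation.

    `dense` and `sparse` map passage ids to retrieval scores.
    `alpha` weights the sparse score; in practice it would be tuned on
    held-out data (assumption: a single global weight).
    """
    if not dense and not sparse:
        return []

    # Assumption: a passage missing from one list receives that list's
    # minimum observed score instead of being dropped.
    min_dense = min(dense.values()) if dense else 0.0
    min_sparse = min(sparse.values()) if sparse else 0.0

    fused: Dict[str, float] = {}
    for doc_id in set(dense) | set(sparse):
        d = dense.get(doc_id, min_dense)
        s = sparse.get(doc_id, min_sparse)
        fused[doc_id] = d + alpha * s

    # Sort by fused score (descending), breaking ties by passage id.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    dense_hits = {"p1": 80.0, "p2": 78.0}
    sparse_hits = {"p2": 12.0, "p3": 15.0}
    for doc_id, score in hybrid_fusion(dense_hits, sparse_hits, alpha=1.0):
        print(doc_id, score)
```

Because the fused ranking depends directly on the sparse scores, a weaker BM25 configuration would also lower the hybrid results, which is why an under-reported baseline carries over to the hybrid numbers.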


research
04/10/2020

Dense Passage Retrieval for Open-Domain Question Answering

Open-domain question answering relies on efficient passage retrieval to ...
research
12/09/2021

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

Learned sparse and dense representations capture different successful ap...
research
09/23/2021

Towards Universal Dense Retrieval for Open-domain Question Answering

In open-domain question answering, a model receives a text question as i...
research
06/22/2021

Fine-tune the Entire RAG Architecture (including DPR retriever) for Question-Answering

In this paper, we illustrate how to fine-tune the entire Retrieval Augme...
research
10/28/2021

Dense Hierarchical Retrieval for Open-Domain Question Answering

Dense neural text retrieval has achieved promising results on open-domai...
research
02/19/2021

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

Pyserini is an easy-to-use Python toolkit that supports replicable IR re...
research
08/15/2022

Reproduction and Replication of an Adversarial Stylometry Experiment

Maintaining anonymity while communicating using natural language remains...
