Learning Dense Representations of Phrases at Scale

by   Jinhyuk Lee, et al.

Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on their sparse representations while still underperforming retriever-reader approaches. In this work, we show for the first time that we can learn dense phrase representations alone that achieve much stronger performance in open-domain QA. Our approach includes (1) learning query-agnostic phrase representations via question generation and distillation; (2) novel negative-sampling methods for global normalization; (3) query-side fine-tuning for transfer learning. On five popular QA datasets, our model DensePhrases improves previous phrase retrieval models by 15 matches the performance of state-of-the-art retriever-reader models. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs. Finally, we directly use our pre-indexed dense phrase representations for two slot filling tasks, showing the promise of utilizing DensePhrases as a dense knowledge base for downstream tasks.


page 1

page 2

page 3

page 4


Phrase Retrieval Learns Passage Retrieval, Too

Dense retrieval methods have shown great promise over sparse retrieval m...

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Existing open-domain question answering (QA) models are not suitable for...

Bridging the Training-Inference Gap for Dense Phrase Retrieval

Building dense retrievers requires a series of standard procedures, incl...

Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

Despite their recent popularity and well known advantages, dense retriev...

Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering

A sparse representation is known to be an effective means to encode prec...

EfficientQA : a RoBERTa Based Phrase-Indexed Question-Answering System

State-of-the-art extractive question answering models achieve superhuman...

Confidence-Calibrated Ensemble Dense Phrase Retrieval

In this paper, we consider the extent to which the transformer-based Den...

Please sign up or login with your details

Forgot password? Click here to reset