Learning Dense Representations of Phrases at Scale

by   Jinhyuk Lee, et al.

Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on their sparse representations while still underperforming retriever-reader approaches. In this work, we show for the first time that we can learn dense phrase representations alone that achieve much stronger performance in open-domain QA. Our approach includes (1) learning query-agnostic phrase representations via question generation and distillation; (2) novel negative-sampling methods for global normalization; (3) query-side fine-tuning for transfer learning. On five popular QA datasets, our model DensePhrases improves previous phrase retrieval models by 15 matches the performance of state-of-the-art retriever-reader models. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs. Finally, we directly use our pre-indexed dense phrase representations for two slot filling tasks, showing the promise of utilizing DensePhrases as a dense knowledge base for downstream tasks.


page 1

page 2

page 3

page 4


Phrase Retrieval Learns Passage Retrieval, Too

Dense retrieval methods have shown great promise over sparse retrieval m...

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Existing open-domain question answering (QA) models are not suitable for...

Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

Despite their recent popularity and well known advantages, dense retriev...

Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering

A sparse representation is known to be an effective means to encode prec...

Refining Query Representations for Dense Retrieval at Test Time

Dense retrieval uses a contrastive learning framework to learn dense rep...

Simple and Efficient ways to Improve REALM

Dense retrieval has been shown to be effective for retrieving relevant d...

EfficientQA : a RoBERTa Based Phrase-Indexed Question-Answering System

State-of-the-art extractive question answering models achieve superhuman...