Learning Dense Representations of Phrases at Scale

12/23/2020
by Jinhyuk Lee, et al.

Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on sparse representations and still underperform retriever-reader approaches. In this work, we show for the first time that we can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA. Our approach includes (1) learning query-agnostic phrase representations via question generation and distillation; (2) novel negative-sampling methods for global normalization; (3) query-side fine-tuning for transfer learning. On five popular QA datasets, our model DensePhrases improves previous phrase retrieval models by 15%-25% absolute accuracy and matches the performance of state-of-the-art retriever-reader models. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs. Finally, we directly use our pre-indexed dense phrase representations for two slot filling tasks, showing the promise of utilizing DensePhrases as a dense knowledge base for downstream tasks.
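
Since DensePhrases answers questions directly from pre-indexed dense phrase vectors, inference reduces to a maximum inner product search between a question embedding and the phrase index. The snippet below is a minimal sketch of that retrieval step only, assuming a FAISS flat inner-product index and random placeholder vectors in place of trained phrase and question encoders; it is an illustration of the general technique, not the authors' implementation.

# Minimal sketch of dense phrase retrieval via maximum inner product search.
# Assumptions: FAISS is installed, and random vectors stand in for the outputs
# of trained phrase/question encoders.
import numpy as np
import faiss

dim = 768               # hypothetical embedding size
num_phrases = 100_000   # hypothetical number of indexed phrases

# In practice these would be produced offline by a phrase encoder over the corpus.
phrase_vectors = np.random.rand(num_phrases, dim).astype("float32")
phrases = [f"phrase_{i}" for i in range(num_phrases)]  # placeholder phrase strings

# Flat inner-product index; a real system would likely use a compressed/quantized index.
index = faiss.IndexFlatIP(dim)
index.add(phrase_vectors)

def retrieve(question_vector: np.ndarray, k: int = 5):
    """Return the top-k phrases with the largest inner product against the question vector."""
    scores, ids = index.search(question_vector.reshape(1, -1).astype("float32"), k)
    return [(phrases[i], float(s)) for i, s in zip(ids[0], scores[0])]

# Hypothetical question embedding from a question encoder.
q = np.random.rand(dim).astype("float32")
print(retrieve(q))

Because the index holds only dense vectors, the search can be sharded across CPU threads or machines, which is consistent with the parallelism and the 10+ questions-per-second CPU throughput reported in the abstract.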


Related research

09/16/2021 · Phrase Retrieval Learns Passage Retrieval, Too
Dense retrieval methods have shown great promise over sparse retrieval m...

06/13/2019 · Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
Existing open-domain question answering (QA) models are not suitable for...

10/25/2022 · Bridging the Training-Inference Gap for Dense Phrase Retrieval
Building dense retrievers requires a series of standard procedures, incl...

10/13/2021 · Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?
Despite their recent popularity and well known advantages, dense retriev...

11/07/2019 · Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering
A sparse representation is known to be an effective means to encode prec...

01/06/2021 · EfficientQA: a RoBERTa Based Phrase-Indexed Question-Answering System
State-of-the-art extractive question answering models achieve superhuman...

06/28/2023 · Confidence-Calibrated Ensemble Dense Phrase Retrieval
In this paper, we consider the extent to which the transformer-based Den...
