Log In Sign Up

Generation-Augmented Retrieval for Open-domain Question Answering

by   Yuning Mao, et al.

Conventional sparse retrieval methods such as TF-IDF and BM25 are simple and efficient, but solely rely on lexical overlap without semantic matching. Recent dense retrieval methods learn latent representations to tackle the lexical mismatch problem, while being more computationally expensive and insufficient for exact matching as they embed the text sequence into a single vector with limited capacity. In this paper, we present Generation-Augmented Retrieval (GAR), a query expansion method that augments a query with relevant contexts through text generation. We demonstrate on open-domain question answering that the generated contexts significantly enrich the semantics of the queries and thus GAR with sparse representations (BM25) achieves comparable or better performance than the state-of-the-art dense methods such as DPR <cit.>. We show that generating various contexts of a query is beneficial as fusing their results consistently yields better retrieval accuracy. Moreover, as sparse and dense representations are often complementary, GAR can be easily combined with DPR to achieve even better performance. Furthermore, GAR achieves the state-of-the-art performance on the Natural Questions and TriviaQA datasets under the extractive setting when equipped with an extractive reader, and consistently outperforms other retrieval methods when the same generative reader is used.


page 1

page 2

page 3

page 4


SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

We introduce SPARTA, a novel neural retrieval method that shows great pr...

Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

Despite their recent popularity and well known advantages, dense retriev...

Towards Universal Dense Retrieval for Open-domain Question Answering

In open-domain question answering, a model receives a text question as i...

An Unsupervised Model with Attention Autoencoders for Question Retrieval

Question retrieval is a crucial subtask for community question answering...

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

We propose a simple and efficient multi-hop dense retrieval approach for...

Query Expansion Using Contextual Clue Sampling with Language Models

Query expansion is an effective approach for mitigating vocabulary misma...

FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

Retrieval-augmented generation models offer many benefits over standalon...