Precise Zero-Shot Dense Retrieval without Relevance Labels

12/20/2022
by Luyu Gao, et al.

While dense retrieval has been shown effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance labels are available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is unreal and may contain false details. Then, an unsupervised contrastively learned encoder (e.g. Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity. This second step grounds the generated document to the actual corpus, with the encoder's dense bottleneck filtering out the incorrect details. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers, across various tasks (e.g. web search, QA, fact verification) and languages (e.g. sw, ko, ja).
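The two-step pipeline described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' code: `fake_llm` is a canned stub standing in for an instruction-following model such as InstructGPT, `make_toy_encoder` is a bag-of-words stand-in for a contrastive encoder such as Contriever, and the prompt wording is an assumption rather than the paper's template. Retrieval is plain inner-product search over L2-normalized vectors.

```python
import math
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that trims surrounding punctuation."""
    tokens = (t.strip(".,?!:;") for t in text.lower().split())
    return [t for t in tokens if t]


def make_toy_encoder(corpus: List[str]) -> Callable[[str], List[float]]:
    """Hypothetical stand-in for an unsupervised encoder such as Contriever.

    Maps text to an L2-normalized bag-of-words vector over the corpus
    vocabulary; words outside that vocabulary get no dimension at all.
    """
    vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
    index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}

    def encode(text: str) -> List[float]:
        vec = [0.0] * len(vocab)
        for tok in tokenize(text):
            if tok in index:
                vec[index[tok]] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    return encode


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def hyde_search(
    query: str,
    generate: Callable[[str], str],
    encode: Callable[[str], List[float]],
    corpus: List[str],
    k: int = 1,
) -> List[Tuple[float, str]]:
    # Step 1: zero-shot instruct the LM to write a hypothetical document.
    # The prompt text here is an assumed example, not the paper's template.
    prompt = f"Write a passage to answer the question.\nQuestion: {query}\nPassage:"
    hypothetical_doc = generate(prompt)
    # Step 2: embed the hypothetical document (not the raw query).
    q_vec = encode(hypothetical_doc)
    # Step 3: ground it in the real corpus by nearest-neighbor search.
    # A real system would precompute and index the corpus embeddings.
    scored = [(dot(q_vec, encode(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def fake_llm(prompt: str) -> str:
    # Hypothetical LM stub: plausible text plus an invented detail ("1820").
    return "Paris is the capital of France and its largest city, founded in 1820."


corpus = [
    "Paris is the capital and most populous city of France.",
    "The mitochondria is the powerhouse of the cell.",
    "Seoul is the capital of South Korea.",
]
encoder = make_toy_encoder(corpus)
for score, doc in hyde_search("what is the capital of france", fake_llm, encoder, corpus, k=2):
    print(f"{score:.3f}  {doc}")
```

In this toy, the invented words in the hypothetical document ("founded", "1820") have no dimension in the corpus vocabulary and are simply dropped. That is only a loose analogy for the dense bottleneck the abstract describes; a real encoder compresses text into a learned vector rather than discarding unknown words.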

research
05/03/2023

Zero-Shot Listwise Document Reranking with a Large Language Model

Supervised ranking methods based on bi-encoder or cross-encoder architec...
research
09/21/2022

I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

Despite the tremendous progress in zero-shot learning (ZSL), the majority...
research
03/11/2022

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

In this paper, we propose LaPraDoR, a pretrained dual-tower dense retrie...
research
03/09/2023

Can a Frozen Pretrained Language Model be used for Zero-shot Neural Retrieval on Entity-centric Questions?

Neural document retrievers, including dense passage retrieval (DPR), hav...
research
04/30/2022

To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

Current pre-trained language model approaches to information retrieval c...
research
10/05/2022

Contextualized Generative Retrieval

The text retrieval task is mainly performed in two ways: the bi-encoder ...
research
04/27/2023

Large Language Models are Strong Zero-Shot Retriever

In this work, we propose a simple method that applies a large language m...
