Promptagator: Few-shot Dense Retrieval From 8 Examples

09/23/2022
by Zhuyun Dai, et al.

Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search intents, queries, and search domains. In this paper, we propose working on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-base Query Generation for Retriever (Promptagator), which leverages large language models (LLMs) as few-shot query generators and creates task-specific retrievers based on the generated data. Powered by the LLM's generalization ability, Promptagator makes it possible to create task-specific end-to-end retrievers solely based on a few examples, without using Natural Questions or MS MARCO to train question generators or dual encoders. Surprisingly, LLM prompting with no more than 8 examples allows dual encoders to outperform heavily engineered models trained on MS MARCO, such as ColBERT v2, by more than 1.2 nDCG points on average on 11 retrieval sets. Further training standard-size re-rankers using the same generated data yields another 5.0 point nDCG improvement. Our studies show that query generation can be far more effective than previously observed, especially when a small amount of task-specific knowledge is given.
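
The few-shot query generation step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the "Document:" / "Query:" prompt layout, the function names `build_few_shot_prompt` and `make_training_pairs`, and the `generate` callable standing in for an LLM are hypothetical and are not taken from the paper. The only detail drawn from the abstract is the cap of 8 examples per task.

```python
from typing import Callable, List, Tuple

# The abstract reports using no more than 8 examples per task.
MAX_EXAMPLES = 8


def build_few_shot_prompt(
    task_description: str,
    examples: List[Tuple[str, str]],
    new_document: str,
) -> str:
    """Assemble a prompt asking an LLM to write a query for new_document.

    The "Document:" / "Query:" layout is a hypothetical template chosen
    for illustration; it is not the paper's template.
    """
    if not examples:
        raise ValueError("at least one (document, query) example is required")
    if len(examples) > MAX_EXAMPLES:
        raise ValueError(f"expected at most {MAX_EXAMPLES} examples")

    parts = [task_description.strip()]
    for document, query in examples:
        parts.append(f"Document: {document.strip()}\nQuery: {query.strip()}")
    # Leave the final query empty so the model completes it.
    parts.append(f"Document: {new_document.strip()}\nQuery:")
    return "\n\n".join(parts)


def make_training_pairs(
    documents: List[str],
    generate: Callable[[str], str],
    task_description: str,
    examples: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Pair each document with a generated query as (query, document).

    `generate` is any callable mapping a prompt string to generated text;
    it is an assumed stand-in for a call to a large language model.
    """
    pairs = []
    for doc in documents:
        prompt = build_few_shot_prompt(task_description, examples, doc)
        query = generate(prompt).strip()
        if query:  # skip documents for which nothing was generated
            pairs.append((query, doc))
    return pairs
```

The resulting (query, document) pairs would then serve as synthetic training data for a task-specific retriever. Training the dual encoder and the re-ranker is outside the scope of this sketch.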

