Embedding-based Zero-shot Retrieval through Query Generation

09/22/2020
by Davis Liang, et al.

Passage retrieval addresses the problem of locating relevant passages, usually from a large corpus, given a query. In practice, lexical term-matching algorithms like BM25 are popular choices for retrieval owing to their efficiency. However, term-based matching algorithms often miss relevant passages that have no lexical overlap with the query, and they cannot be finetuned to downstream datasets. In this work, we consider the embedding-based two-tower architecture as our neural retrieval model. Because labeled data can be scarce and neural retrieval models require large amounts of training data, we propose a novel method for generating synthetic training data for retrieval. Our system significantly outperforms BM25 on 5 of the 6 datasets tested, by an average of 2.45 points in Recall@1. In some cases, our model trained on synthetic data even outperforms the same model trained on real data.
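As a rough illustration of the setup the abstract describes, the sketch below scores passages against a query in a shared embedding space and computes Recall@k. It is not the authors' implementation: the dot-product scoring, the toy 2-dimensional vectors, the single gold passage per query, and the helper names (`dot`, `retrieve_top_k`, `recall_at_k`) are all assumptions made for the example. In a real two-tower model, the query and passage vectors would come from two trained encoders.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity (an assumed scoring function)."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_vec: Sequence[float],
                   passage_vecs: List[Sequence[float]],
                   k: int = 1) -> List[int]:
    """Return the indices of the k highest-scoring passages."""
    ranked = sorted(range(len(passage_vecs)),
                    key=lambda i: dot(query_vec, passage_vecs[i]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(query_vecs: List[Sequence[float]],
                passage_vecs: List[Sequence[float]],
                gold: List[int],
                k: int = 1) -> float:
    """Fraction of queries whose gold passage appears in the top k.

    Assumes exactly one relevant passage per query (hypothetical setup).
    """
    hits = sum(1 for q, g in zip(query_vecs, gold)
               if g in retrieve_top_k(q, passage_vecs, k))
    return hits / len(query_vecs)


# Toy embeddings standing in for the outputs of the two encoders.
passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
gold = [0, 1, 0]  # index of the relevant passage for each query

print(recall_at_k(queries, passages, gold, k=1))
print(recall_at_k(queries, passages, gold, k=3))
```

Because passage vectors do not depend on the query, they can be computed once and indexed ahead of time, which is what makes this style of retrieval practical over a large corpus.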


research
01/25/2022

Out-of-Domain Semantics to the Rescue! Zero-Shot Hybrid Retrieval Models

The pre-trained language model (e.g., BERT) based deep retrieval models ac...
research
07/23/2020

ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions

Most existing algorithms for cross-modal Information Retrieval are based...
research
04/02/2020

Deformation-Aware 3D Model Embedding and Retrieval

We introduce a new problem of retrieving 3D models that are deformable t...
research
04/29/2020

Complementing Lexical Retrieval with Semantic Residual Embedding

Information retrieval traditionally has relied on lexical matching signa...
research
08/09/2021

Zero in on Shape: A Generic 2D-3D Instance Similarity Metric learned from Synthetic Data

We present a network architecture which compares RGB images and untextur...
research
09/30/2022

Zero-Shot Retrieval with Search Agents and Hybrid Environments

Learning to search is the task of building artificial agents that learn ...
research
10/11/2022

Better Than Whitespace: Information Retrieval for Languages without Custom Tokenizers

Tokenization is a crucial step in information retrieval, especially for ...
