Zero-shot Neural Retrieval via Domain-targeted Synthetic Query Generation

by Ji Ma, et al.

Deep neural scoring models have recently been shown to improve ranking quality on a number of benchmarks (Guo et al., 2016; Dai et al., 2018; MacAvaney et al., 2019; Yang et al., 2019a). However, these methods rely on underlying ad-hoc retrieval systems to generate candidates for scoring, which are rarely neural themselves (Zamani et al., 2018). Recent work has shown that the performance of ad-hoc neural retrieval systems can be competitive with a number of baselines (Zamani et al., 2018), potentially leading the way to full end-to-end neural retrieval. A major roadblock to the adoption of ad-hoc retrieval models is that they require large supervised training sets to surpass classic term-based techniques, which can be developed from raw corpora. Previous work shows weakly supervised data can yield competitive results, e.g., click data (Dehghani et al., 2017; Borisov et al., 2016). Unfortunately for many domains, even weakly supervised data can be scarce. In this paper, we propose an approach to zero-shot learning (Xian et al., 2018) for ad-hoc retrieval models that relies on synthetic query generation. Crucially, the query generation system is trained on general domain data, but is applied to documents in the targeted domain. This allows us to create arbitrarily large, yet noisy, query-document relevance pairs that are domain targeted. On a number of benchmarks, we show that this is an effective strategy for building neural retrieval models for specialised domains.
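
The data-creation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `build_synthetic_pairs` and `toy_generator` are hypothetical names, and `toy_generator` is only a stand-in for a query generator trained on general-domain data, which the abstract does not specify in detail.

```python
from typing import Callable, Iterable, List, Tuple


def build_synthetic_pairs(
    passages: Iterable[str],
    generate_queries: Callable[[str, int], List[str]],
    queries_per_passage: int = 2,
) -> List[Tuple[str, str]]:
    """Pair each target-domain passage with queries generated for it.

    Every (query, passage) pair is treated as a noisy positive example
    for training a retrieval model.
    """
    pairs = []
    for passage in passages:
        for query in generate_queries(passage, queries_per_passage):
            pairs.append((query, passage))
    return pairs


def toy_generator(passage: str, n: int) -> List[str]:
    """Placeholder generator (an assumption for illustration only).

    It returns up to n three-word windows from the passage. A real system
    would call a trained sequence-to-sequence model here instead.
    """
    words = passage.split()
    window_count = min(n, max(len(words) - 2, 0))
    return [" ".join(words[i:i + 3]) for i in range(window_count)]


if __name__ == "__main__":
    target_domain_passages = [
        "aspirin reduces fever in adults",
        "short",
    ]
    for query, passage in build_synthetic_pairs(target_domain_passages, toy_generator):
        print(repr(query), "->", repr(passage))
```

Because the generator is passed in as a function, the same loop works with any model that maps a passage to candidate queries. Passages too short to yield a query simply contribute no pairs.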




A Neural Passage Model for Ad-hoc Document Retrieval

Traditional statistical retrieval models often treat each document as a ...

Neural document expansion for ad-hoc information retrieval

Recently, Nogueira et al. [2019] proposed a new approach to document exp...

Learning More From Less: Towards Strengthening Weak Supervision for Ad-Hoc Retrieval

The limited availability of ground truth relevance labels has been a maj...

Few-Shot Generative Conversational Query Rewriting

Conversational query rewriting aims to reformulate a concise conversatio...

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

While billions of non-English speaking users rely on search engines ever...

Establishing Strong Baselines for TripClick Health Retrieval

We present strong Transformer-based re-ranking and dense retrieval basel...

Surface Form Competition: Why the Highest Probability Answer Isn't Always Right

Large language models have shown promising results in zero-shot settings...