Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations

by Yan Xiao, et al.
Institute of Computing Technology, Chinese Academy of Sciences

Vocabulary mismatch is a central problem in information retrieval (IR): relevant documents may not contain the same (symbolic) terms as the query. Recently, neural representations have shown great success in capturing semantic relatedness, opening new possibilities for alleviating the vocabulary mismatch problem in IR. However, most existing efforts in this direction have been devoted to the re-ranking stage, that is, to leveraging neural representations to re-rank a set of candidate documents, which are typically obtained from an initial retrieval stage based on some symbolic index and search scheme (e.g., BM25 over the inverted index). This naturally raises a concern: if the relevant documents are not found in the initial retrieval stage due to vocabulary mismatch, there is no chance to re-rank them to the top positions later. Therefore, in this paper, we study the problem of how to employ neural representations to improve the recall of relevant documents in the initial retrieval stage. Specifically, to meet the efficiency requirement of the initial stage, we introduce a neural index for the neural representations of documents, and propose two hybrid search schemes based on both the neural and symbolic indices, namely the parallel search scheme and the sequential search scheme. Our experiments show that both hybrid index and search schemes improve the recall of the initial retrieval stage with small overhead.
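
The abstract names the two hybrid schemes but does not spell out their mechanics, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the parallel scheme queries both indices independently and merges the candidates, and that the sequential scheme runs the symbolic search first and then uses the retrieved documents as seeds for the neural index. The toy corpus, the hand-made 2-d vectors, the term-overlap scorer (a crude stand-in for BM25), the brute-force cosine search (a stand-in for a real neural index), and all function names are hypothetical.

```python
import math

# Hypothetical toy corpus: doc id -> (text, hand-made vector standing in
# for a learned neural representation).
DOCS = {
    "d1": ("cheap car rental", (1.0, 0.0)),
    "d2": ("affordable automobile hire", (0.9, 0.1)),
    "d3": ("banana bread recipe", (0.0, 1.0)),
}


def symbolic_search(query, k):
    """Rank documents by shared-term count (crude stand-in for BM25)."""
    q_terms = set(query.split())
    hits = []
    for doc_id, (text, _) in DOCS.items():
        overlap = len(q_terms & set(text.split()))
        if overlap > 0:  # like an inverted index, only matching docs are returned
            hits.append((overlap, doc_id))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:k]]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neural_search(vec, k, exclude=()):
    """Brute-force nearest neighbours (stand-in for a real neural index)."""
    scored = [
        (cosine(vec, doc_vec), doc_id)
        for doc_id, (_, doc_vec) in DOCS.items()
        if doc_id not in exclude
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def parallel_search(query, query_vec, k):
    """Assumed reading: query both indices independently, then merge."""
    merged = symbolic_search(query, k) + neural_search(query_vec, k)
    return list(dict.fromkeys(merged))  # de-duplicate, keep first-seen order


def sequential_search(query, k):
    """Assumed reading: symbolic search first, then expand each hit
    with its nearest neighbours in the neural index."""
    seeds = symbolic_search(query, k)
    candidates = list(seeds)
    for doc_id in seeds:
        seed_vec = DOCS[doc_id][1]
        candidates.extend(neural_search(seed_vec, k, exclude=candidates))
    return candidates


if __name__ == "__main__":
    query, query_vec = "cheap car rental", (1.0, 0.0)  # query vector is made up
    print("symbolic only:", symbolic_search(query, 2))
    print("parallel:     ", parallel_search(query, query_vec, 2))
    print("sequential:   ", sequential_search(query, 1))
```

In this toy setting the document "affordable automobile hire" shares no term with the query, so a purely symbolic search cannot return it, while both hybrid variants can surface it through vector similarity. That is the recall gap the abstract is concerned with; how candidates are scored and merged in the actual system is not stated in the text above.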



Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

Search engines often follow a two-phase paradigm where in the first stag...

Learning Passage Impacts for Inverted Indexes

Neural information retrieval systems typically use a cascading pipeline,...

A Discriminative Semantic Ranker for Question Retrieval

Similar question retrieval is a core task in community-based question an...

Semantic Models for the First-stage Retrieval: A Comprehensive Review

Multi-stage ranking pipelines have been a practical solution in modern s...

Adaptive Re-Ranking with a Corpus Graph

Search systems often employ a re-ranking pipeline, wherein documents (or...

Probably Reasonable Search in eDiscovery

In eDiscovery, a party to a lawsuit or similar action must search throug...

Getting Started with Neural Models for Semantic Matching in Web Search

The vocabulary mismatch problem is a long-standing problem in informatio...
