Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations

06/28/2018
by Yan Xiao, et al.

Vocabulary mismatch is a central problem in information retrieval (IR): relevant documents may not contain the same (symbolic) terms as the query. Recently, neural representations have shown great success in capturing semantic relatedness, opening new possibilities for alleviating the vocabulary mismatch problem in IR. However, most existing efforts in this direction have been devoted to the re-ranking stage, that is, to leveraging neural representations to re-rank a set of candidate documents, which are typically obtained from an initial retrieval stage based on some symbolic index and search scheme (e.g., BM25 over the inverted index). This naturally raises a concern: if the relevant documents are not found in the initial retrieval stage due to vocabulary mismatch, there is no chance to re-rank them to the top positions later. Therefore, in this paper, we study the problem of how to employ neural representations to improve the recall of relevant documents in the initial retrieval stage. Specifically, to meet the efficiency requirement of the initial stage, we introduce a neural index for the neural representations of documents, and propose two hybrid search schemes based on both neural and symbolic indices, namely the parallel search scheme and the sequential search scheme. Our experiments show that both hybrid index and search schemes can improve the recall of the initial retrieval stage with small overhead.
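The abstract names the parallel and sequential search schemes but does not say how the two indices are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that parallel search queries both indices independently and merges the candidates, and that sequential search uses the top symbolic hits as seeds for a nearest-neighbour lookup in the neural index. The toy corpus, the hand-made 2-d vectors, the term-overlap scorer (a stand-in for BM25), the brute-force cosine scan (a stand-in for a real neural index), and every function name are hypothetical.

```python
import math
from collections import defaultdict

# Toy corpus (made up): doc id -> (text, hand-made 2-d "embedding").
DOCS = {
    "d1": ("cheap car rental", (1.0, 0.0)),
    "d2": ("affordable automobile hire", (0.9, 0.1)),
    "d3": ("banana bread recipe", (0.0, 1.0)),
}


def build_inverted_index(docs):
    """Symbolic index: term -> set of doc ids containing that term."""
    index = defaultdict(set)
    for doc_id, (text, _) in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def symbolic_search(index, query, k):
    """Rank docs by number of matching query terms (stand-in for BM25)."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neural_search(docs, query_vec, k):
    """Brute-force cosine scan (stand-in for an approximate neural index)."""
    return sorted(docs, key=lambda d: (-cosine(query_vec, docs[d][1]), d))[:k]


def parallel_search(index, docs, query, query_vec, k):
    """Assumed reading: query both indices independently, then merge."""
    merged = symbolic_search(index, query, k)
    for doc_id in neural_search(docs, query_vec, k):
        if doc_id not in merged:
            merged.append(doc_id)
    return merged


def sequential_search(index, docs, query, k):
    """Assumed reading: symbolic hits seed a lookup in the neural index."""
    seeds = symbolic_search(index, query, k)
    candidates = list(seeds)
    for seed in seeds:
        for doc_id in neural_search(docs, docs[seed][1], k):
            if doc_id not in candidates:
                candidates.append(doc_id)
    return candidates


index = build_inverted_index(DOCS)
symbolic_only = symbolic_search(index, "cheap car", 2)
parallel = parallel_search(index, DOCS, "cheap car", (1.0, 0.0), 2)
sequential = sequential_search(index, DOCS, "cheap car", 2)
```

In this toy setup the query "cheap car" shares no terms with d2 ("affordable automobile hire"), so the symbolic index alone misses it, while both hybrid variants recover it through vector similarity. That is the vocabulary mismatch case the abstract describes; the example says nothing about the overhead or recall figures reported in the paper.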


