Learning Passage Impacts for Inverted Indexes

04/24/2021
by Antonio Mallia, et al.

Neural information retrieval systems typically use a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as BERT. In this paper, we propose DeepImpact, a new document term-weighting scheme suitable for efficient retrieval using a standard inverted index. Compared to existing methods, DeepImpact improves impact-score modeling and tackles the vocabulary-mismatch problem. In particular, DeepImpact leverages DocT5Query to enrich the document collection and, using a contextualized language model, directly estimates the semantic importance of tokens in a document, producing a single-value representation for each token in each document. Our experiments show that DeepImpact significantly outperforms prior first-stage retrieval approaches by up to 17% w.r.t. DocT5Query, and, when deployed in a re-ranking scenario, can reach the same effectiveness as state-of-the-art approaches with up to 5.1x speedup in efficiency.
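Conceptually, the single-value impact scores described above can be served from a standard inverted index: each posting stores one precomputed weight per (term, document) pair, and query processing reduces to summing the stored weights of the matching terms. The following Python sketch illustrates that idea under stated assumptions: the per-token impact scores are taken as given (in DeepImpact they come from a contextualized language model), the quantization factor and the function names build_impact_index and score_query are hypothetical, and expansion terms produced by DocT5Query would simply appear as extra entries in a document's term-to-impact map. It is a minimal illustration, not the authors' implementation.

```python
from collections import defaultdict

def build_impact_index(docs_with_impacts):
    """Build a simple impact-scored inverted index.

    docs_with_impacts: dict mapping doc_id -> dict of term -> impact score
    (the scores are assumed to come from some learned model).
    """
    index = defaultdict(list)  # term -> list of (doc_id, quantized impact)
    for doc_id, term_impacts in docs_with_impacts.items():
        for term, impact in term_impacts.items():
            # Quantize the real-valued impact to a small integer so it can be
            # stored compactly in standard inverted-index postings.
            index[term].append((doc_id, int(round(impact * 100))))
    return index

def score_query(index, query_terms):
    """Score = sum of stored impacts of the query terms a document contains."""
    scores = defaultdict(int)
    for term in set(query_terms):
        for doc_id, impact in index.get(term, []):
            scores[doc_id] += impact
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Toy example with made-up impact scores.
docs = {
    "d1": {"neural": 2.4, "retrieval": 1.1, "index": 0.3},
    "d2": {"retrieval": 0.9, "inverted": 1.8, "index": 2.0},
}
index = build_impact_index(docs)
print(score_query(index, ["inverted", "index"]))  # d2 ranks above d1
```

Because the weights are precomputed at indexing time, query-time scoring involves no neural inference, which is where the efficiency advantage over re-ranking-only pipelines comes from.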


Related research:

- Rethinking the Role of Token Retrieval in Multi-Vector Retrieval (04/04/2023): Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020...
- Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations (06/28/2018): Vocabulary mismatch is a central problem in information retrieval (IR), ...
- Modelling Word Burstiness in Natural Language: A Generalised Polya Process for Document Language Models in Information Retrieval (08/20/2017): We introduce a generalised multivariate Polya process for document langu...
- Expansion via Prediction of Importance with Contextualization (04/29/2020): The identification of relevance with little textual context is a primary...
- Semantic Models for the First-stage Retrieval: A Comprehensive Review (03/08/2021): Multi-stage ranking pipelines have been a practical solution in modern s...
- Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline (01/21/2021): Pre-trained deep language models (LM) have advanced the state-of-the-art...
- SDR: Efficient Neural Re-ranking using Succinct Document Representation (10/03/2021): BERT based ranking models have achieved superior performance on various ...
