Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval

10/23/2019
by   Zhuyun Dai, et al.

Term frequency is a common signal for estimating the importance of a term in a query or document, but it is a weak one, especially when the frequency distribution is flat, as in long queries or in short documents of sentence or passage length. This paper proposes a Deep Contextualized Term Weighting framework (DeepCT) that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages. When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval. When applied to query text, DeepCT-Query generates a weighted bag-of-words query. Both types of term weights can be used directly by typical first-stage retrieval algorithms. This is novel because most ranking models based on deep neural networks have higher computational costs and are therefore restricted to later-stage ranking. Experiments on four datasets demonstrate that DeepCT's deep contextualized text understanding greatly improves the accuracy of first-stage retrieval algorithms.
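To make the indexing idea concrete, here is a minimal sketch of how per-term weights might be placed in an ordinary inverted index and matched against a weighted bag-of-words query. It is not the authors' implementation: the term weights are hard-coded stand-ins for what a trained model would predict, the helper names (`to_pseudo_tf`, `build_index`, `score`), the scaling factor of 100, and the dot-product scoring are all assumptions chosen for illustration. A real system would more likely feed the scaled weights to a standard first-stage ranker such as BM25.

```python
from collections import defaultdict

def to_pseudo_tf(weights, scale=100):
    """Scale real-valued term weights (assumed to lie in [0, 1]) to integer
    pseudo term frequencies; terms that round to zero are dropped."""
    scaled = {term: int(round(w * scale)) for term, w in weights.items()}
    return {term: tf for term, tf in scaled.items() if tf > 0}

def build_index(passages, scale=100):
    """Build an inverted index: term -> {passage_id: pseudo term frequency}."""
    index = defaultdict(dict)
    for pid, weights in passages.items():
        for term, tf in to_pseudo_tf(weights, scale).items():
            index[term][pid] = tf
    return index

def score(index, query_weights):
    """Rank passages by a weighted dot product between query term weights
    and stored pseudo term frequencies (a simplification, not BM25)."""
    scores = defaultdict(float)
    for term, qw in query_weights.items():
        for pid, tf in index.get(term, {}).items():
            scores[pid] += qw * tf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical model outputs: per-passage and per-query term weights.
passages = {
    "p1": {"stomach": 0.9, "pain": 0.7, "the": 0.001},
    "p2": {"stomach": 0.1, "exercise": 0.8},
}
query = {"stomach": 0.6, "pain": 0.4}

index = build_index(passages)
ranking = score(index, query)
print(ranking)
```

In this toy data both passages contain "stomach", so a flat term-frequency signal could not separate them on that term; the differing weights are what push `p1` ahead of `p2`.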


research
06/28/2020

RepBERT: Contextualized Text Embeddings for First-Stage Retrieval

Although exact term match between queries and documents is the dominant ...
research
07/08/2019

Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks

Classical information retrieval (IR) methods, such as query likelihood a...
research
05/10/2020

Transformer-Based Language Models for Similar Text Retrieval and Ranking

Most approaches for similar text retrieval and ranking with long natural...
research
08/31/2023

Context Aware Query Rewriting for Text Rankers using LLM

Query rewriting refers to an established family of approaches that are a...
research
11/08/2018

An Axiomatic Study of Query Terms Order in Ad-hoc Retrieval

Classic retrieval methods use simple bag-of-word representations for que...
research
07/20/2020

Conformer-Kernel with Query Term Independence for Document Retrieval

The Transformer-Kernel (TK) model has demonstrated strong reranking perf...
research
10/17/2022

Effective and Efficient Query-aware Snippet Extraction for Web Search

Query-aware webpage snippet extraction is widely used in search engines ...
