Learning Term Discrimination

04/24/2020
by Jibril Frej, et al.

Document indexing is a key component of efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term frequencies (tf). Because tf only reflects the importance of a term within a document, traditional IR models also use term discrimination values (TDVs), such as inverse document frequency (idf), to favor discriminative terms during retrieval. In this work, we propose to learn TDVs for document indexing with shallow neural networks that approximate traditional IR ranking functions such as TF-IDF and BM25. Our proposal outperforms traditional approaches in terms of both nDCG and recall, even when only a few positively labelled query-document pairs are available as training data. When used to filter out vocabulary terms whose discrimination value is zero, our learned TDVs both significantly reduce the memory footprint of the inverted index and speed up retrieval (BM25 is up to 3 times faster), without degrading retrieval quality.
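
To make the role of TDVs concrete, here is a minimal sketch, not the authors' implementation. It assumes per-term TDVs have already been learned and that a learned TDV simply takes the place of idf in a standard BM25 formula; terms with a TDV of zero are left out of the inverted index. The toy corpus, the `TDV` values, and the helper names `build_inverted_index` and `bm25_with_tdv` are all hypothetical, and the sketch does not show how the shallow network learns the TDVs.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, assumed already preprocessed
# (stemmed, stop words removed).
DOCS = {
    "d1": ["neural", "index", "retrieval"],
    "d2": ["retrieval", "model", "retrieval"],
    "d3": ["term", "index", "model"],
}

# Hypothetical learned term discrimination values (TDVs).
# A TDV of 0.0 marks a term as non-discriminative.
TDV = {"neural": 1.2, "index": 0.7, "retrieval": 0.0, "model": 0.4, "term": 0.9}


def build_inverted_index(docs, tdv):
    """Map term -> {doc_id: tf}, skipping terms whose TDV is zero (or unknown)."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            if tdv.get(term, 0.0) > 0.0:
                index[term][doc_id] = tf
    return dict(index)


def bm25_with_tdv(query, index, doc_lens, tdv, k1=1.2, b=0.75):
    """BM25-style scores in which each term's idf is replaced by its TDV."""
    avgdl = sum(doc_lens.values()) / len(doc_lens)
    scores = defaultdict(float)
    for term in query:
        # Pruned terms have no postings list, so they cost nothing here.
        for doc_id, tf in index.get(term, {}).items():
            norm = k1 * (1 - b + b * doc_lens[doc_id] / avgdl)
            scores[doc_id] += tdv[term] * tf * (k1 + 1) / (tf + norm)
    return dict(scores)


doc_lens = {doc_id: len(tokens) for doc_id, tokens in DOCS.items()}
index = build_inverted_index(DOCS, TDV)
print(sorted(index))  # ['index', 'model', 'neural', 'term']
print(bm25_with_tdv(["model", "index"], index, doc_lens, TDV))
```

Because `retrieval` has a TDV of zero in this toy setup, it never enters the index: its postings list takes no memory and is never traversed at query time. This illustrates the kind of filtering the abstract describes; the actual memory and speed gains depend on the learned values and the collection.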
