On the Effect of Low-Frequency Terms on Neural-IR Models

04/29/2019
by Hofstätter, et al.

Low-frequency terms are a recurring challenge for information retrieval models; neural IR frameworks in particular struggle to adequately capture infrequently observed words. While these terms are often removed from neural models, mainly as a concession to efficiency demands, they traditionally play an important role in the performance of IR models. In this paper, we analyze the effects of low-frequency terms on the performance and robustness of neural IR models. We conduct controlled experiments on three recent neural IR models, trained on a large-scale passage retrieval collection. We evaluate the neural IR models with various vocabulary sizes for their respective word embeddings, considering different levels of constraints on the available GPU memory. We observe that, despite the significant benefits of using larger vocabularies, the performance gap between the vocabularies can be mitigated to a great extent by extensive tuning of a related parameter: the number of documents to re-rank. We further investigate the use of subword-token embedding models, in particular FastText, for neural IR models. Our experiments show that using FastText brings slight improvements to the overall performance of the neural IR models in comparison to models trained on the full vocabulary, while the improvement becomes much more pronounced for queries containing low-frequency terms.
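To make the contrast between a restricted vocabulary and subword tokens concrete, the following is a minimal sketch and not the authors' implementation. It assumes a frequency cutoff (`min_count`) as the way the vocabulary is restricted and a shared `<unk>` token for removed terms; both, along with all function names and the toy data, are hypothetical. The character n-gram function follows the general FastText idea of decomposing a term into n-grams with boundary markers, so that a rare term still shares units with frequent ones.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for terms outside the vocabulary


def build_vocabulary(tokenized_docs, min_count):
    """Keep only terms that occur at least `min_count` times in the collection."""
    counts = Counter(term for doc in tokenized_docs for term in doc)
    return {term for term, freq in counts.items() if freq >= min_count}


def map_to_vocabulary(tokens, vocabulary):
    """Replace every term outside the vocabulary with the shared UNK token."""
    return [t if t in vocabulary else UNK for t in tokens]


def char_ngrams(term, n_min=3, n_max=6):
    """Character n-grams of a term with '<' and '>' boundary markers."""
    padded = f"<{term}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


# Toy collection: "hofstadter" occurs once, so a cutoff of 2 removes it.
docs = [
    ["neural", "ranking", "model"],
    ["neural", "retrieval", "model"],
    ["hofstadter", "neural", "model"],
]
vocab = build_vocabulary(docs, min_count=2)
query = ["hofstadter", "model"]

print(map_to_vocabulary(query, vocab))  # the rare query term collapses to UNK
print(char_ngrams("model", 3, 4))       # subword units remain available
```

With the restricted vocabulary, the rare query term is indistinguishable from every other removed term, whereas the n-gram decomposition still yields units from which an embedding could be composed.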
