Modelling Word Burstiness in Natural Language: A Generalised Polya Process for Document Language Models in Information Retrieval

08/20/2017
by Ronan Cummins, et al.

We introduce a generalised multivariate Polya process for document language modelling. The framework outlined here generalises a number of statistical language models used in information retrieval for modelling document generation. In particular, we show that the choice of replacement matrix M ultimately defines the type of random process and therefore defines a particular type of document language model. We show that a particular variant of the general model is useful for modelling term-specific burstiness. Furthermore, via experimentation we show that this variant significantly improves retrieval effectiveness over a strong baseline on a number of small test collections.
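
To make the role of the replacement matrix M concrete, here is a minimal sketch of an urn scheme, assuming the standard reading of a multivariate Polya process: a term is drawn with probability proportional to the current urn contents, and row M[term] is then added to the urn. The names `sample_document` and `diagonal` and the toy three-term vocabulary are illustrative assumptions, not the paper's notation or implementation.

```python
import random


def sample_document(initial, M, length, seed=None):
    """Draw `length` term indices from an urn that evolves under M.

    initial: positive starting mass for each term (hypothetical toy values).
    M:       square replacement matrix; row M[t] is added after drawing t.
    Returns the sampled term indices and the final urn contents.
    """
    rng = random.Random(seed)
    urn = list(initial)
    doc = []
    for _ in range(length):
        # Draw a term with probability proportional to its current mass.
        term = rng.choices(range(len(urn)), weights=urn, k=1)[0]
        doc.append(term)
        # Reinforce the urn according to the row of M for the drawn term.
        for j, extra in enumerate(M[term]):
            urn[j] += extra
    return doc, urn


def diagonal(values):
    """Build a diagonal replacement matrix from per-term reinforcement values."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


initial = [1.0, 1.0, 1.0]
# Zero matrix: the urn never changes, so draws are independent (multinomial-like).
doc_multinomial, urn_multinomial = sample_document(initial, diagonal([0.0, 0.0, 0.0]), 20, seed=1)
# Identity matrix: every drawn term is reinforced equally (classic Polya urn).
doc_polya, urn_polya = sample_document(initial, diagonal([1.0, 1.0, 1.0]), 20, seed=1)
# Unequal diagonal: each term has its own reinforcement, one way to picture
# term-specific burstiness (values chosen arbitrarily for illustration).
doc_bursty, urn_bursty = sample_document(initial, diagonal([0.0, 2.0, 0.5]), 20, seed=1)
```

Under these assumptions, the zero matrix leaves the urn fixed, the identity matrix yields the familiar self-reinforcing urn, and a diagonal matrix with unequal entries lets some terms repeat more readily than others once they have appeared.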


