Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval

03/21/2022
by Wei Zhong, et al.

Following the recent success of dense retrieval methods based on bi-encoders, a number of studies have applied this approach to various downstream retrieval tasks with good efficiency and in-domain effectiveness. Dense retrieval models have recently appeared in Math Information Retrieval (MIR) tasks as well, but the most effective systems remain "classic" retrieval methods that consider rich structure features. In this work, we try to combine the best of both worlds: a well-defined structure search method for effective formula search, and bi-encoder dense retrieval models to capture contextual similarities in mathematical documents. Specifically, we evaluate two representative bi-encoder models, ColBERT for token-level and DPR for passage-level dense retrieval, on recent MIR tasks. To the best of our knowledge, this is the first time a DPR model has been evaluated in the MIR domain. Our results show that bi-encoder models are complementary to existing structure search methods, and we are able to advance the state of the art on a recent MIR dataset. We have made our model checkpoints and source code publicly available so that our results can be reproduced.
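To make the token-level versus passage-level distinction concrete, here is a minimal sketch of how the two scoring styles differ. It is not the authors' released code: the function names are hypothetical, the vectors are toy values rather than encoder outputs, and the `interpolate` helper assumes a simple linear fusion of a structure-search score with a dense score, which the abstract does not specify.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def passage_level_score(query_vec: Vector, passage_vec: Vector) -> float:
    """DPR-style scoring: one vector per query and one per passage,
    compared with a single inner product."""
    return dot(query_vec, passage_vec)


def token_level_score(query_tokens: List[Vector],
                      passage_tokens: List[Vector]) -> float:
    """ColBERT-style late interaction (MaxSim): for each query token,
    take its best match among the passage tokens, then sum the maxima."""
    return sum(
        max(dot(q, p) for p in passage_tokens)
        for q in query_tokens
    )


def interpolate(structure_score: float, dense_score: float,
                alpha: float = 0.5) -> float:
    """ASSUMPTION: linear fusion of a structure-search score and a dense
    score. The weight `alpha` is a hypothetical parameter for illustration."""
    return alpha * structure_score + (1.0 - alpha) * dense_score


if __name__ == "__main__":
    # Toy 2-dimensional embeddings (made up for illustration).
    q_tokens = [[1.0, 0.0], [0.0, 1.0]]
    p_tokens = [[0.5, 0.5], [1.0, 0.0]]
    print(token_level_score(q_tokens, p_tokens))
    print(passage_level_score([1.0, 2.0], [3.0, 4.0]))
    print(interpolate(2.0, 4.0, alpha=0.25))
```

The token-level scorer keeps one vector per token, so it can reward a close match on an individual term or symbol, at the cost of a larger index. The passage-level scorer compresses each passage into one vector, which is cheaper to store and search.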


research
10/11/2022

Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering

Given its effectiveness on knowledge-intensive natural language processi...
research
03/31/2023

Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders

In this paper, we consider the problem of improving the inference latenc...
research
05/27/2023

Continually Updating Generative Retrieval on Dynamic Corpora

Generative retrieval has recently been gaining a lot of attention from t...
research
07/28/2021

Domain-matched Pre-training Tasks for Dense Retrieval

Pre-training on larger datasets with ever increasing model size is now a...
research
08/11/2022

On the Value of Behavioral Representations for Dense Retrieval

We consider text retrieval within dense representational space in real-w...
research
05/10/2023

Evaluating Embedding APIs for Information Retrieval

The ever-increasing size of language models curtails their widespread ac...
research
08/13/2021

On Single and Multiple Representations in Dense Passage Retrieval

The advent of contextualised language models has brought gains in search...
