An Approach for Weakly-Supervised Deep Information Retrieval

07/01/2017
by Sean MacAvaney, et al.

Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. To fill this gap, we present an approach that, given a weak training set of pseudo-queries, documents, and relevance information, filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as training data for neural IR models, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach on a news corpus, with article headlines acting as pseudo-queries and article content as documents, and with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing against established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.
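The abstract does not spell out the filters, so the following is only a minimal sketch of what a ranking-based filter over headline/content pairs might look like. It assumes a simple IDF-weighted term-overlap score as a stand-in for an unsupervised ranking heuristic, keeps a pair only when the article's own content ranks in the top `top_k` for its headline, and takes the best-scoring other article as the negative. The names `overlap_score`, `filter_pairs`, and `top_k`, the scoring formula, and the negative-sampling rule are all hypothetical and are not taken from the paper; the interaction-similarity filter is not modeled here.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; an assumption made for brevity.
    return text.lower().split()


def overlap_score(query, doc, doc_freq, n_docs):
    """IDF-weighted term overlap (hypothetical stand-in for a ranking heuristic)."""
    doc_terms = Counter(tokenize(doc))
    score = 0.0
    for term in set(tokenize(query)):
        tf = doc_terms[term]
        if tf:
            idf = math.log(1 + n_docs / doc_freq[term])
            score += idf * tf / (tf + 1.0)  # saturating term frequency
    return score


def filter_pairs(articles, top_k=1):
    """articles: list of (headline, content). Returns (positives, negatives)."""
    contents = [content for _, content in articles]
    doc_freq = Counter()
    for content in contents:
        doc_freq.update(set(tokenize(content)))

    positives, negatives = [], []
    for i, (headline, _) in enumerate(articles):
        scores = [overlap_score(headline, c, doc_freq, len(contents)) for c in contents]
        ranking = sorted(range(len(contents)), key=lambda j: scores[j], reverse=True)
        # Keep the pair only if the article's own content ranks highly for its headline.
        if scores[i] > 0 and i in ranking[:top_k]:
            positives.append((headline, contents[i]))
            # Assumed negative: the best-ranked content from a different article.
            other = next((j for j in ranking if j != i), None)
            if other is not None:
                negatives.append((headline, contents[other]))
    return positives, negatives


articles = [
    ("Storm floods coastal town", "A storm brought floods to the coastal town overnight"),
    ("You will not believe this", "The council approved a new budget for road repairs"),
    ("Council approves road budget", "The council approves a road budget after long talks"),
]
positives, negatives = filter_pairs(articles)
for headline, _ in positives:
    print("kept:", headline)
```

In this toy data the second headline shares no terms with its article, so the pair is discarded; this mirrors the idea of eliminating training examples that do not transfer well to relevance scoring.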


research
10/30/2018

NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval

Pseudo-relevance feedback (PRF) is commonly used to boost the performanc...
research
11/01/2020

CURE: Collection for Urdu Information Retrieval Evaluation and Ranking

Urdu is a widely spoken language with 163 million speakers worldwide acr...
research
07/06/2018

On the Equilibrium of Query Reformulation and Document Retrieval

In this paper, we study the interactions between query reformulation and...
research
10/16/2017

DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval

This paper concerns a deep learning approach to relevance ranking in inf...
research
03/14/2021

TripClick: The Log Files of a Large Health Web Search Engine

Click logs are valuable resources for a variety of information retrieval...
research
01/28/2020

Selective Weak Supervision for Neural Information Retrieval

This paper democratizes neural information retrieval to scenarios where ...
research
08/05/2023

Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

Domain transfer is a prevalent challenge in modern neural Information Re...
