Generalized Weak Supervision for Neural Information Retrieval

04/18/2023
by Yen-Chieh Lien, et al.

Neural ranking models (NRMs) have demonstrated effective performance in several information retrieval (IR) tasks. However, training NRMs often requires large-scale training data, which is difficult and expensive to obtain. To address this issue, one can train NRMs via weak supervision, where a large training set is automatically generated using an existing ranking model, called the weak labeler. Weakly supervised NRMs can generalize from the observed data and significantly outperform the weak labeler. This paper generalizes this idea through an iterative re-labeling process, demonstrating that weakly supervised models can iteratively play the role of weak labeler and significantly improve ranking performance without using manually labeled data. The proposed Generalized Weak Supervision (GWS) solution is generic and orthogonal to the ranking model architecture. This paper offers four implementations of GWS: self-labeling, cross-labeling, joint cross- and self-labeling, and greedy multi-labeling. GWS also benefits from a query importance weighting mechanism, based on query performance prediction methods, that reduces noise in the generated training data. We further draw a theoretical connection between self-labeling and Expectation-Maximization. Our experiments on two passage retrieval benchmarks show that all implementations of GWS lead to substantial improvements over weak supervision in all cases.
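The iterative re-labeling loop is easiest to see in code. Below is a minimal sketch of the self-labeling variant, assuming an initial lexical weak labeler such as BM25: the current labeler produces pseudo-labeled query-document pairs, a new ranker is trained on them with query-level weights from a query performance predictor, and the trained ranker becomes the labeler for the next round. Function names such as weak_label, train_ranker, and qpp_weight are illustrative placeholders rather than the paper's implementation; the training step and QPP weighting are reduced to toy stand-ins.

```python
# Sketch of iterative re-labeling (self-labeling) under weak supervision.
# All components here are simplified stand-ins, not the paper's code.
import random

def weak_label(ranker, queries, docs):
    """Use the current labeler to create pseudo-labels: for each query,
    sample a document pair and treat the higher-scoring one as relevant."""
    pairs = []
    for q in queries:
        d1, d2 = random.sample(docs, 2)
        pos, neg = (d1, d2) if ranker(q, d1) >= ranker(q, d2) else (d2, d1)
        pairs.append((q, pos, neg))
    return pairs

def qpp_weight(query):
    """Toy query performance predictor: longer queries get more weight.
    The paper uses established QPP methods to down-weight noisy queries."""
    return min(1.0, len(query.split()) / 10.0)

def train_ranker(pairs, weights):
    """Placeholder training step. A real NRM would be fit on the weighted
    pseudo-labeled pairs; here we just return a simple overlap scorer."""
    return lambda q, d: len(set(q.split()) & set(d.split()))

def generalized_weak_supervision(initial_labeler, queries, docs, iterations=3):
    """Self-labeling loop: the model trained in round t labels round t+1."""
    labeler = initial_labeler
    for _ in range(iterations):
        pairs = weak_label(labeler, queries, docs)        # (re-)label the data
        weights = [qpp_weight(q) for q, _, _ in pairs]    # reduce labeling noise
        labeler = train_ranker(pairs, weights)            # trained model becomes next labeler
    return labeler
```

The cross-labeling and multi-labeling variants named in the abstract would replace the single labeler with two or more models that generate labels for one another; the loop structure stays the same.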

Related research

06/13/2018  Towards Theoretical Understanding of Weak Supervision for Information Retrieval
04/28/2017  Neural Ranking Models with Weak Supervision
04/14/2022  ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision
01/28/2020  Selective Weak Supervision for Neural Information Retrieval
05/30/2023  Understanding temporally weakly supervised training: A case study for keyword spotting
10/22/2019  Weakly Supervised Disentanglement with Guarantees
12/28/2020  Recommending Courses in MOOCs for Jobs: An Auto Weak Supervision Approach
