YES SIR! Optimizing Semantic Space of Negatives with Self-Involvement Ranker

09/14/2021
by Ruizhi Pu, et al.

Pre-trained models such as BERT have proven to be effective tools for Information Retrieval (IR) problems. Owing to their strong performance, they are widely used to tackle real-world IR tasks such as document ranking. Recently, researchers have found that selecting "hard" rather than "random" negative samples is beneficial for fine-tuning pre-trained models on ranking tasks. However, how to leverage hard negative samples in a principled way remains unclear. To address this issue, we propose a fine-tuning strategy for document ranking, the Self-Involvement Ranker (SIR), which dynamically selects hard negative samples to construct a high-quality semantic space for training the ranking model. Specifically, SIR consists of sequential compressors implemented with pre-trained models: the front compressor selects hard negative samples for the rear compressor. Moreover, SIR leverages a supervisory signal to adaptively adjust the semantic space of negative samples. Finally, the supervisory signal in the rear compressor is computed from conditional probability and can therefore control the sample dynamics and further enhance model performance. SIR is a lightweight and general framework for pre-trained models that simplifies the ranking process in industrial practice. We evaluate our approach on MS MARCO under the document ranking setting, and the results show that SIR significantly improves the ranking performance of various pre-trained models. Moreover, our method reached the top of the MS MARCO document ranking leaderboard (submitted anonymously) in May 2021.
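
The cascaded selection described above lends itself to a short illustration. The sketch below is our own minimal reading of the front/rear compressor idea, not the authors' code: a front ranker scores a pool of random negatives, the highest-scoring ones are kept as hard negatives, and a rear ranker is then trained to separate the positive document from them. The names `TinyRanker` and `select_hard_negatives`, and the softmax contrastive loss, are illustrative assumptions; the paper's compressors are built on pre-trained models such as BERT.

```python
# Minimal sketch of cascaded hard-negative selection (illustration only,
# not the authors' implementation). TinyRanker stands in for a BERT-style
# cross-encoder; select_hard_negatives is a hypothetical helper.

import torch
import torch.nn as nn


class TinyRanker(nn.Module):
    """Toy relevance scorer over (query, document) embedding pairs."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, query: torch.Tensor, docs: torch.Tensor) -> torch.Tensor:
        # query: (dim,), docs: (n_docs, dim) -> relevance scores of shape (n_docs,)
        q = query.expand(docs.size(0), -1)
        return self.scorer(torch.cat([q, docs], dim=-1)).squeeze(-1)


def select_hard_negatives(front: TinyRanker, query, negatives, k: int):
    """Front compressor: keep the k negatives the front ranker scores highest,
    i.e. the ones it most easily confuses with relevant documents."""
    with torch.no_grad():
        scores = front(query, negatives)
    hard_idx = scores.topk(k).indices
    return negatives[hard_idx]


if __name__ == "__main__":
    dim, n_neg, k = 32, 100, 8
    query = torch.randn(dim)
    positive = torch.randn(1, dim)
    random_negatives = torch.randn(n_neg, dim)

    front, rear = TinyRanker(dim), TinyRanker(dim)

    # Stage 1: the front compressor filters random negatives down to hard ones.
    hard_negatives = select_hard_negatives(front, query, random_negatives, k)

    # Stage 2: the rear compressor is trained to rank the positive above the
    # hard negatives with a softmax cross-entropy over candidates (one common
    # contrastive setup; assumed here, not specified by the abstract).
    candidates = torch.cat([positive, hard_negatives], dim=0)
    logits = rear(query, candidates).unsqueeze(0)  # shape (1, 1 + k)
    target = torch.zeros(1, dtype=torch.long)      # index 0 is the positive
    loss = nn.functional.cross_entropy(logits, target)
    loss.backward()
    print(f"rear-compressor loss on hard negatives: {loss.item():.4f}")
```

In a full pipeline this two-stage filtering would be repeated as the compressors improve, so the negative pool tightens around the decision boundary rather than staying random.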
