Ensemble Ranking Model with Multiple Pretraining Strategies for Web Search

02/18/2023
by Xiaojie Sun, et al.

An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data because they indicate relevance and are cheap to collect, but they contain substantial bias and noise. Prior work has mitigated various types of bias in simulated user clicks to train effective learning-to-rank models over multiple features; however, how to apply such methods to large-scale pre-trained models with real-world click data remains unclear. To alleviate data bias in the real-world setting, we incorporate heuristic-based features, refine the ranking objective, add random negatives, and calibrate the propensity calculation in the pre-training stage. In the fine-tuning stage, we fine-tune several pre-trained models on human-annotated data and train an ensemble model to aggregate their predictions. Our approaches won 3rd place in the "Pre-training for Web Search" task of WSDM Cup 2023.
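The pre-training and ensembling steps described above can be illustrated with a small sketch. The snippet below is only a hedged illustration of two of the ideas mentioned in the abstract: an inverse-propensity-weighted pairwise loss over clicked documents and random negatives (standing in for the refined ranking objective and calibrated propensity calculation), and a weighted average that aggregates the scores of several fine-tuned rankers. The function names, the loss form, the propensity clipping threshold, and the fixed ensemble weights are all assumptions made for illustration, not the authors' exact implementation.

```python
import torch

def ipw_pairwise_loss(pos_scores, neg_scores, propensities):
    """Inverse-propensity-weighted pairwise logistic loss (illustrative).

    pos_scores:   scores of clicked documents, shape (B,)
    neg_scores:   scores of randomly sampled negatives, shape (B,)
    propensities: estimated examination propensities of the clicked
                  positions, shape (B,); clipped to avoid huge weights.
    """
    weights = 1.0 / propensities.clamp(min=0.1)   # calibrated IPW weights (clip value assumed)
    margins = pos_scores - neg_scores
    # log(1 + exp(-(s_pos - s_neg))), weighted by the inverse propensity
    return (weights * torch.nn.functional.softplus(-margins)).mean()

def ensemble_scores(model_scores, weights):
    """Aggregate per-model relevance scores with a weighted average.

    model_scores: tensor of shape (num_models, num_docs)
    weights:      tensor of shape (num_models,), normalized before use.
    """
    weights = weights / weights.sum()
    return (weights.unsqueeze(1) * model_scores).sum(dim=0)

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy batch: clicked documents vs. randomly sampled negatives.
    pos = torch.randn(8, requires_grad=True)
    neg = torch.randn(8)
    prop = torch.rand(8)                          # e.g. position-based propensities
    loss = ipw_pairwise_loss(pos, neg, prop)
    loss.backward()
    print("IPW pairwise loss:", loss.item())

    # Toy ensemble of three fine-tuned rankers scoring five documents.
    scores = torch.randn(3, 5)
    final = ensemble_scores(scores, torch.tensor([0.5, 0.3, 0.2]))
    print("Ensemble scores:", final)
```

In practice, the ensemble weights (or a learned aggregation model) would be tuned on the human-annotated fine-tuning data rather than fixed by hand, as the abstract describes.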


