Pretraining De-Biased Language Model with Large-scale Click Logs for Document Ranking

02/27/2023
by Xiangsheng Li, et al.

Pre-trained language models have achieved great success in various large-scale information retrieval tasks. However, most pretraining tasks are based on counterfeit retrieval data, in which a query generated by a tailored rule is assumed to be the query a user issued for the given document or passage. We therefore explore using large-scale click logs to pretrain a language model instead of relying on simulated queries. Specifically, we propose using user behavior features to pretrain a debiased language model for document ranking. Extensive experiments on desensitized Baidu click logs validate the effectiveness of our method. Our team won 1st place in WSDM Cup 2023 Pre-training for Web Search with a Discounted Cumulative Gain @ 10 (DCG@10) score of 12.16525 on the final leaderboard.
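
For readers unfamiliar with the reported metric, the following is a minimal sketch of how DCG@10 is commonly computed for a single query. It assumes the widely used exponential-gain form, (2^rel - 1) / log2(rank + 1); the abstract does not state which gain function the competition used, so the exact leaderboard formula may differ. The function name `dcg_at_k` and the example labels are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k for one ranked list of graded relevance labels.

    Assumption: exponential gain (2^rel - 1) with a log2(rank + 1) discount.
    `relevances` must already be ordered by the model's ranking, best first.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


# Hypothetical relevance labels for the documents a ranker returned, in order.
ranked_labels = [3, 2, 0, 1]
score = dcg_at_k(ranked_labels, k=10)
```

A leaderboard figure such as the one quoted above would typically be this per-query value averaged over all evaluation queries, though the abstract does not spell out the aggregation.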


research
08/16/2023

Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

In this paper, we systematically study the potential of pre-training wit...
research
05/24/2021

Pre-trained Language Model based Ranking in Baidu Search

As the heart of a search engine, the ranking system plays a crucial role...
research
10/20/2020

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Recently pre-trained language representation models such as BERT have sh...
research
05/22/2023

ConQueR: Contextualized Query Reduction using Search Logs

Query reformulation is a key mechanism to alleviate the linguistic chasm...
research
02/18/2023

Ensemble Ranking Model with Multiple Pretraining Strategies for Web Search

An effective ranking model usually requires a large amount of training d...
research
09/11/2018

EXS: Explainable Search Using Local Model Agnostic Interpretability

Retrieval models in information retrieval are used to rank documents for...
research
07/07/2022

A Large Scale Search Dataset for Unbiased Learning to Rank

The unbiased learning to rank (ULTR) problem has been greatly advanced b...
