Variance Reduction in Gradient Exploration for Online Learning to Rank

06/10/2019
by   Huazheng Wang, et al.

Online Learning to Rank (OL2R) algorithms learn from implicit user feedback on the fly. The key to such algorithms is an unbiased estimation of gradients, which is often (trivially) achieved by uniformly sampling from the entire parameter space. This unfortunately introduces high variance into gradient estimation and leads to worse regret in model estimation, especially when the dimension of the parameter space is large. In this paper, we aim to reduce the variance of gradient estimation in OL2R algorithms. After each interleaved test, we project the selected updating direction onto a space spanned by the feature vectors of the examined documents under the current query (termed the "document space" for short). Our key insight is that the result of an interleaved test is solely governed by a user's relevance evaluation over the examined documents. Hence, the true gradient implied by this test result should lie in the constructed document space, and components of the proposed gradient orthogonal to the document space can be safely removed for variance reduction. We prove that the projected gradient is an unbiased estimate of the true gradient, and show that this lower-variance gradient estimation leads to significant regret reduction. Our proposed method is compatible with all existing OL2R algorithms that rank documents using a linear model. Extensive experimental comparisons with several state-of-the-art OL2R algorithms confirm the effectiveness of our method in reducing the variance of gradient estimation and improving overall performance.
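The projection step described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function name and shapes are assumptions, and it assumes the examined documents' feature vectors are linearly independent. It builds an orthonormal basis for the document space and projects a proposed update direction onto it, discarding the orthogonal component.

```python
import numpy as np

def project_to_document_space(g, doc_features):
    """Project a proposed gradient onto the span of the examined
    documents' feature vectors (the "document space").

    g            : (d,) proposed update direction
    doc_features : (m, d) feature vectors of the m examined documents,
                   assumed linearly independent (m <= d)
    """
    # Orthonormal basis Q for the document space via reduced QR:
    # columns of Q (shape d x m) span the document feature vectors.
    Q, _ = np.linalg.qr(doc_features.T)
    # Apply the projection P = Q Q^T. Components of g orthogonal to the
    # document space are removed; per the argument above, those
    # components carry no information from the interleaved test.
    return Q @ (Q.T @ g)
```

Since P = Q Q^T is an orthogonal projection, applying it twice changes nothing, and any direction already in the document space passes through unchanged, which is what makes the projected gradient retain the informative part of the original estimate.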


Related research

- 05/18/2018 · Efficient Exploration of Gradient Space for Online Learning to Rank: Online learning to rank (OL2R) optimizes the utility of returned search ...
- 04/28/2020 · Unbiased Learning to Rank: Online or Offline?: How to obtain an unbiased ranking model by learning to rank with biased ...
- 01/17/2022 · Learning Neural Ranking Models Online from Implicit User Feedback: Existing online learning to rank (OL2R) solutions are limited to linear ...
- 08/12/2022 · Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping: Gradient estimation is often necessary for fitting generative models wit...
- 09/22/2018 · Differentiable Unbiased Online Learning to Rank: Online Learning to Rank (OLTR) methods optimize rankers based on user in...
- 02/06/2019 · On the Variance of Unbiased Online Recurrent Optimization: The recently proposed Unbiased Online Recurrent Optimization algorithm (...
- 05/18/2020 · Unbiased Learning to Rank via Propensity Ratio Scoring: Implicit feedback, such as user clicks, is a major source of supervision...
