Pessimistic Off-Policy Optimization for Learning to Rank

06/06/2022
by Matej Cief, et al.

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended, and thus logged, much more frequently than others. The problem is exacerbated when recommending a list of items, because the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on the parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient, and we analyze it. We study its Bayesian and frequentist variants, and overcome the limitation of an unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines, is robust, and is also general.
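The core idea (lower confidence bounds on click-model parameters, then the list with the highest pessimistic value) can be illustrated with a minimal sketch. It assumes a cascade click model in which each item has an attraction probability and a list is clicked with probability 1 minus the product of (1 - attraction) over its items. Because that value increases in every attraction probability, plugging in lower bounds gives a pessimistic value, and the top-K items by lower bound maximize it. The frequentist variant here uses a Hoeffding bound; the Bayesian variant uses a Gaussian approximation to a Beta posterior quantile; the empirical-Bayes prior is a method-of-moments stand-in. `ItemStats`, the log format, and every function name are hypothetical, and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Logged clicks and views for one item (hypothetical log format)."""
    clicks: int
    views: int


def hoeffding_lcb(s: ItemStats, delta: float = 0.05) -> float:
    """Frequentist lower confidence bound on the attraction probability."""
    if s.views == 0:
        return 0.0  # no data: use the most pessimistic value in [0, 1]
    mean = s.clicks / s.views
    width = math.sqrt(math.log(1.0 / delta) / (2.0 * s.views))
    return max(0.0, mean - width)


def bayes_lcb(s: ItemStats, alpha: float, beta: float, z: float = 1.645) -> float:
    """Approximate Bayesian lower bound: Beta posterior mean minus z posterior stds.

    This is a Gaussian approximation, not an exact Beta quantile.
    """
    a = alpha + s.clicks
    b = beta + s.views - s.clicks
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return max(0.0, mean - z * math.sqrt(var))


def fit_beta_prior(stats: list[ItemStats]) -> tuple[float, float]:
    """Empirical Bayes stand-in: method-of-moments Beta prior from item click rates."""
    rates = [s.clicks / s.views for s in stats if s.views > 0]
    if len(rates) < 2:
        return 1.0, 1.0  # too little data: fall back to a uniform prior
    m = sum(rates) / len(rates)
    v = sum((r - m) ** 2 for r in rates) / len(rates)
    if v <= 0.0 or v >= m * (1.0 - m):
        return 1.0, 1.0  # moments are not consistent with a Beta distribution
    k = m * (1.0 - m) / v - 1.0
    return m * k, (1.0 - m) * k


def cascade_value(thetas: list[float]) -> float:
    """Probability of at least one click on a list under the cascade model."""
    p_no_click = 1.0
    for t in thetas:
        p_no_click *= 1.0 - t
    return 1.0 - p_no_click


def pessimistic_list(lcbs: dict[str, float], k: int) -> list[str]:
    """Cascade value increases in each theta, so the K items with the highest
    lower bounds form the list with the highest pessimistic value."""
    return sorted(lcbs, key=lcbs.get, reverse=True)[:k]


logs = {
    "a": ItemStats(clicks=50, views=100),  # moderate click rate, many views
    "b": ItemStats(clicks=3, views=4),     # high click rate, very few views
    "c": ItemStats(clicks=30, views=100),
}
freq = {name: hoeffding_lcb(s) for name, s in logs.items()}
print(pessimistic_list(freq, k=2))  # ['a', 'c']
alpha, beta = fit_beta_prior(list(logs.values()))
bayes = {name: bayes_lcb(s, alpha, beta) for name, s in logs.items()}
print(pessimistic_list(bayes, k=2))
```

A greedy optimizer that ranks by empirical click rate would place "b" first on the strength of four views; the Hoeffding variant drops it because its bound is wide. The Bayesian variant shrinks "b" toward the fitted prior instead, so the two variants can disagree on sparsely logged items.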



06/15/2020

Piecewise-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policie...
03/11/2023

Uncertainty-Aware Off-Policy Learning

Off-policy learning, referring to the procedure of policy optimization w...
12/22/2022

Local Policy Improvement for Recommender Systems

Recommender systems aim to answer the following question: given the item...
08/03/2023

Fast Slate Policy Optimization: Going Beyond Plackett-Luce

An increasingly important building block of large scale machine learning...
12/08/2021

DiPS: Differentiable Policy for Sketching in Recommender Systems

In sequential recommender system applications, it is important to develo...
12/06/2018

Top-K Off-Policy Correction for a REINFORCE Recommender System

Industrial recommender systems deal with extremely large action spaces -...
