Fast Slate Policy Optimization: Going Beyond Plackett-Luce

08/03/2023
by Otmane Sakhi, et al.

An increasingly important building block of large-scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure so that online queries can be completed quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.
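For readers unfamiliar with the Plackett-Luce baseline named in the abstract, the following is a minimal sketch of how a slate can be sampled from such a policy and how its log-probability is computed. It illustrates the standard textbook model only, not the method proposed in the paper; the function names, the `scores` argument (one real-valued score per item), and the pure-Python implementation are assumptions made for illustration.

```python
import math
import random


def sample_plackett_luce_slate(scores, slate_size, rng=None):
    """Sample an ordered slate of distinct item indices (hypothetical helper).

    Items are drawn one position at a time, without replacement, with
    probability proportional to exp(score) among the remaining items.
    """
    if slate_size > len(scores):
        raise ValueError("slate_size cannot exceed the number of items")
    rng = rng or random.Random()
    remaining = list(range(len(scores)))
    slate = []
    for _ in range(slate_size):
        # Subtract the max score for numerical stability before exponentiating.
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        slate.append(pick)
        remaining.remove(pick)
    return slate


def plackett_luce_log_prob(scores, slate):
    """Log-probability of an ordered slate under the same model."""
    remaining = set(range(len(scores)))
    log_prob = 0.0
    for item in slate:
        # Log-sum-exp over the items still available at this position.
        top = max(scores[i] for i in remaining)
        log_norm = top + math.log(sum(math.exp(scores[i] - top) for i in remaining))
        log_prob += scores[item] - log_norm
        remaining.remove(item)
    return log_prob


if __name__ == "__main__":
    item_scores = [2.0, 0.5, 1.0, -1.0, 0.0]  # made-up scores for five items
    slate = sample_plackett_luce_slate(item_scores, slate_size=3, rng=random.Random(0))
    print(slate, plackett_luce_log_prob(item_scores, slate))
```

In this sketch every slate position requires a pass over all remaining items, so the cost of sampling or scoring one slate grows with the size of the action space.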


research
04/08/2023

A Recommender System Approach for Very Large-scale Multiobjective Optimization

We define very large multi-objective optimization problems to be multiob...
research
08/08/2022

Fast Offline Policy Optimization for Large Scale Recommendation

Personalised interactive systems such as recommender systems require sel...
research
06/06/2022

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deplo...
research
12/06/2018

Top-K Off-Policy Correction for a REINFORCE Recommender System

Industrial recommender systems deal with extremely large action spaces -...
research
10/10/2018

Offline Multi-Action Policy Learning: Generalization and Optimization

In many settings, a decision-maker wishes to learn a rule, or policy, th...
research
08/15/2017

Towards Learning Reward Functions from User Interactions

In the physical world, people have dynamic preferences, e.g., the same s...
research
06/16/2022

Interaction-Grounded Learning with Action-inclusive Feedback

Consider the problem setting of Interaction-Grounded Learning (IGL), in ...
