SlateFree: a Model-Free Decomposition for Reinforcement Learning with Slate Actions

09/05/2022
by Anastasios Giovanidis, et al.

We consider the problem of sequential recommendations, where at each step an agent proposes some slate of $N$ distinct items to a user from a much larger catalog of size $K \gg N$. The user has unknown preferences towards the recommendations, and the agent takes sequential actions that optimise (in our case minimise) some user-related cost, with the help of Reinforcement Learning. The number of possible item combinations for a slate is $\binom{K}{N}$, an enormous number rendering value iteration methods intractable. We prove that the slate-MDP can actually be decomposed using just $K$ item-related Q functions per state, which describe the problem in a more compact and efficient way. Based on this, we propose a novel model-free SARSA and Q-learning algorithm that performs $N$ parallel iterations per step, without any prior user knowledge. We call this method SlateFree, i.e. free-of-slates, and we show numerically that it converges very fast to the exact optimum for arbitrary user profiles, and that it outperforms alternatives from the literature.
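To make the size argument and the per-item idea concrete, the following is a minimal tabular sketch, not the authors' algorithm. It assumes that slates are unordered sets of N distinct items (so their number is the binomial coefficient), that each state stores K item values, and that one environment step triggers N updates, one per recommended item. The function names, the shared per-step cost, and the bootstrap target (the mean of the N lowest item values in the next state) are all hypothetical choices made for illustration; the abstract does not specify the update rule.

```python
import math
import random


def num_slates(K, N):
    """Number of unordered slates of N distinct items from a catalog of K (assumed count)."""
    return math.comb(K, N)


def select_slate(q_row, N, epsilon, rng):
    """Epsilon-greedy slate: random N distinct items, else the N lowest-cost items."""
    K = len(q_row)
    if rng.random() < epsilon:
        return rng.sample(range(K), N)
    return sorted(range(K), key=lambda j: q_row[j])[:N]


def itemwise_q_update(Q, state, slate, cost, next_state, alpha=0.1, gamma=0.9):
    """One environment step -> N item-level updates (hypothetical target, see lead-in)."""
    N = len(slate)
    # Assumption: bootstrap with the mean of the N smallest item values in next_state,
    # since the objective here is cost minimisation.
    target = cost + gamma * sum(sorted(Q[next_state])[:N]) / N
    for item in slate:
        Q[state][item] += alpha * (target - Q[state][item])


if __name__ == "__main__":
    K, N = 100, 5
    # A slate-level table needs one entry per slate; an item-level table needs K.
    print(num_slates(K, N), "slate values vs", K, "item values per state")
    rng = random.Random(0)
    Q = [[0.0] * K for _ in range(3)]  # 3 toy states
    slate = select_slate(Q[0], N, epsilon=0.1, rng=rng)
    itemwise_q_update(Q, state=0, slate=slate, cost=1.0, next_state=1)
```

Under these assumptions the table grows linearly in K per state rather than combinatorially, which is the compactness the abstract refers to.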

research
09/05/2015

Reinforcement Learning with Parameterized Actions

We introduce a model-free algorithm for learning in Markov decision proc...
research
12/07/2010

Bridging the Gap between Reinforcement Learning and Knowledge Representation: A Logical Off- and On-Policy Framework

Knowledge Representation is important issue in reinforcement learning. I...
research
02/02/2019

When Collaborative Filtering Meets Reinforcement Learning

In this paper, we study a multi-step interactive recommendation problem,...
research
08/31/2018

Directed Exploration in PAC Model-Free Reinforcement Learning

We study an exploration method for model-free RL that generalizes the co...
research
02/19/2022

Who Are the Best Adopters? User Selection Model for Free Trial Item Promotion

With the increasingly fierce market competition, offering a free trial h...
research
11/03/2020

Secure Planning Against Stealthy Attacks via Model-Free Reinforcement Learning

We consider the problem of security-aware planning in an unknown stochas...
research
02/05/2022

Reinforcement learning for multi-item retrieval in the puzzle-based storage system

Nowadays, fast delivery services have created the need for high-density ...
