Learning the Optimal Recommendation from Explorative Users

10/06/2021
by Fan Yao, et al.

We propose a new problem setting to study the sequential interactions between a recommender system and a user. Instead of assuming that the user is omniscient, static, and explicit, as is standard practice, we sketch a more realistic user behavior model, under which the user: 1) rejects recommendations if they are clearly worse than others; 2) updates her utility estimation based on rewards from her accepted recommendations; 3) withholds realized rewards from the system. We formulate the interactions between the system and such an explorative user in a K-armed bandit framework and study the problem of learning the optimal recommendation on the system side. We show that efficient system learning is still possible but is more difficult. In particular, the system can identify the best arm with probability at least 1-δ within O(1/δ) interactions, and we prove that this bound is tight. Our finding contrasts with the result for best arm identification with fixed confidence, in which the best arm can be identified with probability at least 1-δ within O(log(1/δ)) interactions. This gap illustrates the inevitable cost the system has to pay when it learns from an explorative user's revealed preferences on its recommendations rather than from the realized rewards.
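The three user behaviors listed in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model: the class name `ExplorativeUser`, the confidence-interval rejection rule, and the radius formula are all assumptions chosen only to make "clearly worse than others" concrete. The user accepts an arm unless its upper confidence bound falls below the best lower confidence bound, updates her private estimate on acceptance, and returns only an accept/reject signal.

```python
import math
import random


class ExplorativeUser:
    """Hypothetical user model (an assumption, not the paper's definition)."""

    def __init__(self, true_means, noise=0.1, seed=0):
        self.true_means = list(true_means)
        self.noise = noise
        self.rng = random.Random(seed)
        k = len(self.true_means)
        self.counts = [0] * k        # accepted pulls per arm
        self.estimates = [0.0] * k   # user's private utility estimates

    def _radius(self, arm):
        # Assumed confidence radius; infinite for an arm never accepted.
        n = self.counts[arm]
        if n == 0:
            return math.inf
        total = sum(self.counts)
        return math.sqrt(2.0 * math.log(total + 1) / n)

    def respond(self, arm):
        """Return True (accept) or False (reject); the reward stays private."""
        ucb = self.estimates[arm] + self._radius(arm)
        best_lcb = max(
            self.estimates[a] - self._radius(a)
            for a in range(len(self.counts))
        )
        # 1) Reject when the arm is clearly worse than some other arm.
        if ucb < best_lcb:
            return False
        # 2) On acceptance, observe a reward and update the estimate.
        reward = self.rng.gauss(self.true_means[arm], self.noise)
        self.counts[arm] += 1
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]
        # 3) Only the accept decision is revealed to the system.
        return True


if __name__ == "__main__":
    user = ExplorativeUser([0.9, 0.1], noise=0.0)
    # The system alternates between the two arms and sees only booleans.
    feedback = [(t % 2, user.respond(t % 2)) for t in range(1000)]
    rejections = sum(1 for arm, ok in feedback if arm == 1 and not ok)
    print("rejections of the worse arm:", rejections)
```

In this sketch the system's only evidence about arm quality is the pattern of rejections, which is the kind of revealed-preference feedback the abstract contrasts with directly observed rewards.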

research
01/12/2022

Best Arm Identification with a Fixed Budget under a Small Gap

We consider the fixed-budget best arm identification problem in the mult...
research
06/19/2021

Variance-Dependent Best Arm Identification

We study the problem of identifying the best arm in a stochastic multi-a...
research
05/19/2021

Incentivized Bandit Learning with Self-Reinforcing User Preferences

In this paper, we investigate a new multi-armed bandit (MAB) online lear...
research
06/05/2023

Covariance Adaptive Best Arm Identification

We consider the problem of best arm identification in the multi-armed ba...
research
07/25/2020

Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions

Users of music streaming, video streaming, news recommendation, and e-co...
research
04/10/2019

Causal Embeddings for Recommendation: An Extended Abstract

Recommendations are commonly used to modify user's natural behavior, for...
research
08/06/2023

A Lightweight Method for Modeling Confidence in Recommendations with Learned Beta Distributions

Most Recommender Systems (RecSys) do not provide an indication of confid...
