Recommender system as an exploration coordinator: a bounded O(1) regret algorithm for large platforms

01/29/2023
by Hyunwook Kang, et al.

On typical modern platforms, users can try only a small fraction of the available items. This makes it difficult to model platform users as typical online learners who explore every item. To address this issue, we propose interpreting a recommender system as a bandit exploration coordinator that provides counterfactual information updates. In particular, we introduce a novel algorithm, Counterfactual UCB (CFUCB), which guarantees user exploration coordination with bounded regret when linear representations are present. Our results show that sharing information is a Subgame Perfect Nash Equilibrium for agents in terms of regret, so each agent achieves bounded regret. This approach has potential applications in personalized recommender systems and adaptive experimentation.
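
For readers unfamiliar with the term, the following is a sketch of what "bounded O(1) regret" usually means in the bandit literature. It uses the standard single-agent definition of cumulative regret; the symbols K, \mu_a, a_t, \mu^*, and C are assumed notation introduced here for illustration and are not taken from the paper, whose multi-agent definition may differ.

```latex
% Assumed notation (not from the source): K items, mean reward \mu_a for item a,
% a_t the item tried at round t, and \mu^{*} = \max_{a} \mu_a.
R_T \;=\; \sum_{t=1}^{T} \bigl( \mu^{*} - \mu_{a_t} \bigr)

% Bounded, i.e. O(1), regret: there is a constant C that does not depend on T with
\mathbb{E}[R_T] \;\le\; C \quad \text{for all } T \ge 1
```

By contrast, the classical guarantees for UCB-type algorithms in the standard stochastic bandit setting grow logarithmically in T, so a bound that stays constant in T is a strictly stronger statement.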

research
01/30/2020

Fictitious Play Outperforms Counterfactual Regret Minimization

We compare the performance of two popular iterative algorithms, fictitio...
research
06/29/2020

Small Nash Equilibrium Certificates in Very Large Games

In many game settings, the game is not explicitly given but is only acce...
research
09/01/2020

Exploration in two-stage recommender systems

Two-stage recommender systems are widely adopted in industry due to thei...
research
11/17/2017

Learning User Preferences to Incentivize Exploration in the Sharing Economy

We study platforms in the sharing economy and discuss the need for incen...
research
10/21/2021

To Recommend or Not? A Model-Based Comparison of Item-Matching Processes

Recommender systems are central to modern online platforms, but a popula...
research
11/23/2022

Incentive-Aware Recommender Systems in Two-Sided Markets

Online platforms in the Internet Economy commonly incorporate recommende...
research
08/15/2019

A Bayesian Choice Model for Eliminating Feedback Loops

Self-reinforcing feedback loops in personalization systems are typically...
