Incentives in the Dark: Multi-armed Bandits for Evolving Users with Unknown Type

03/11/2018
by   Lillian J. Ratliff, et al.
0

Design of incentives or recommendations to users is becoming more common as platform providers continually emerge. We propose a multi-armed bandit approach to the problem in which users types are unknown a priori and evolve dynamically in time. Unlike the traditional bandit setting, observed rewards are generated by a single Markov process. We demonstrate via an illustrative example that blindly applying the traditional bandit algorithms results in very poor performance as measured by regret. We introduce two variants of classical bandit algorithms, upper confidence bound (UCB) and epsilon-greedy, for which we provide theoretical bounds on the regret. We conduct a number of simulation-based experiments to show how the algorithms perform in comparison to traditional UCB and epsilon-greedy algorithms as well as reinforcement learning (Q-learning).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/06/2018

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

The design of personalized incentives or recommendations to improve user...
research
10/16/2021

Statistical Consequences of Dueling Bandits

Multi-Armed-Bandit frameworks have often been used by researchers to ass...
research
07/12/2019

Laplacian-regularized graph bandits: Algorithms and theoretical analysis

We study contextual multi-armed bandit problems in the case of multiple ...
research
07/30/2021

Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce

E-commerce sites strive to provide users the most timely relevant inform...
research
01/26/2020

Regime Switching Bandits

We study a multi-armed bandit problem where the rewards exhibit regime-s...
research
02/18/2023

Improving Fairness in Adaptive Social Exergames via Shapley Bandits

Algorithmic fairness is an essential requirement as AI becomes integrate...
research
05/18/2018

Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly

When confronted with massive data streams, summarizing data with dimensi...

Please sign up or login with your details

Forgot password? Click here to reset