Incentivizing Exploration in Linear Bandits under Information Gap

04/08/2021
by Huazheng Wang, et al.

We study the problem of incentivizing exploration for myopic users in linear bandits, where users tend to exploit the arm with the highest predicted reward instead of exploring. To maximize the long-term reward, the system offers compensation to incentivize users to pull exploratory arms, with the goal of balancing the trade-off among exploitation, exploration, and compensation. We consider a new and practically motivated setting where the context features observed by the user are more informative than those used by the system, e.g., features based on users' private information are not accessible to the system. We propose a new method to incentivize exploration under such an information gap, and prove that the method achieves both sublinear regret and sublinear compensation. We theoretically and empirically analyze the added compensation due to the information gap, compared with the case where the system has access to the same context features as the user, i.e., without an information gap. We also provide a compensation lower bound for our problem.
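To make the setting concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes that a myopic user picks the arm with the highest linear reward estimate, that the compensation needed to redirect the user equals the gap between the user's best predicted reward and the predicted reward of the target arm, and that the information gap takes the form of the system seeing only a prefix of the user's features. All names and numbers (`predicted_rewards`, `greedy_arm`, `compensation`, the feature vectors) are hypothetical.

```python
# Illustrative sketch only: names, numbers, and the form of the information
# gap are assumptions made for this example, not taken from the paper.

def predicted_rewards(theta, arms):
    """Linear reward estimates <theta, x> for each arm feature vector x."""
    return [sum(t * x for t, x in zip(theta, arm)) for arm in arms]

def greedy_arm(theta, arms):
    """Index of the arm a myopic user would pull (highest predicted reward)."""
    scores = predicted_rewards(theta, arms)
    return max(range(len(arms)), key=lambda i: scores[i])

def compensation(theta, arms, target):
    """Payment that makes the target arm as attractive as the greedy arm."""
    scores = predicted_rewards(theta, arms)
    return max(scores) - scores[target]

# The user observes three features per arm (the third stands in for private information).
user_theta = [1.0, 0.5, 2.0]
user_arms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.5, 0.5, 0.0],
]

# Assumed information gap: the system observes only the first two features.
system_theta = user_theta[:2]
system_arms = [arm[:2] for arm in user_arms]

target = 2  # arm the system would like explored
needed = compensation(user_theta, user_arms, target)
estimated = compensation(system_theta, system_arms, target)

print("user's greedy arm:", greedy_arm(user_theta, user_arms))
print("system's guess of the greedy arm:", greedy_arm(system_theta, system_arms))
print("compensation needed by the user:", needed)
print("compensation estimated by the system:", estimated)
```

In this toy instance the system, working from less informative features, misjudges both which arm the user prefers and how much payment is required, which is the kind of added compensation the abstract attributes to the information gap.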


research
03/03/2021

Combinatorial Bandits without Total Order for Arms

We consider the combinatorial bandits problem, where at each time step, ...
research
06/28/2023

Pure exploration in multi-armed bandits with low rank structure using oblivious sampler

In this paper, we consider the low rank structure of the reward sequence...
research
11/05/2018

Multi-armed Bandits with Compensation

We propose and study the known-compensation multi-arm bandit (KCMAB) pro...
research
01/17/2023

Optimal Algorithms for Latent Bandits with Cluster Structure

We consider the problem of latent bandits with cluster structure where t...
research
04/19/2018

Exploring Partially Observed Networks with Nonparametric Bandits

Real-world networks such as social and communication networks are too la...
research
05/25/2022

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

We propose a new learning framework that captures the tiered structure o...
research
07/04/2019

Reducing Exploration of Dying Arms in Mortal Bandits

Mortal bandits have proven to be extremely useful for providing news art...
