Information Directed Sampling for Linear Partial Monitoring

02/25/2020
by Johannes Kirschner, et al.

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well-known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
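The abstract names the IDS rule without spelling it out. The sketch below is a rough illustration of the generic IDS step only. It assumes that per-action regret-gap estimates and information-gain estimates are already available. The paper builds these from the linear reward and observation structure, and this sketch does not reproduce that construction. Given those estimates, IDS picks a sampling distribution that minimizes the information ratio, defined as the squared expected gap divided by the expected information gain. A minimizer can be chosen to mix at most two actions. The hypothetical helper `ids_distribution` therefore searches over action pairs and exploits the fact that the ratio is convex in the mixing weight. The inputs `gaps` (assumed nonnegative) and `infos` (assumed positive) are illustrative placeholders, not the paper's estimators.

```python
import itertools
import random


def _ratio(gap, info):
    """Information ratio gap^2 / info. A zero-information action counts as
    infinitely costly unless its gap is also zero (an assumed convention)."""
    if info <= 0.0:
        return 0.0 if gap == 0.0 else float("inf")
    return gap ** 2 / info


def ids_distribution(gaps, infos):
    """Minimize (p*gaps[i] + (1-p)*gaps[j])**2 / (p*infos[i] + (1-p)*infos[j])
    over action pairs (i, j) and mixing weights p in [0, 1].

    Hypothetical helper: gaps and infos are given per-action estimates.
    Returns (ratio, i, j, p): play action i with probability p, else action j.
    """
    best = (float("inf"), 0, 0, 1.0)
    for i, j in itertools.combinations_with_replacement(range(len(gaps)), 2):
        d_gap = gaps[i] - gaps[j]
        d_info = infos[i] - infos[j]
        # Quadratic-over-linear is convex in p, so the minimum lies at an
        # endpoint or at the stationary point clipped to [0, 1].
        candidates = [0.0, 1.0]
        if d_gap != 0.0 and d_info != 0.0:
            p_star = gaps[j] / d_gap - 2.0 * infos[j] / d_info
            candidates.append(min(1.0, max(0.0, p_star)))
        for p in candidates:
            gap = p * gaps[i] + (1.0 - p) * gaps[j]
            info = p * infos[i] + (1.0 - p) * infos[j]
            r = _ratio(gap, info)
            if r < best[0]:
                best = (r, i, j, p)
    return best


def sample_action(gaps, infos, rng):
    """Draw one action from the two-point IDS distribution."""
    _, i, j, p = ids_distribution(gaps, infos)
    return i if rng.random() < p else j
```

Consider gaps [0.1, 1.0] and information gains [0.01, 1.0]. Each action played alone has ratio 1.0. Playing action 0 with probability 10/11 lowers the ratio to 40/121, or about 0.33. This shows why IDS may deliberately randomize toward an informative but suboptimal action.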
