Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

06/17/2020
by Taira Tsuchiya, et al.

We investigate finite stochastic partial monitoring, a general model for sequential learning with limited feedback. Although Thompson sampling is one of the most promising algorithms for a variety of online decision-making problems, its properties in stochastic partial monitoring have not been analyzed theoretically, and the existing algorithm relies on a heuristic approximation of the posterior distribution. To address these problems, we present a novel Thompson-sampling-based algorithm that samples the target parameter exactly from the posterior distribution. We also prove that the new algorithm achieves a logarithmic problem-dependent expected pseudo-regret of O(log T) for a linearized variant of the problem with local observability. This is the first regret bound for Thompson sampling in partial monitoring, and it is also the first logarithmic regret bound for Thompson sampling in linear bandits.
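For readers unfamiliar with the basic idea, the following is a minimal sketch of generic Thompson sampling on a Bernoulli multi-armed bandit. It is not the partial-monitoring algorithm proposed in the paper; it only illustrates the sample-from-the-posterior-then-act loop and how pseudo-regret is accumulated. The function name, the uniform Beta(1, 1) prior, and the arm means used in the usage line are all assumptions made for illustration.

```python
import random


def thompson_sampling_bernoulli(means, horizon, seed=0):
    """Generic Thompson sampling on a Bernoulli bandit (illustrative only).

    `means` holds the true success probability of each arm; the learner
    never reads them directly, they are used only to simulate rewards
    and to compute the pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k
    pseudo_regret = 0.0
    best = max(means)

    for _ in range(horizon):
        # Draw one exact sample per arm from its Beta posterior
        # (assumed prior: Beta(1, 1), i.e. uniform).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Act greedily with respect to the sampled parameters.
        arm = max(range(k), key=lambda i: samples[i])

        # Simulate a Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

        # Pseudo-regret: gap between the best mean and the chosen mean.
        pseudo_regret += best - means[arm]

    return pulls, pseudo_regret


pulls, regret = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000, seed=1)
```

In this bandit setting the learner observes the reward of the arm it pulls. In partial monitoring the feedback is more limited, which is why the posterior update in the paper's setting is harder than the simple count update shown here.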

research
06/27/2012

An Adaptive Algorithm for Finite Stochastic Partial Monitoring

We present a new anytime algorithm that achieves near-optimal regret for...
research
09/06/2018

Logarithmic regret in the dynamic and stochastic knapsack problem

We study a dynamic and stochastic knapsack problem in which a decision m...
research
02/07/2023

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

Partial monitoring is an expressive framework for sequential decision-ma...
research
02/25/2020

Information Directed Sampling for Linear Partial Monitoring

Partial monitoring is a rich framework for sequential decision making un...
research
03/02/2022

Partial Likelihood Thompson Sampling

We consider the problem of deciding how best to target and prioritize ex...
research
03/02/2022

An Analysis of Ensemble Sampling

Ensemble sampling serves as a practical approximation to Thompson sampli...
research
04/24/2020

Fast Thompson Sampling Algorithm with Cumulative Oversampling: Application to Budgeted Influence Maximization

We propose a cumulative oversampling (CO) technique for Thompson Samplin...